README.md (4123B)
1 # CART-ELC 2 3 CART-ELC is an oblique decision tree induction algorithm. Our approach has a comparatively high computational cost, but the resulting models often have higher classification accuracies and better interpretability. Given the computational cost of our algorithm, it is best applied to small datasets ($n < 5000$). 4 5 Our paper is currently under review at TMLR, but is accessible on [arXiv](https://doi.org/10.48550/arXiv.2505.05402). 6 7 This repository contains the following: 8 9 ### Implementations 10 11 | Model | Language | Notes | 12 |-------------|-----------------|--------------------------------------------------------| 13 | CART-ELC | C++ / Python | Includes C++ tests | 14 | HHCART(A) | Python | See [HHCART](https://doi.org/10.48550/arXiv.1504.03415)| 15 | HHCART(D) | Python | See [HHCART](https://doi.org/10.48550/arXiv.1504.03415)| 16 17 ### Results 18 19 - Experiment scripts for HHCART(A), HHCART(D), and CART-ELC 20 - Experimental results for CART-ELC 21 22 ### Documentation 23 24 - Usage instructions / implementation details 25 - Source files for our paper 26 - Source files for our presentation 27 28 # Usage 29 30 ## Dependencies 31 32 - Python 3.8+ 33 - NumPy 34 - scikit-learn (for loading datasets) 35 - pandas 36 - A C++ compiler (e.g., 'g++') 37 - Make 38 - Graphviz (for rendering dot output) 39 - Setuptools 40 - pybind11 41 - python3-dev 42 43 If you encounter issues on Linux, please feel free to open an issue. 44 45 ## Run Experiments 46 47 **Note:** Running the full experiment suite may take a significant amount of time. Additionally, you will need to create a 'datasets' directory in the root of the repository and populate it with the 'housing.csv', 'diabetes.csv', 'cancer.csv', 'dim.csv', and 'bright.csv' files corresponding to the datasets described in our paper. 48 49 To run the CART-ELC experiments from our paper, from the root of the repository, run: 50 51 ``` 52 make experiments 53 ``` 54 55 This will compile the C++ code, move the compiled binary to the directory with the python code, and run all of our python experiments in series. 56 57 ## Fit and Predict 58 59 To use our implementation of CART-ELC you will first need to compile the C++ code for use with our python bindings. To do this, from the root of the repository, run 60 61 ``` 62 make so 63 ``` 64 65 This compiles the shared object and places it in the 'examples/cart-elc-experiments' directory. You can then cd to this directory and you should see a new file named 'decision_tree.so'. You should also see other .py files. Each of these files has been configured to read in a dataset, call the fit method, and call the predict method. These .py file should be sufficient to understand how CART-ELC can be used. Additionally, these files are the files we used for the experiments in our paper and can thus be ran individually to validate our findings. 66 67 ## Python Interfaces 68 69 Shown below are the interfaces for the 'ELCClassifier' object: 70 71 ```python 72 class ELCClassifier: 73 def __init__(self, depth: int, combinations: int, threadCount: int, objectiveFunction: Optional[str] = None) -> None: ... 74 def fit(self, X: np.ndarray, samples: int, y: np.ndarray, features: int) -> None: ... 75 def predict(self, X: np.ndarray, samples: int, features: int) -> np.ndarray: ... 76 def getDot(self) -> str: ... 77 def getSplits(self) -> int: ... 78 ``` 79 80 The primary distinction between sklearn's implementations of decision trees and ours is that we require the consumer to pass in the sizes for the arrays being passed to the classifier. This was done for ease of development as under the hood we are primarily using raw c-style arrays that don't have a size attribute. 81 82 We also note getDot differs slightly from sklearn's plot_tree implementation. When called, the object returns a dot language string. This string can then be rendered using Graphviz. 83 84 85 # Acknowledgments 86 87 This project uses the Eigen library, a third-party C++ library for linear algebra. We are not affiliated with or endorsed by the developers or maintainers of the Eigen library. The use of Eigen in this project is for research and educational purposes only.