# Decision Trees

ML D4

**Definition:** A decision tree is a machine learning model that makes a true/false comparison at each node, going left or right until it reaches a leaf node. The leaf node then gives the output.

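As a rough illustration of that left/right traversal, here is a minimal hand-written sketch. The `Node` class, the `predict` function, and the thresholds are all made up for this example; this is not how scikit-learn stores its trees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node layout, for illustration only
    feature: Optional[int] = None      # index of the feature to compare (split nodes)
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    output: Optional[str] = None       # set only on leaf nodes


def predict(node: Node, x: list) -> str:
    # Walk down the tree until a leaf is reached
    while node.output is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.output


# Made-up tree: one comparison on feature 0, then one on feature 1
tree = Node(
    feature=0, threshold=2.5,
    left=Node(output="A"),
    right=Node(feature=1, threshold=1.7,
               left=Node(output="B"),
               right=Node(output="C")),
)

print(predict(tree, [1.0, 0.2]))  # A
print(predict(tree, [5.0, 1.5]))  # B
print(predict(tree, [5.0, 2.0]))  # C
```

Each call only ever looks at one feature per node, which is the whole trick.
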
### Associated Links

Classification and Regression Trees by Leo Breiman et al.

### Visualizing

You can use graphviz to visualize the tree. First, train the model with a class from sklearn.tree, then import export_graphviz from the same module. export_graphviz takes the model, an output file, feature names, class names, and some other options, and writes a dot file.

Then import Source from graphviz and use Source.from_file() to load the dot file and view it.

Ex:

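```python3
from graphviz import Source
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Train on petal length and width only, matching feature_names below.
# max_depth=2 is an assumed setting to keep the drawing small.
iris = load_iris()
X = iris.data[:, 2:]
y = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# With out_file set, export_graphviz writes the dot file and returns None.
# The ../graphs/ directory has to exist already.
export_graphviz(
    tree_clf,
    out_file='../graphs/iris_tree.dot',
    feature_names=["petal length (cm)", "petal width (cm)"],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)
Source.from_file('../graphs/iris_tree.dot')  # displays inline in a notebook
```
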
### Other Info

The tree starts at a root node. The root and the other 'split nodes' are where the tree splits into two child nodes based on a True/False comparison.

An interesting thing about decision trees is that no feature scaling is required: each split compares a single feature against a threshold, so features are never compared to other features, unless you engineer another feature as some combination of them.

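A quick way to check the scaling claim is to fit the same tree on raw and standardized features and compare predictions. This is only a sketch on the iris data; the choice of StandardScaler and the variable names are assumptions for the demo.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same settings and seed: one tree on raw features, one on standardized features
raw_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
scaled_tree = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

raw_pred = raw_tree.predict(X)
scaled_pred = scaled_tree.predict(X_scaled)

# Scaling moves the thresholds but should not change which side a sample falls on
print(np.array_equal(raw_pred, scaled_pred))
```
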
In the context of decision trees, 'samples' on a split node is the number of training samples that reached that node. The same applies to leaf nodes, where it is the number of samples that ended up in that leaf.

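The per-node sample counts can also be read off a fitted scikit-learn tree through its `tree_` attribute. A small sketch on the iris data (the `max_depth=2` setting is just an assumption to keep the tree small):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

tree = clf.tree_
samples = tree.n_node_samples        # training samples that reached each node
is_leaf = tree.children_left == -1   # leaves have no children

root_samples = int(samples[0])               # every sample passes through the root
leaf_samples = int(samples[is_leaf].sum())   # every sample ends in exactly one leaf
print(root_samples, leaf_samples)
```
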
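The 'gini' attribute measures the impurity of a node (split nodes have one too, not just leaves). A gini of 0 means all samples that made it to the node belong to the same class. Higher values mean a more mixed node. The value is 1 minus the sum of the squared class fractions, so it is not simply the share of samples from other classes: a two-class node split 80/20 has a gini of 1 - 0.8^2 - 0.2^2 = 0.32.
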
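The gini formula is short enough to write out by hand. The `gini` helper below is a made-up function for illustration; the last lines compare it against the impurity scikit-learn stores for the root of an iris tree, which holds 50 samples of each class.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def gini(counts):
    # Gini impurity from per-class sample counts: 1 - sum(p_k ** 2)
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


print(gini([50, 0, 0]))  # pure node
print(gini([80, 20]))    # 80/20 split

# Compare with the root node of a fitted tree (default criterion is gini)
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
root_gini = clf.tree_.impurity[0]
print(root_gini, gini([50, 50, 50]))
```
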
Scikit-learn builds binary trees using the CART algorithm, but other decision tree algorithms are not strictly yes/no, such as ID3, where nodes can have more than two children.

Decision trees can also output class probabilities, based on the same per-class sample counts that are used to compute the gini value. These counts are generally shown as a list such as [50, 2, 5]. The probability of each class is its count divided by the total, so here the first class is the most probable at 50/57 (about 0.88) and the others are less likely.

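Here is a small sketch of the counts-to-probabilities step, first by hand and then through `predict_proba`. The sample flower and the `max_depth=2` tree are assumptions for the demo, not values from these notes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Per-class counts from the text, turned into probabilities by hand
counts = np.array([50, 2, 5])
manual_proba = counts / counts.sum()
print(manual_proba.round(3))

# The same idea through scikit-learn: predict_proba returns one row per sample
iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, iris.target)

sample = [[5.0, 1.5]]  # made-up flower: petal length 5 cm, petal width 1.5 cm
proba = clf.predict_proba(sample)
print(proba, clf.predict(sample))
```

The predicted class is simply the one with the highest probability in the leaf the sample lands in.
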
The max_depth hyperparameter is a simple and common way to regularize decision trees and reduce the risk of overfitting. There are also max_features (the number of features considered at each split), max_leaf_nodes, min_samples_split, and min_samples_leaf, which restrict the tree in similar ways.

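To see what these limits do, compare an unrestricted tree with a restricted one. This is a sketch on the iris data; the specific values (`max_depth=2`, `min_samples_leaf=5`) are arbitrary and only picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No limits: the tree keeps splitting until leaves are pure or cannot be split further
free_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Restricted tree: capped depth and a minimum number of samples per leaf
small_tree = DecisionTreeClassifier(
    max_depth=2,
    min_samples_leaf=5,
    random_state=0,
).fit(X, y)

print(free_tree.get_depth(), free_tree.get_n_leaves())
print(small_tree.get_depth(), small_tree.get_n_leaves())
```
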
### Uhh Ohh

These things really like orthogonal (axis-aligned) boundaries but not so much angles. If you have a dataset that is easily separable at an angle but not vertically or horizontally, you will have a bad time with decision trees.

One remedy for this is to use PCA, which rotates the data to reduce correlation between features.

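The rotation problem is easy to reproduce on synthetic data. The sketch below is built on assumptions: the data is made up, the 45 degree angle is arbitrary, and putting PCA in front of the tree is only a possible remedy, so no particular depth is promised for the PCA version.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this demo): wide along x0, narrow along x1,
# with the class decided by a vertical boundary at x0 = 0
X = rng.normal(size=(300, 2)) * [3.0, 0.5]
y = (X[:, 0] > 0).astype(int)

# Rotate the same points by 45 degrees so the boundary becomes diagonal
angle = np.pi / 4
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
X_rot = X @ rotation

axis_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
rot_tree = DecisionTreeClassifier(random_state=0).fit(X_rot, y)

# PCA in front of the tree tries to rotate the data back onto its main axes
pca_tree = make_pipeline(PCA(n_components=2),
                         DecisionTreeClassifier(random_state=0)).fit(X_rot, y)

print("axis-aligned depth:", axis_tree.get_depth())
print("rotated depth:", rot_tree.get_depth())
print("rotated + PCA depth:", pca_tree[-1].get_depth())
```

The axis-aligned tree needs a single split, while the rotated one has to approximate the diagonal with a staircase of splits.
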
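### Hmmm....

Scikit-learn's tree training is stochastic (it randomly permutes the features at each split), so the resulting trees aren't always consistent from one training run to the next unless you set random_state. This is why random forests can be cool: they average over many randomized trees.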
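
If you want repeatable trees, fixing `random_state` is the usual knob. A minimal sketch on the iris data (the seed value 42 is arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two runs with the same seed: the fitted trees should match split for split
tree_a = DecisionTreeClassifier(random_state=42).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=42).fit(X, y)

same_features = np.array_equal(tree_a.tree_.feature, tree_b.tree_.feature)
same_thresholds = np.array_equal(tree_a.tree_.threshold, tree_b.tree_.threshold)
print(same_features, same_thresholds)
```
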