# CART - Classification and Regression Tree Algorithm

ML D4

**Definition:** The CART algorithm trains decision trees by splitting the training set into two subsets using a single feature k and a threshold on that feature, choosing the pair that produces the purest subsets, with each subset's impurity weighted by its size. It then repeats this split recursively on each subset (greedily) until it reaches a maximum depth or cannot find a split that reduces impurity.
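
A minimal sketch of one split step, assuming numeric features, Gini impurity as the purity measure, and candidate thresholds at midpoints between consecutive distinct values. The names `gini` and `best_split` and the toy data are made up for illustration; this is not how any particular library implements it.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (cost, feature_index, threshold) for the lowest size-weighted
    impurity, or None if no feature has two distinct values to split on."""
    n = len(rows)
    best = None
    for k in range(len(rows[0])):
        values = sorted(set(row[k] for row in rows))
        # Assumed candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[k] <= t]
            right = [y for row, y in zip(rows, labels) if row[k] > t]
            # Impurity of each subset, weighted by its share of the samples.
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or cost < best[0]:
                best = (cost, k, t)
    return best

# Toy data: feature 0 separates the classes cleanly, feature 1 does not.
rows = [[2.0, 7.0], [3.0, 1.0], [6.0, 8.0], [7.0, 2.0]]
labels = ["a", "a", "b", "b"]
print(best_split(rows, labels))  # (0.0, 0, 4.5)
```

A full tree would call `best_split` again on the left and right subsets until the stopping conditions above are met.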
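
Note that because the algorithm is greedy, it picks the best split at each step without looking ahead, so the resulting tree is not guaranteed to be optimal: accepting a suboptimal split at one node could allow better splits further down. Searching for those would increase the computing cost drastically.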
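
There are two common impurity measures used as the CART cost function for classification: Gini impurity and entropy. Gini impurity is typically the default (in scikit-learn, for example), and the algorithm tries to minimize it. Entropy can be used instead, in which case the reduction in entropy from a split is called information gain, but it is slightly slower to compute because it uses logarithms.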
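
To make the two measures concrete, here is a small sketch that computes both from a node's class proportions, plus the information gain of a split. The function names are made up for illustration, and base-2 logarithms are an assumption (any base works, it only changes the units).

```python
import math

def gini_from_probs(probs):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    return 1.0 - sum(p * p for p in probs)

def entropy_from_probs(probs):
    """Entropy: sum(-p_k * log2(p_k)), skipping classes with p_k == 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def information_gain(parent_probs, children):
    """Parent entropy minus the size-weighted entropy of the children.
    `children` is a list of (weight, probs) pairs whose weights sum to 1."""
    weighted = sum(w * entropy_from_probs(p) for w, p in children)
    return entropy_from_probs(parent_probs) - weighted

# A 50/50 node is maximally impure under both measures (for two classes).
print(gini_from_probs([0.5, 0.5]))     # 0.5
print(entropy_from_probs([0.5, 0.5]))  # 1.0

# Splitting that node into two pure halves removes all of the entropy.
print(information_gain([0.5, 0.5], [(0.5, [1.0]), (0.5, [1.0])]))  # 1.0
```

Both measures are 0 for a pure node and largest when the classes are evenly mixed, which is why they usually lead to similar trees.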

CART can also do regression by using MSE instead of Gini or entropy: at each step we choose the split that minimizes the size-weighted MSE of the two subsets.
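
A rough sketch of the regression case on a single feature, assuming each side of a split predicts the mean of its targets and that candidate thresholds are midpoints between consecutive distinct values. `mse` and `best_regression_split` are made-up names for illustration.

```python
def mse(values):
    """Mean squared error of the values around their own mean."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def best_regression_split(xs, ys):
    """Return (cost, threshold) minimizing the size-weighted MSE of the two
    subsets, or None if xs has fewer than two distinct values."""
    n = len(xs)
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Same weighting as in classification, with MSE in place of impurity.
        cost = len(left) / n * mse(left) + len(right) / n * mse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    return best

# Toy data: the target jumps from 5 to 20 between x = 3 and x = 10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
print(best_regression_split(xs, ys))  # (0.0, 6.5)
```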