# Target Encoding

ML CH2

**Definition:** Target encoding maps each value of a categorical feature to a representative number calculated from the target variable.

This differs from [LabelEncoding](LabelEncoding.md), which uses an arbitrary mapping rather than a representative one.

A simple way to do this is to group the rows by feature label, compute the mean target value for each label, and map each label to its mean. This is simple but imperfect, especially when a label has few examples, because the mean of a handful of rows is a noisy estimate.

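The group-by-and-average approach can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name `mean_target_encoding` and the toy `cities` / `bought` data are made up for the example.

```python
from collections import defaultdict


def mean_target_encoding(categories, targets):
    """Map each category to the mean target value of its rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Hypothetical toy data: a "city" feature and a binary target.
cities = ["a", "a", "a", "b", "b", "c"]
bought = [1, 0, 1, 0, 0, 1]

encoding = mean_target_encoding(cities, bought)

# Replace each label with its mean target value.
encoded_feature = [encoding[city] for city in cities]
```

Note that `"c"` appears only once, so its encoding is just that single row's target value, which is the weakness described above.
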
Another way is to use a weighted mean that blends each option's mean with the overall mean of the target. Multiply the option's mean by its number of occurrences $n$, add the overall mean multiplied by a [Hyperparameter](Hyperparameter.md) $m$, and divide the result by $n + m$. The larger $m$ is, the more strongly rare options are pulled toward the overall mean.

Equation:

$\frac{n \cdot \text{option mean} + m \cdot \text{overall mean}}{n + m}$

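A small sketch of the weighted mean, following the equation term by term. The function name `smoothed_target_encoding`, the toy data, and the choice `m = 2` are assumptions made for illustration only.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, m):
    """Blend each option's mean with the overall mean, weighted by n and m."""
    overall_mean = sum(targets) / len(targets)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    encoding = {}
    for category in sums:
        n = counts[category]
        option_mean = sums[category] / n
        # (n * option mean + m * overall mean) / (n + m)
        encoding[category] = (n * option_mean + m * overall_mean) / (n + m)
    return encoding


# Hypothetical toy data: a "city" feature and a binary target.
cities = ["a", "a", "a", "b", "b", "c"]
bought = [1, 0, 1, 0, 0, 1]

smoothed = smoothed_target_encoding(cities, bought, m=2)
unsmoothed = smoothed_target_encoding(cities, bought, m=0)
```

With this data the overall mean is 0.5, so the single-row option `"c"` moves from 1.0 to (1 · 1 + 2 · 0.5) / (1 + 2) = 2/3. Setting `m=0` recovers the plain per-option mean.
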
## Issues

The main issue with this approach is overfitting. Because the encoded value is computed from the target, it leaks target information into the feature, so a model is more likely to overfit the training data.
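
An extreme case makes the risk concrete. In the sketch below (hypothetical toy data where every label occurs exactly once), the plain per-label mean turns the feature into an exact copy of the target, so a model could fit the training set perfectly without learning anything that generalizes.

```python
# Hypothetical toy data: every category appears exactly once.
ids = ["u1", "u2", "u3", "u4"]
target = [1, 0, 0, 1]

# Plain per-category mean of the target.
totals, counts = {}, {}
for category, y in zip(ids, target):
    totals[category] = totals.get(category, 0) + y
    counts[category] = counts.get(category, 0) + 1
encoding = {category: totals[category] / counts[category] for category in totals}

encoded = [encoding[category] for category in ids]

# With one row per category, the encoded feature equals the target itself.
leaks_target = encoded == target
```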