notes

Personal notes
git clone git://git.laack.co/notes.git
Log | Files | Refs

OneHotEncoding.md (895B)


      1 # One-hot Encoding
      2 
      3 ML CH2
      4 
      5 **Definition:** One hot encoding is the process of taking all unique features of a given feature and expanding these out to be individual boolean attributes of a sample. 
      6 
      7 An example of this is if you have a column that states the distance from the ocean. The options are island, 1 hour, and near ocean. These could be encoded as integers, but the issue is that these value are not representative of what the values mean thus mapping this to a linear regression would cause issues because higher or lower does not necessarily mean better. As such, you would then add 1 hour, near ocean, and island as columns and then set booleans as true or false based on the distance string. 
      8 
      9 See [LabelEncoding](LabelEncoding.md) for a simple way of encoding strings as numbers. This is useful when there are lots of options and the model knows the data is arbitrarily numbered.