# One-hot Encoding

ML CH2

**Definition:** One-hot encoding is the process of taking all unique values of a given feature and expanding them into individual boolean attributes of a sample.

For example, suppose a column states the distance from the ocean, and the options are island, 1 hour, and near ocean. These could be encoded as integers, but the integers would not represent what the values mean, so mapping them into a linear regression would cause issues: a higher or lower number does not necessarily mean better. Instead, you add 1 hour, near ocean, and island as columns and set each boolean to true or false based on the distance string.

See [LabelEncoding](LabelEncoding.md) for a simple way of encoding strings as numbers. This is useful when there are many options and the model treats the numbers as arbitrary labels rather than as ordered values.
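
The ocean-distance example above can be sketched in plain Python. This is a minimal illustration and not a definitive implementation: the `samples` list and the `one_hot_encode` helper are hypothetical names made up for this note, and the four rows are invented data.

```python
# Hypothetical sample data: one "distance from the ocean" value per sample.
samples = ["island", "1 hour", "near ocean", "1 hour"]


def one_hot_encode(values):
    """Return (categories, rows), with one boolean per category in each row."""
    # Sort the unique values so the column order is deterministic.
    categories = sorted(set(values))
    # For each sample, mark True only in the column matching its value.
    rows = [[value == category for category in categories] for value in values]
    return categories, rows


categories, encoded = one_hot_encode(samples)

print(categories)
for row in encoded:
    print(row)
```

Each row has exactly one `True`, so no option is treated as numerically higher or lower than another. In practice a library routine such as `pandas.get_dummies` is normally used for this instead of a hand-written helper.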