# Imputation

CH2

**Definition:** Imputation is the process of filling in null values with some appropriate value.

In machine learning, this is often done by setting null values to 0, the mean, the median, or some other appropriate value.

Using pandas, this can be done with `df.fillna()`.

Another way to do this is with `SimpleImputer` from `sklearn.impute`. It can be used as follows:

```python
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy="median")
imputer.fit(df)  # For the median strategy, ensure the df only has np.number dtypes.

X = imputer.transform(df)  # Returns a NumPy array with nulls replaced by each column's median.
```

`SimpleImputer` also supports the strategies `most_frequent` (mode), `mean`, and `constant`; with `constant`, the replacement value is given by `fill_value`.
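As a minimal sketch of the two approaches described above, the example below applies both to a small toy DataFrame. The data, the column names (`age`, `income`), and the fill value `-1` are made up for illustration and are not from the original notes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 60.0],
    "income": [10.0, 30.0, np.nan, 50.0],
})

# pandas: fill every null with a constant, or with each column's median.
filled_zero = df.fillna(0)
filled_median = df.fillna(df.median())

# scikit-learn: the same median fill; fit_transform returns a NumPy array.
median_imputer = SimpleImputer(strategy="median")
X_median = median_imputer.fit_transform(df)

# strategy="constant" replaces nulls with the given fill_value.
constant_imputer = SimpleImputer(strategy="constant", fill_value=-1)
X_constant = constant_imputer.fit_transform(df)
```

Because `SimpleImputer` returns a plain array by default, the column labels are lost; if they are needed, one option is to wrap the result again with `pd.DataFrame(X_median, columns=df.columns, index=df.index)`. A fitted imputer can also be reused on new data with `transform`, which applies the statistics learned during `fit` rather than recomputing them.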