# Imputation

CH2

**Definition:** Imputation is the process of filling in null (missing) values with an appropriate substitute value.

This is often done in ML preprocessing by setting null values to 0, the mean, the median, or some other appropriate value.

With pandas, this can be done using `df.fillna()`.
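
A minimal sketch of `fillna()` with the fill values mentioned above. The DataFrame, its column names, and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are assumptions for illustration.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "income": [10.0, 30.0, np.nan]})

filled_zero = df.fillna(0)  # every null becomes 0
filled_median = df.fillna(df.median())  # each null becomes its own column's median
filled_mixed = df.fillna({"age": df["age"].mean(), "income": 0})  # per-column fill values
```

`fillna()` returns a new DataFrame here, so `df` itself still contains the nulls.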

Another way to do this is with `SimpleImputer` from `sklearn.impute`. It can be used as follows:

```python
from sklearn.impute import SimpleImputer

# df is an existing DataFrame with only numeric columns,
# e.g. df.select_dtypes(include="number"); the median is undefined for other dtypes.
imputer = SimpleImputer(strategy="median")
imputer.fit(df)  # Learn each column's median.

X = imputer.transform(df)  # Replace nulls with the learned medians (returns a NumPy array by default).
```

The imputer above also supports the strategies `most_frequent` (mode), `mean`, and `constant`. With `constant`, the replacement is set via `fill_value`, which defaults to 0 for numeric data if not specified.
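
A small sketch comparing these strategies on a made-up array (the values are assumptions for illustration). It uses `fit_transform()`, which fits and transforms in one call:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy array: values are assumptions for illustration.
X = np.array([[1.0, 2.0], [np.nan, 2.0], [3.0, np.nan], [1.0, 8.0]])

# Nulls become the most common value in each column (1.0 and 2.0 here).
mode_imputed = SimpleImputer(strategy="most_frequent").fit_transform(X)

# Nulls become each column's mean.
mean_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Nulls become the given constant in every column.
const_imputed = SimpleImputer(strategy="constant", fill_value=-1.0).fit_transform(X)
```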