# PCA (Principal Component Analysis)

ML D5

**Definition:** PCA is a dimensionality reduction algorithm that first identifies the hyperplane that lies closest to the data and then projects the data onto it.

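Here is a minimal NumPy sketch of that definition. It assumes the common SVD-based approach: center the data, take the SVD, and project onto the first `d` right singular vectors (the principal components). The helper `pca_project` and the synthetic dataset are hypothetical and only serve as an illustration.

```python
import numpy as np


def pca_project(X, d):
    """Project X (m samples x n features) onto its first d principal components."""
    # PCA assumes centered data, so subtract the per-feature mean first
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W_d = Vt[:d].T            # n x d matrix whose columns span the hyperplane
    return X_centered @ W_d   # m x d coordinates of the projected points


rng = np.random.default_rng(42)
# Hypothetical data: 3D points lying close to a 2D plane, plus a little noise
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(200, 3))
X2D = pca_project(X, d=2)
print(X2D.shape)  # (200, 2)
```

sklearn's `PCA` performs the centering and the SVD internally. This version simply spells out the steps.
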
The goal is to choose the hyperplane that preserves the maximum amount of variance, so that the projected values stay as spread out as possible.

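One way to see "preserve maximum variance" in practice is sklearn's `explained_variance_ratio_`. The sketch below passes a float `n_components` so that PCA keeps just enough components to reach 95% of the variance. The synthetic data and the 95% threshold are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 500 samples, 10 features, most variance in 3 latent directions
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))

# A float in (0, 1) means "keep enough components to explain this share of variance"
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                    # number of components kept
print(pca.explained_variance_ratio_)        # share of variance per kept component
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```
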
Expressed as a cost function, this is equivalent to minimizing the mean squared distance between the original data points and their projections onto the hyperplane.

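The same cost can be written out as formulas. The notation is my own assumption: $m$ centered data points $\mathbf{x}^{(i)} \in \mathbb{R}^n$, and a $d$-dimensional hyperplane through the origin spanned by the orthonormal columns of $\mathbf{W}_d$. With that notation, the projection and the cost are:

$$
\hat{\mathbf{x}}^{(i)} = \mathbf{W}_d \mathbf{W}_d^\top \mathbf{x}^{(i)},
\qquad
J(\mathbf{W}_d) = \frac{1}{m}\sum_{i=1}^{m} \left\lVert \mathbf{x}^{(i)} - \mathbf{W}_d \mathbf{W}_d^\top \mathbf{x}^{(i)} \right\rVert^2
$$

Because $\mathbf{W}_d \mathbf{W}_d^\top$ is an orthogonal projection, Pythagoras splits each squared norm into a kept part and a residual. This uses $\lVert \mathbf{W}_d \mathbf{W}_d^\top \mathbf{x} \rVert = \lVert \mathbf{W}_d^\top \mathbf{x} \rVert$:

$$
\frac{1}{m}\sum_{i=1}^{m} \lVert \mathbf{x}^{(i)} \rVert^2
= \underbrace{\frac{1}{m}\sum_{i=1}^{m} \lVert \mathbf{W}_d^\top \mathbf{x}^{(i)} \rVert^2}_{\text{variance kept}}
+ J(\mathbf{W}_d)
$$

The left side does not depend on $\mathbf{W}_d$. Minimizing the mean squared distance $J$ and maximizing the preserved variance therefore pick the same hyperplane.
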
PCA compresses the data. If the discarded dimensions carry little variance, it is possible to recover values close to the original ones. In sklearn this is done with `inverse_transform()`.

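The following is a small sketch of that compress/decompress round trip with sklearn. It also computes the reconstruction error, which is the mean squared distance that PCA minimizes. The synthetic dataset and the choice of 2 components are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical data: 300 samples, 5 features, lying close to a 2D subspace
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(300, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)                # compress: 5 features -> 2
X_recovered = pca.inverse_transform(X_reduced)  # decompress back to 5 features

# Reconstruction error: mean squared distance between originals and their projections
mse = np.mean(np.sum((X - X_recovered) ** 2, axis=1))
print(X_recovered.shape, mse)
```

The recovered points are the projections mapped back into the original space. They are close to `X`, but the variance in the dropped components is lost for good.
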
There is also IPCA (Incremental PCA), which fits the data in mini-batches and therefore allows out-of-core processing. It works well in combination with `np.memmap`, which keeps a NumPy array in a file on disk and loads only the parts that are accessed.
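Below is a minimal sketch of IPCA over a memory-mapped array. It assumes the data is stored as raw `float32` in a file with a known shape. The file name `my_data.dat`, the shape, and the batch count are hypothetical, and the script writes synthetic data first so that it runs on its own.

```python
import os
import tempfile

import numpy as np
from sklearn.decomposition import IncrementalPCA

m, n = 10_000, 20
path = os.path.join(tempfile.mkdtemp(), "my_data.dat")  # hypothetical file name

# Write a synthetic dataset to disk once (real data would already be there)
X_disk = np.memmap(path, dtype="float32", mode="w+", shape=(m, n))
X_disk[:] = np.random.default_rng(0).normal(size=(m, n)).astype("float32")
X_disk.flush()

# Reopen read-only: slices are read from disk only when they are accessed
X_mm = np.memmap(path, dtype="float32", mode="r", shape=(m, n))

n_batches = 10
inc_pca = IncrementalPCA(n_components=5)
for X_batch in np.array_split(X_mm, n_batches):
    inc_pca.partial_fit(X_batch)  # update the model one mini-batch at a time

X_reduced = inc_pca.transform(X_mm)
print(X_reduced.shape)  # (10000, 5)
```

Alternatively, `IncrementalPCA(n_components=5, batch_size=...)` followed by `fit(X_mm)` performs the batching internally. Each batch must contain at least `n_components` samples.
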