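# Random Projection

**Definition:** Random projection is a dimensionality reduction algorithm that projects the data onto a lower-dimensional subspace along randomly chosen directions, typically by multiplying the data by a random matrix.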
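
To make "projecting along random directions" concrete, here is a minimal NumPy sketch, assuming a Gaussian random matrix scaled by 1/sqrt(d). The sizes, the names `P` and `X`, and the random stand-in data are made up for illustration and are not part of these notes.

```python
import numpy as np

rng = np.random.default_rng(42)

m, n, d = 20, 1_000, 200  # samples, original dims, target dims (arbitrary)
X = rng.standard_normal((m, n))  # stand-in data set

# Each row of P is one random direction; dividing by sqrt(d) keeps
# squared lengths roughly unchanged on average.
P = rng.standard_normal((d, n)) / np.sqrt(d)
X_reduced = X @ P.T  # shape (m, d)

# Ratio of squared norms after vs. before projection, one value per sample
norm_ratio = (X_reduced ** 2).sum(axis=1) / (X ** 2).sum(axis=1)
print(X_reduced.shape)
print(norm_ratio.min(), norm_ratio.max())
```

The ratios should come out close to 1, which is the sense in which the projection keeps most of the geometry.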

Random projection is used because PCA can be slow on very high-dimensional data, and it has been shown (the Johnson-Lindenstrauss lemma) that a random projection is very likely to preserve the distances between samples fairly well, so not much information is lost.

This matters when the data has something like 20,000 dimensions.

`sklearn.random_projection` also provides the `johnson_lindenstrauss_min_dim` function. Given the number of samples and a tolerance `eps` for how much distances are allowed to be distorted, it computes the minimum number of target dimensions needed to stay within that tolerance.

Example:
```python3
from sklearn.random_projection import johnson_lindenstrauss_min_dim

m, ε = 5_000, 0.1  # number of samples, distortion tolerance
d = johnson_lindenstrauss_min_dim(m, eps=ε)  # minimum safe target dimension
d
```
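
The output is 7300, so a dataset of 5,000 samples with more than 7,300 dimensions can be randomly projected down to 7,300 dimensions while, with high probability, changing the squared distance between any two samples by no more than about 10%. Note that the result depends only on the number of samples and the tolerance, not on the original number of dimensions.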
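
The 7300 figure can be checked by hand. The bound scikit-learn documents for this function is d ≥ 4 ln(m) / (ε²/2 − ε³/3); the sketch below just recomputes that expression with the standard library (the variable names are my own).

```python
import math

m, eps = 5_000, 0.1  # same values as the example above

# Johnson-Lindenstrauss bound: 4 ln(m) / (eps^2 / 2 - eps^3 / 3)
bound = 4 * math.log(m) / (eps**2 / 2 - eps**3 / 3)
d_manual = int(bound)  # truncate to a whole number of dimensions

print(d_manual)  # 7300
```

Because ε appears squared in the denominator, halving the tolerance roughly quadruples the required number of dimensions.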

Below is an example of this random projection using scikit-learn, where we simply pass in the tolerance and the target dimension is chosen automatically:

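```python3
from sklearn.random_projection import GaussianRandomProjection

# X is the (n_samples, n_features) data; ε is the tolerance defined above.
# With only eps given, the target dimension comes from the Johnson-Lindenstrauss bound.
gaussian_rnd_proj = GaussianRandomProjection(eps=ε, random_state=42)
X_reduced = gaussian_rnd_proj.fit_transform(X)
```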
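
The snippet above needs an existing `X`. As a rough end-to-end check, the sketch below builds a small random `X` (made-up sizes and a looser tolerance, chosen only so it runs quickly), projects it, and compares squared pairwise distances before and after. None of these numbers come from the notes.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

rng = np.random.default_rng(0)
m, n, eps = 50, 1_000, 0.5  # hypothetical sizes, not from the notes
X = rng.standard_normal((m, n))  # stand-in data set

d = johnson_lindenstrauss_min_dim(m, eps=eps)  # target dimension for this m and eps
proj = GaussianRandomProjection(eps=eps, random_state=42)
X_reduced = proj.fit_transform(X)  # n_components is picked automatically

# Squared distances for every distinct pair of samples
iu = np.triu_indices(m, k=1)
before = euclidean_distances(X, squared=True)[iu]
after = euclidean_distances(X_reduced, squared=True)[iu]
ratio = after / before

print(X_reduced.shape, d)
print(ratio.min(), ratio.max())
```

If the projection behaves as expected, nearly all ratios should sit between `1 - eps` and `1 + eps`. The real use case is a much larger `X`; the small one here only keeps memory use down.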