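# Random Projection

**Definition:** Random projection is a dimensionality-reduction technique that projects the data onto a randomly generated lower-dimensional subspace. In practice, the data is multiplied by a random matrix (for example, one with Gaussian entries). It does not pick a subset of the original dimensions: each new dimension is a random linear combination of all the original ones.

Random projection is used because PCA can be slow on very high-dimensional data. It has also been shown (the Johnson–Lindenstrauss lemma) that a random projection does not lose too much information, because pairwise distances between points are approximately preserved.

It is most useful for data with very many dimensions, e.g. 20,000.

The `johnson_lindenstrauss_min_dim` function from `sklearn.random_projection` calculates the minimum number of dimensions that will, with high probability, preserve every pairwise squared distance to within a factor of (1 ± `eps`). It takes the number of samples and `eps`, the acceptable distortion. The result depends only on the number of samples, not on the original number of dimensions.

Example:
```python3
from sklearn.random_projection import johnson_lindenstrauss_min_dim

m, ε = 5_000, 0.1  # number of samples, acceptable distortion
d = johnson_lindenstrauss_min_dim(m, eps=ε)
d
```

The output is 7300. Data with more than 7300 dimensions can therefore be randomly projected to a 7300-dimensional space, and with high probability no squared distance between two points will change by more than about 10%.

The next example uses scikit-learn's `GaussianRandomProjection`, where we simply pass in the acceptable distortion `eps`. With the default `n_components="auto"`, the target dimension is computed from `eps` and the number of samples, just like above:

```python3
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

m, ε = 5_000, 0.1
X = np.random.randn(m, 20_000)  # placeholder data: 5,000 samples x 20,000 features (~800 MB)

gaussian_rnd_proj = GaussianRandomProjection(eps=ε, random_state=42)
X_reduced = gaussian_rnd_proj.fit_transform(X)  # n_components="auto" -> 7300 dims, same as above
```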
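To make the idea concrete, here is a minimal from-scratch sketch in NumPy of what a Gaussian random projection does. It uses small made-up data. The helper names `jl_min_dim`, `gaussian_random_projection` and `pairwise_sq_dists` are hypothetical and are not scikit-learn functions. `jl_min_dim` reimplements the bound that scikit-learn documents for `johnson_lindenstrauss_min_dim`, `d >= 4 ln(m) / (eps²/2 - eps³/3)`. The projection multiplies the data by a random matrix whose entries are drawn from N(0, 1/d). The sketch then checks how much each pairwise squared distance changed.

```python3
import numpy as np


def jl_min_dim(n_samples, eps):
    """Johnson-Lindenstrauss bound: minimum target dimension for n_samples points."""
    denominator = (eps ** 2 / 2) - (eps ** 3 / 3)
    return int(4 * np.log(n_samples) / denominator)


def gaussian_random_projection(X, n_components, seed=0):
    """Project X (n_samples x n_features) onto n_components random directions."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Entries drawn from N(0, 1/n_components), so squared norms are preserved on average
    R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_components, n_features))
    return X @ R.T


def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows."""
    sq_norms = (X ** 2).sum(axis=1)
    return sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T


# Small illustrative data (made up for this sketch): 50 points in 10,000 dimensions
rng = np.random.default_rng(42)
X = rng.standard_normal((50, 10_000))

eps = 0.5
d = jl_min_dim(len(X), eps)  # target dimension from the JL bound
X_reduced = gaussian_random_projection(X, d, seed=1)

# Ratio of projected to original squared distance for every distinct pair of points
i, j = np.triu_indices(len(X), k=1)
ratios = pairwise_sq_dists(X_reduced)[i, j] / pairwise_sq_dists(X)[i, j]
print(d, X_reduced.shape, ratios.min(), ratios.max())
```

The JL guarantee is probabilistic. With these settings the ratios should all sit close to 1, typically inside the 1 ± `eps` band, but another seed can occasionally push a pair slightly outside it. The N(0, 1/d) entry distribution matches what scikit-learn documents for `GaussianRandomProjection`, but this sketch is only an illustration, not its actual implementation.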