notes

Personal notes
git clone git://git.laack.co/notes.git
Log | Files | Refs

VanishingGradients.md (636B)


      1 # Vanishing Gradients
      2 
      3 ML 550
      4 
      5 **Definition:** Vanishing gradients is a neural network problem where lower levels (earlier hidden layers) have such small gradients that gradient steps make tiny changes and the model never converges upon an a good solution.
      6 
      7 This is a very common problem as most of the time gradients get smaller and smaller. As such, this problem is much more common than [ExplodingGradients](ExplodingGradients.md) which primarly happens with RNNs.
      8 
      9 ### Solutions
     10 
     11 Use ReLU and better weight initialization (not gaussian distribution with std deviation of 1).
     12 
     13 See [UnstableGradients](UnstableGradients.md) for more.