# Unstable Gradients

ML 550

**Definition:** Unstable gradients are the problem that different layers of a deep neural network can receive gradients of widely different magnitudes during backpropagation, and so learn at widely different rates.

This often manifests as [ExplodingGradients](ExplodingGradients.md) or [VanishingGradients](VanishingGradients.md).

This was one reason that deep neural networks were mostly abandoned in the early 2000s, until the architecture and training setup were revised. The combination of initializing weights from a normal distribution with mean 0 and standard deviation 1 and using sigmoid activation functions was found to cause the issue. The sigmoid is the main culprit: its derivative is at most 0.25 and approaches 0 once the function saturates, so the gradients it backpropagates are generally very small.

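A minimal NumPy sketch of the saturation described above, assuming a toy stack of fully connected sigmoid layers with weights drawn from a normal distribution with mean 0 and std 1. The layer width, batch size, the 0.01/0.99 saturation cutoffs, and the function names are illustrative assumptions, not part of the course material. It only measures the local derivative of the sigmoid at each layer; it is not a full backpropagation.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def sigmoid_saturation(n_layers=5, width=256, batch=512, seed=0):
    """Per layer, return (mean sigmoid'(z), fraction of saturated units)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((batch, width))  # toy inputs, unit variance
    report = []
    for _ in range(n_layers):
        W = rng.standard_normal((width, width))  # weights: mean 0, std 1
        a = sigmoid(a @ W)
        # sigmoid'(z) = s * (1 - s), which can never exceed 0.25.
        mean_deriv = float(np.mean(a * (1.0 - a)))
        # Hypothetical cutoff: count outputs within 0.01 of 0 or 1 as saturated.
        saturated = float(np.mean((a < 0.01) | (a > 0.99)))
        report.append((mean_deriv, saturated))
    return report


if __name__ == "__main__":
    for i, (d, s) in enumerate(sigmoid_saturation(), start=1):
        print(f"layer {i}: mean sigmoid'(z) = {d:.3f}, saturated = {s:.0%}")
```

With these settings the pre-activations have a std far larger than 1, so most units sit on the flat tails of the sigmoid, where the derivative is close to 0.
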
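To resolve this issue, we need the variance of each layer's outputs to be roughly equal to the variance of its inputs, and likewise for the gradients flowing backward. One way to do this is a different initialization strategy called He initialization, which is designed for ReLU activations and uses a weight variance of 2 / fan_in, where fan_in is the number of inputs to the layer.

Another solution is LeCun initialization (variance 1 / fan_in) with a SELU activation function.

The final common approach, used with softmax activation (and also with tanh, sigmoid, or no activation), is to use the Glorot initialization method (variance 2 / (fan_in + fan_out)).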
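
The three schemes differ only in the variance used to draw the weights. The sketch below assumes the normal-distribution variant of each scheme and a toy stack of equal-width ReLU layers. The function names, layer width, and batch size are illustrative assumptions. It tracks the std of the pre-activations from layer to layer to show which variance keeps the signal roughly stable under ReLU.

```python
import numpy as np


def init_std(fan_in, fan_out, scheme):
    """Standard deviation of the weight distribution for each scheme."""
    if scheme == "glorot":
        var = 2.0 / (fan_in + fan_out)  # same as 1 / fan_avg
    elif scheme == "he":
        var = 2.0 / fan_in
    elif scheme == "lecun":
        var = 1.0 / fan_in
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return float(np.sqrt(var))


def relu_preactivation_stds(scheme, n_layers=10, width=256, batch=512, seed=0):
    """Std of the pre-activations at each layer of a toy ReLU stack."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((batch, width))  # toy inputs, unit variance
    stds = []
    for _ in range(n_layers):
        std = init_std(width, width, scheme)
        W = rng.normal(0.0, std, size=(width, width))
        z = a @ W
        stds.append(float(z.std()))
        a = np.maximum(z, 0.0)  # ReLU
    return stds


if __name__ == "__main__":
    for scheme in ("he", "lecun", "glorot"):
        stds = relu_preactivation_stds(scheme)
        print(f"{scheme}: first = {stds[0]:.3f}, last = {stds[-1]:.3f}")
```

In this setup, He initialization should keep the pre-activation std roughly constant across layers. The LeCun and Glorot variances are identical when fan_in equals fan_out, and the signal shrinks layer by layer under ReLU. This is why each scheme is paired with specific activation functions.
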
