# Adam (Adaptive Moment Estimation)

ML P587

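**Definition:** Adam combines momentum with RMSProp. It keeps an exponentially decaying average of past gradients (as in momentum) and an exponentially decaying average of past squared gradients (as in RMSProp), and uses both to compute each parameter's update.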
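
A minimal Python sketch of one Adam step, assuming plain lists of floats and the commonly used defaults (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`). The function name `adam_step` is hypothetical and is not any library's API. `m` is the momentum-like average of gradients and `v` is the RMSProp-like average of squared gradients. Both start at zero, so they are bias-corrected before use.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over lists of floats (hypothetical helper, a sketch).

    m, v: running first/second moment estimates, same length as params.
    t: 1-based step count, used for bias correction.
    Returns the new (params, m, v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Momentum part: exponentially decaying average of gradients.
        m_i = beta1 * m_i + (1 - beta1) * g
        # RMSProp part: exponentially decaying average of squared gradients.
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction: m and v start at 0, so early estimates are too small.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Step along the momentum direction, scaled per parameter
        # by the root of the averaged squared gradients.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    # Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    x, m, v = [0.0], [0.0], [0.0]
    for t in range(1, 5001):
        x, m, v = adam_step(x, [2 * (x[0] - 3)], m, v, t, lr=0.05)
    print(x[0])  # close to 3
```

After bias correction, the first step moves each parameter by about `lr`, whatever the gradient's magnitude, because `m_hat / sqrt(v_hat)` is close to ±1 at `t = 1`.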

In most cases it is the best default choice of optimizer.

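Adam also has several variants:

- **AdaMax**: generally performs worse than Adam.
- **Nadam**: borrows the [NAG](NAG.md) idea of measuring the gradient slightly ahead, in the direction of momentum. It generally outperforms Adam.
- **AdamW**: regularizes with weight decay, applied to the weights separately from (decoupled from) the gradient-based update.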
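
To make the differences between Nadam and AdamW concrete, here is a hedged Python sketch for a single scalar parameter. `adam_variant_step` is a hypothetical helper. The Nadam branch uses a commonly cited simplified form without a momentum schedule. The AdamW branch applies weight decay directly to the weight, scaled by `lr`. The value `weight_decay=0.01` is chosen only for illustration. AdaMax is left out.

```python
import math


def adam_variant_step(p, g, m, v, t, variant="adam", lr=0.001,
                      beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One update of a scalar parameter for Adam, Nadam or AdamW (sketch)."""
    # Shared Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    denom = math.sqrt(v_hat) + eps

    if variant == "adam":
        p = p - lr * m_hat / denom
    elif variant == "nadam":
        # Nesterov-style look-ahead: mix the bias-corrected momentum with the
        # current gradient, i.e. step where momentum is about to carry us.
        direction = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        p = p - lr * direction / denom
    elif variant == "adamw":
        # Decoupled weight decay: shrink the weight directly instead of adding
        # weight_decay * p to the gradient, where Adam's scaling would rescale it.
        p = p - lr * (m_hat / denom + weight_decay * p)
    else:
        raise ValueError(f"unknown variant: {variant}")
    return p, m, v


if __name__ == "__main__":
    for name in ("adam", "nadam", "adamw"):
        p, _, _ = adam_variant_step(1.0, -6.0, 0.0, 0.0, 1, variant=name)
        print(name, p)
```

With a zero gradient, the AdamW branch still shrinks the weight (from 1.0 to 0.99999 with these defaults). This is what "decoupled" means here: the decay does not pass through Adam's per-parameter scaling.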