# Adam (Adaptive moment estimation)

ML P587

**Definition:** Adam combines momentum optimization with RMSProp. Like momentum, it keeps an exponentially decaying average of past gradients; like RMSProp, it keeps an exponentially decaying average of past squared gradients, which it uses to scale the step for each parameter.

Adam is a good default choice in most cases.

There are also variants of Adam, such as AdaMax (generally worse), Nadam (applies the [NAG](NAG.md) idea of looking ahead in the direction of the momentum, and generally outperforms Adam), and AdamW (Adam regularized with decoupled weight decay).
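
To make the definition concrete, here is a minimal sketch of one Adam update in plain Python. It is an illustration, not a library implementation: the names `adam_step` and `minimize_quadratic` are made up for this note, and the default values (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are assumed here as the commonly used defaults.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to flat lists of floats.

    m and v are the running first and second moment estimates,
    t is the 1-based step counter. Returns (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Momentum part: decaying average of past gradients.
        m_i = beta1 * m_i + (1 - beta1) * g
        # RMSProp part: decaying average of past squared gradients.
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction, since m and v start at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step, scaled by the root of the second moment.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=500, lr=0.1):
    """Toy usage: minimize f(x) = (x - 3)^2 starting from x = 0."""
    params, m, v = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grads = [2.0 * (params[0] - 3.0)]  # df/dx
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params[0]
```

Note that on the very first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is roughly the sign of the gradient, so each parameter moves by about `lr` regardless of the gradient's magnitude. In this sketch, AdamW would differ by additionally subtracting a weight decay term proportional to `p` directly in the parameter update, rather than adding it to the gradient.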