# Bandits

L1

**Definition:** Bandits are a class of problems in RL where an agent repeatedly chooses from a set of actions, each of which gives a reward drawn from its own unknown probability distribution.

Basically, there is a set of actions, you pick one, you get a reward... that's all.

This is an MDP with only one state: actions never change the state, so only the immediate reward matters.
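To make the definition concrete, here is a minimal sketch of that interaction loop. Everything in it is an assumption for illustration, not part of the notes: the names `GaussianBandit` and `run_random_agent` are hypothetical, rewards are assumed Gaussian, and the agent just picks actions uniformly at random while keeping a running average reward per action.

```python
import random


class GaussianBandit:
    """Hypothetical k-armed bandit: one reward distribution per action.

    Assumption: rewards are Gaussian. The true means are hidden from the agent.
    There is no state to track, which is the "one-state MDP" view.
    """

    def __init__(self, means, std=1.0, seed=None):
        self.means = list(means)  # true mean reward of each action (unknown to the agent)
        self.std = std
        self.rng = random.Random(seed)

    @property
    def n_actions(self):
        return len(self.means)

    def pull(self, action):
        """Return a reward sampled from the chosen action's distribution."""
        return self.rng.gauss(self.means[action], self.std)


def run_random_agent(bandit, steps, seed=None):
    """Pick actions uniformly at random and track each action's average reward."""
    rng = random.Random(seed)
    counts = [0] * bandit.n_actions
    averages = [0.0] * bandit.n_actions
    for _ in range(steps):
        action = rng.randrange(bandit.n_actions)  # choose an action
        reward = bandit.pull(action)              # receive a reward
        counts[action] += 1
        # Incremental update of the sample mean for this action.
        averages[action] += (reward - averages[action]) / counts[action]
    return counts, averages


if __name__ == "__main__":
    bandit = GaussianBandit(means=[0.1, 0.5, 0.9], seed=0)
    counts, averages = run_random_agent(bandit, steps=3000, seed=1)
    print(counts, [round(a, 2) for a in averages])
```

With enough pulls, each running average should approach the hidden mean of its action, which is the only way the agent can learn about the unknown distributions.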