Personal notes
git clone git://git.laack.co/notes.git

# Bandits

L1

**Definition:** Bandits are a class of problems in RL in which an agent repeatedly chooses from a set of actions, each of which yields a reward drawn from an unknown probability distribution.

Basically, there is a set of actions, you pick one, you get a reward... that's all.

This is an MDP with only one state, so an action only affects the immediate reward, never what situation comes next.
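
A rough sketch of that loop in Python, to make the definition concrete. Everything here is an assumption for illustration: the `GaussianBandit` class, the `estimate_values` helper, the Gaussian reward distributions, and the arm means are all made up, and the agent just picks arms uniformly at random rather than following any particular bandit algorithm.

```python
import random


class GaussianBandit:
    """Hypothetical k-armed bandit: arm a pays a reward ~ Normal(means[a], 1)."""

    def __init__(self, means, seed=0):
        self.means = list(means)        # unknown to the agent
        self.rng = random.Random(seed)

    @property
    def n_actions(self):
        return len(self.means)

    def pull(self, action):
        # No state goes in or comes out: the reward depends on the action only.
        return self.rng.gauss(self.means[action], 1.0)


def estimate_values(bandit, steps, seed=1):
    """Pull arms uniformly at random, keeping a running mean reward per arm."""
    rng = random.Random(seed)
    counts = [0] * bandit.n_actions
    values = [0.0] * bandit.n_actions
    for _ in range(steps):
        action = rng.randrange(bandit.n_actions)
        reward = bandit.pull(action)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen arm.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


bandit = GaussianBandit(means=[0.1, 0.5, 0.9])   # assumed arm means
values, counts = estimate_values(bandit, steps=3000)
best = max(range(bandit.n_actions), key=lambda a: values[a])
```

With enough pulls, `values` should approach the hidden arm means, and `best` is the arm with the highest estimated reward. Note that `pull` never takes or returns a state, which is the "one state" point above.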