Personal notes
git clone git://git.laack.co/notes.git
ReinforcementLearning.md (1822B)


# Reinforcement Learning

Reinforcement Learning: An Introduction (Sutton & Barto)

Chapter 1 (Introduction)
* [Markov Decision Processes](MarkovDecisionProcesses.md)
* [Exploit](Exploit.md)
* [Explore](Explore.md)
* [Policy](Policy.md)
* [Reward Signal](RewardSignal.md)
* [Value Function](ValueFunction.md)
* [Model](Model.md)
* [Evolutionary Methods](EvolutionaryMethods.md)

DeepMind UCL Lectures

L1
* [Credit Assignment Problem](CreditAssignmentProblem.md)
* [Imitation Learning](ImitationLearning.md)
* [Markov Assumption](MarkovAssumption.md)
* [Partially Observable Markov Decision Process](PartiallyObservableMarkovDecisionProcess.md)
* [Model Free](ModelFree.md)
* [Bandits](Bandits.md)
* [Evaluation](Evaluation.md)

L2
* [Markov Decision Processes](MarkovDecisionProcesses.md)
* [Markov Assumption](MarkovAssumption.md)
* [Discount Factor](DiscountFactor.md)
* [Markov Reward Process](MarkovRewardProcess.md)
* [Markov Process](MarkovProcess.md)
* [Return](Return.md)
* [Policy](Policy.md)
* [Bellman Equation](BellmanEquation.md)

L3
* [Dynamic Programming](DynamicProgramming.md)
* [Optimal Substructure](OptimalSubstructure.md)
* [Overlapping Subproblems](OverlappingSubproblems.md)

L4
* [Model Free](ModelFree.md)
* [Episodic](Episodic.md)
* [Episode](Episode.md)
* [Monte Carlo Learning](MonteCarloLearning.md)
* [Incremental Mean](IncrementalMean.md)
* [Temporal Difference Learning](TemporalDifferenceLearning.md)
* [Frequency Heuristic](FrequencyHeuristic.md)
* [Recency Heuristic](RecencyHeuristic.md)
* [Eligibility Traces](EligibilityTraces.md)

L5
* [Model Free](ModelFree.md)
* [Monte Carlo Learning](MonteCarloLearning.md)
* [Temporal Difference Learning](TemporalDifferenceLearning.md)
* [On Policy Learning](OnPolicyLearning.md)
* [Off Policy Learning](OffPolicyLearning.md)
* EpsilonGreedy