notes

commit c88178742007f7829876591f920b52743ffeebb9
parent 0a9290b1deddbcebfbf5c00dc9795caccfb27fa7
Author: Andrew <andrewlaack1@gmail.com>
Date:   Sun,  3 Nov 2024 13:39:28 -0600

Took some notes

Diffstat:
M Bandits.md | 4 ++++
M DiscreteMath.md | 8 ++++++++
M Graphs.md | 2 +-
M Loop.md | 2 +-
M MarkovDecisionProcesses.md | 4 +++-
M MarkovRewardProcess.md | 2 +-
M Multigraph.md | 2 +-
M Policy.md | 2 +-
M ReinforcementLearning.md | 15 +++++++++++++--
A Return.md | 8 ++++++++
10 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/Bandits.md b/Bandits.md
@@ -6,3 +6,7 @@ L1
 ## Notes
 
 **Definition:** Bandits are a class of problems in RL where an agent repeatedly chooses from a set of actions which give a reward drawn from an unknown probability distribution.
+
+Basically, there are a set of actions, you do one, you have a reward... that's all
+
+This is an MDP with only one state.
diff --git a/DiscreteMath.md b/DiscreteMath.md
@@ -184,3 +184,11 @@ Unit 9.6 (Partial Orderings)
   - [PartiallyOrderedSet](PartiallyOrderedSet.md)
   - [HasseDiagram](HasseDiagram.md)
   - [LexicographicOrdering](LexicographicOrdering.md)
+
+Unit 10.1 (Graphs)
+  - [Graphs](Graphs.md)
+  - SimpleGraph
+  - [Multigraph](Multigraph.md)
+  - [Loop](Loop.md)
+  - PseudoGraphs (multi edges + multi loop + undirect)
+  - MixedGraph (undirected + directed)
diff --git a/Graphs.md b/Graphs.md
@@ -1,4 +1,4 @@
-:data-structures: :math310:
+:data-structures: :math310: :discrete:
 # Graphs
 
 Abstract Math 10.2.
diff --git a/Loop.md b/Loop.md
@@ -1,4 +1,4 @@
-:data-structures: :cs:
+:data-structures: :cs: :discrete:
 # Loop
 
 Ch 4
diff --git a/MarkovDecisionProcesses.md b/MarkovDecisionProcesses.md
@@ -5,4 +5,6 @@ RL Ch 1
 
 ## Notes
 
-**Definition:** Markov decision processes are used to model decision making processes that are partly stochastic and partly controlled via decisions.
+**Definition:** Markov decision processes describe an environment for reinforcement learning.
+
+MDPs are like MRPs except they also have a finite set of actions (action space).
diff --git a/MarkovRewardProcess.md b/MarkovRewardProcess.md
@@ -5,4 +5,4 @@ L2
 
 ## Notes
 
-**Definition:** A markov reward process is a markov chain with values associated with states or transitions.
+**Definition:** A markov reward process is a markov chain with reward values associated with states or transitions.
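The Bandits.md note describes a bandit as an MDP with only one state: pick an action, receive a reward drawn from an unknown distribution, repeat. Below is a minimal sketch of that setting. It assumes Gaussian rewards with unit variance and an epsilon-greedy agent, neither of which comes from the notes. The function `run_bandit`, its parameters, and the arm means are hypothetical. The occasional random action is the kind of stochasticity that Policy.md says is needed for exploration.

```python
import random


def run_bandit(arm_means, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit (a one-state MDP).

    arm_means: hypothetical true mean reward of each action. Rewards are
    drawn from N(mean, 1), a distribution the agent never sees directly.
    Returns (estimated mean per action, pull count per action, total reward).
    """
    if not arm_means:
        raise ValueError("need at least one action")
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms       # how many times each action was taken
    estimates = [0.0] * n_arms  # running mean of observed rewards per action
    total_reward = 0.0
    for _ in range(steps):
        # Only one state exists, so the policy is just a choice of action.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore: random action
        else:
            # exploit: action with the highest estimate (ties -> lowest index)
            action = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(arm_means[action], 1.0)
        counts[action] += 1
        # Incremental mean update: Q <- Q + (r - Q) / n
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward
```

For example, `run_bandit([0.0, 0.5, 2.0], steps=5000)` should end up pulling the third action most often. Once exploration samples it a few times, its estimated mean overtakes the others, and greedy choices then favour it.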
diff --git a/Multigraph.md b/Multigraph.md
@@ -1,4 +1,4 @@
-:data-structures: :cs:
+:data-structures: :cs: :discrete:
 # Multi-Graph
 
 Ch 4
diff --git a/Policy.md b/Policy.md
@@ -7,4 +7,4 @@ RL Ch 1
 
 **Definition:** A policy in machine learning is a function from the current state to the action an agent will take.
 
-Basically, this dictates what the agent will do in a given scenario.
+Basically, this dictates what the agent will do in a given scenario, this can also include some stochasticity (necessary for exploration).
diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -1,4 +1,4 @@
-:ml: :index:
+:ml: :index: :rl:
 # Reinforcement Learning
 
 Reinforcement Learning Index
@@ -19,7 +19,7 @@ DeepMind UCL Lectures
 L1
 
 * [CreditAssignmentProblem](CreditAssignmentProblem.md)
-* [ImitationLearning](ImitationLearning.md) (separate)
+* [ImitationLearning](ImitationLearning.md)
 * [MarkovAssumption](MarkovAssumption.md)
 * [PartiallyObservableMarkovDecisionProcess](PartiallyObservableMarkovDecisionProcess.md)
 * [ModelFree](ModelFree.md)
@@ -27,5 +27,16 @@ L1
 * [Evaluation](Evaluation.md)
 
 L2
+* [MarkovDecisionProcesses](MarkovDecisionProcesses.md)
+* [MarkovAssumption](MarkovAssumption.md) - Also referred to as Markov Property
 * [DiscountFactor](DiscountFactor.md)
 * [MarkovRewardProcess](MarkovRewardProcess.md)
+* [MarkovProcess](MarkovProcess.md)
+* [Return](Return.md)
+* [Policy](Policy.md)
+* State-ValueFunction
+* Action-ValueFunction
+* BellmanEquation
+* ControlTheory (lookup)
+* StateTransitionMatrix
+* OptimalControl (lookup)
diff --git a/Return.md b/Return.md
@@ -0,0 +1,8 @@
+:rl: :ml:
+# Return
+
+L2
+
+## Notes
+
+**Definition:** Return is the sum of future rewards taking into account discount factor.
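Return.md defines the return as the sum of future rewards taking the discount factor into account. In the notation of the UCL/DeepMind lectures listed in the index, G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ..., with γ in [0, 1]. Below is a minimal Python sketch for a finite episode given as a list of rewards. The function name `discounted_returns` is hypothetical. It computes G_t for every step by working backwards with G_t = R_{t+1} + γG_{t+1}.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] for a finite episode.

    rewards[t] is the reward received after acting at step t (R_{t+1});
    gamma is the discount factor, assumed to lie in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    returns = [0.0] * len(rewards)
    g = 0.0  # no further reward once the episode has ended
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
```

With gamma = 0 only the immediate reward counts. With gamma = 1 the return is the plain sum of the remaining rewards.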