commit c88178742007f7829876591f920b52743ffeebb9
parent 0a9290b1deddbcebfbf5c00dc9795caccfb27fa7
Author: Andrew <andrewlaack1@gmail.com>
Date: Sun, 3 Nov 2024 13:39:28 -0600
Took some notes
Diffstat:
10 files changed, 41 insertions(+), 8 deletions(-)
diff --git a/Bandits.md b/Bandits.md
@@ -6,3 +6,7 @@ L1
## Notes
**Definition:** Bandits are a class of problems in RL where an agent repeatedly chooses from a set of actions which give a reward drawn from an unknown probability distribution.
+
+Basically, there is a set of actions; you pick one and receive a reward. That's all.
+
+This is an MDP with only one state.
diff --git a/DiscreteMath.md b/DiscreteMath.md
@@ -184,3 +184,11 @@ Unit 9.6 (Partial Orderings)
- [PartiallyOrderedSet](PartiallyOrderedSet.md)
- [HasseDiagram](HasseDiagram.md)
- [LexicographicOrdering](LexicographicOrdering.md)
+
+Unit 10.1 (Graphs)
+ - [Graphs](Graphs.md)
+ - SimpleGraph
+ - [Multigraph](Multigraph.md)
+ - [Loop](Loop.md)
+ - PseudoGraphs (multiple edges + multiple loops + undirected)
+ - MixedGraph (undirected + directed)
diff --git a/Graphs.md b/Graphs.md
@@ -1,4 +1,4 @@
-:data-structures: :math310:
+:data-structures: :math310: :discrete:
# Graphs
Abstract Math 10.2.
diff --git a/Loop.md b/Loop.md
@@ -1,4 +1,4 @@
-:data-structures: :cs:
+:data-structures: :cs: :discrete:
# Loop
Ch 4
diff --git a/MarkovDecisionProcesses.md b/MarkovDecisionProcesses.md
@@ -5,4 +5,6 @@ RL Ch 1
## Notes
-**Definition:** Markov decision processes are used to model decision making processes that are partly stochastic and partly controlled via decisions.
+**Definition:** Markov decision processes describe an environment for reinforcement learning.
+
+MDPs are like MRPs except they also have a finite set of actions (action space).
diff --git a/MarkovRewardProcess.md b/MarkovRewardProcess.md
@@ -5,4 +5,4 @@ L2
## Notes
-**Definition:** A markov reward process is a markov chain with values associated with states or transitions.
+**Definition:** A Markov reward process is a Markov chain with reward values associated with states or transitions.
diff --git a/Multigraph.md b/Multigraph.md
@@ -1,4 +1,4 @@
-:data-structures: :cs:
+:data-structures: :cs: :discrete:
# Multi-Graph
Ch 4
diff --git a/Policy.md b/Policy.md
@@ -7,4 +7,4 @@ RL Ch 1
**Definition:** A policy in machine learning is a function from the current state to the action an agent will take.
-Basically, this dictates what the agent will do in a given scenario.
+Basically, this dictates what the agent will do in a given scenario; it can also include some stochasticity (necessary for exploration).
diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -1,4 +1,4 @@
-:ml: :index:
+:ml: :index: :rl:
# Reinforcement Learning
Reinforcement Learning Index
@@ -19,7 +19,7 @@ DeepMind UCL Lectures
L1
* [CreditAssignmentProblem](CreditAssignmentProblem.md)
-* [ImitationLearning](ImitationLearning.md) (separate)
+* [ImitationLearning](ImitationLearning.md)
* [MarkovAssumption](MarkovAssumption.md)
* [PartiallyObservableMarkovDecisionProcess](PartiallyObservableMarkovDecisionProcess.md)
* [ModelFree](ModelFree.md)
@@ -27,5 +27,16 @@ L1
* [Evaluation](Evaluation.md)
L2
+* [MarkovDecisionProcesses](MarkovDecisionProcesses.md)
+* [MarkovAssumption](MarkovAssumption.md) - Also referred to as the Markov property
* [DiscountFactor](DiscountFactor.md)
* [MarkovRewardProcess](MarkovRewardProcess.md)
+* [MarkovProcess](MarkovProcess.md)
+* [Return](Return.md)
+* [Policy](Policy.md)
+* State-ValueFunction
+* Action-ValueFunction
+* BellmanEquation
+* ControlTheory (lookup)
+* StateTransitionMatrix
+* OptimalControl (lookup)
diff --git a/Return.md b/Return.md
@@ -0,0 +1,8 @@
+:rl: :ml:
+# Return
+
+L2
+
+## Notes
+
+**Definition:** The return is the sum of future rewards, with each reward discounted by the discount factor according to how far in the future it arrives.
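
The Return definition above can be made concrete with a small sketch. This is not part of the commit; the names `discounted_return`, `rewards`, and `gamma` are assumptions chosen for illustration, and the sketch assumes a finite list of rewards where the reward `k` steps ahead is weighted by `gamma**k`.

```python
def discounted_return(rewards, gamma):
    """Return sum(gamma**k * rewards[k]) for a finite reward sequence.

    `rewards` and `gamma` are illustrative names, not taken from the notes.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is assumed to lie in [0, 1]")

    total = 0.0
    # Walk backwards so each step applies one more factor of gamma
    # to everything that comes later in the sequence.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1, 1, 1], 0.5))
```

With `gamma = 1` this reduces to a plain sum of rewards, and with `gamma = 0` only the immediate reward counts.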