Took some notes - notes - Unnamed repository; edit this file 'description' to name the repository.

commit c88178742007f7829876591f920b52743ffeebb9
parent 0a9290b1deddbcebfbf5c00dc9795caccfb27fa7
Author: Andrew <andrewlaack1@gmail.com>
Date:   Sun,  3 Nov 2024 13:39:28 -0600

Took some notes

Diffstat:
M Bandits.md  | 4 ++++
M DiscreteMath.md  | 8 ++++++++
M Graphs.md  | 2 +-
M Loop.md  | 2 +-
M MarkovDecisionProcesses.md  | 4 +++-
M MarkovRewardProcess.md  | 2 +-
M Multigraph.md  | 2 +-
M Policy.md  | 2 +-
M ReinforcementLearning.md  | 15 +++++++++++++--
A Return.md  | 8 ++++++++

10 files changed, 41 insertions(+), 8 deletions(-)
diff --git a/Bandits.md b/Bandits.md
@@ -6,3 +6,7 @@ L1
 ## Notes
 
 **Definition:** Bandits are a class of problems in RL where an agent repeatedly chooses from a set of actions which give a reward drawn from an unknown probability distribution.
+
+Basically, there is a set of actions: you take one and receive a reward. That's all.
+
+A bandit is an MDP with only one state.
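The bandit loop described above can be sketched in a few lines, since there is only one state and the agent just keeps choosing actions and updating reward estimates. Below is a minimal epsilon-greedy sketch; the Gaussian reward distributions, the `epsilon` value, and the name `epsilon_greedy_bandit` are illustrative assumptions, not something taken from these notes.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=1000, epsilon=0.1, seed=0):
    """Play a multi-armed bandit with an epsilon-greedy agent.

    Assumption: each action's reward is drawn from a Gaussian with the given
    (unknown to the agent) mean and unit variance.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # how many times each action was taken
    estimates = [0.0] * n_arms   # sample-average reward for each action
    total_reward = 0.0
    for _ in range(steps):
        # Single state: the only decision is which action to take.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore: random action
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(arm_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample average for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward
```

With arms whose true means are `[0.0, 1.0, 5.0]`, the agent should end up pulling the third arm most often, while the occasional random pulls keep the other estimates from going stale.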
diff --git a/DiscreteMath.md b/DiscreteMath.md
@@ -184,3 +184,11 @@ Unit 9.6 (Partial Orderings)
 	- [PartiallyOrderedSet](PartiallyOrderedSet.md)
 	- [HasseDiagram](HasseDiagram.md)
 	- [LexicographicOrdering](LexicographicOrdering.md)
+
+Unit 10.1 (Graphs)
+	- [Graphs](Graphs.md)
+	- SimpleGraph
+	- [Multigraph](Multigraph.md)
+	- [Loop](Loop.md)
+	- PseudoGraphs (multiple edges + loops + undirected)
+	- MixedGraph (undirected + directed)
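The graph types in the Unit 10.1 list differ only in which kinds of edges they allow, so they can be told apart mechanically from an edge list. A minimal sketch, assuming the usual textbook conventions (a simple graph has no loops or repeated edges, a multigraph allows repeated edges but no loops, a pseudograph allows both, a mixed graph has directed and undirected edges). The `(u, v, directed)` edge format and `classify_graph` are hypothetical.

```python
from collections import Counter

def classify_graph(edges):
    """Classify a graph given as a list of (u, v, directed) triples."""
    has_directed = any(d for _, _, d in edges)
    has_undirected = any(not d for _, _, d in edges)
    if has_directed and has_undirected:
        return "mixed graph"
    if has_directed:
        # Directed graphs are not subdivided further in this sketch.
        return "directed graph"
    has_loop = any(u == v for u, v, _ in edges)
    # An undirected edge {u, v} equals {v, u}, so normalise endpoint order
    counts = Counter(tuple(sorted((u, v))) for u, v, _ in edges)
    has_multi_edge = any(c > 1 for c in counts.values())
    if has_loop:
        return "pseudograph"
    if has_multi_edge:
        return "multigraph"
    return "simple graph"
```

For example, `[(1, 2, False), (2, 1, False)]` is a multigraph, because both triples describe the same undirected edge.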
diff --git a/Graphs.md b/Graphs.md
@@ -1,4 +1,4 @@
-:data-structures: :math310:
+:data-structures: :math310: :discrete:
 # Graphs
 
 Abstract Math 10.2. 
diff --git a/Loop.md b/Loop.md
@@ -1,4 +1,4 @@
-:data-structures: :cs: 
+:data-structures: :cs: :discrete:
 # Loop
 
 Ch 4
diff --git a/MarkovDecisionProcesses.md b/MarkovDecisionProcesses.md
@@ -5,4 +5,6 @@ RL Ch 1
 
 ## Notes
 
-**Definition:** Markov decision processes are used to model decision making processes that are partly stochastic and partly controlled via decisions.
+**Definition:** Markov decision processes describe an environment for reinforcement learning.
+
+MDPs are like MRPs except they also have a finite set of actions (action space).
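One way to see how MDPs relate to MRPs: once a policy $\pi$ is fixed, averaging over actions turns the MDP back into an MRP, with $P^\pi(s, s') = \sum_a \pi(a \mid s)\, P(s' \mid s, a)$ and $R^\pi(s) = \sum_a \pi(a \mid s)\, R(s, a)$. A minimal sketch, assuming small tabular MDPs stored as nested lists; the layout and the name `induced_mrp` are hypothetical.

```python
def induced_mrp(P, R, policy):
    """Collapse a tabular MDP into an MRP by fixing a policy.

    P[s][a][s2]:  probability of moving from s to s2 when taking action a.
    R[s][a]:      expected immediate reward for taking action a in state s.
    policy[s][a]: probability of choosing action a in state s.
    Returns (P_pi, R_pi): the MRP transition matrix and reward vector.
    """
    states = range(len(P))
    actions = range(len(P[0]))
    # Average the action-specific transitions, weighted by the policy
    P_pi = [[sum(policy[s][a] * P[s][a][s2] for a in actions) for s2 in states]
            for s in states]
    # Average the action-specific rewards the same way
    R_pi = [sum(policy[s][a] * R[s][a] for a in actions) for s in states]
    return P_pi, R_pi

# Hypothetical 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R = [[0.0, 1.0], [2.0, 0.0]]
policy = [[0.5, 0.5], [1.0, 0.0]]
P_pi, R_pi = induced_mrp(P, R, policy)
```

Here `P_pi` is `[[0.5, 0.5], [0.0, 1.0]]` and `R_pi` is `[0.5, 2.0]`: an ordinary Markov chain with rewards, with no actions left.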
diff --git a/MarkovRewardProcess.md b/MarkovRewardProcess.md
@@ -5,4 +5,4 @@ L2
 
 ## Notes
 
-**Definition:** A markov reward process is a markov chain with values associated with states or transitions.
+**Definition:** A Markov reward process is a Markov chain with reward values associated with states or transitions.
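The definition can be made concrete by simulating one: a Markov chain picks the next state, and a reward is attached along the way. This minimal sketch attaches rewards to states (the definition also allows rewards on transitions). The nested-list layout and the name `sample_mrp` are hypothetical.

```python
import random

def sample_mrp(P, R, start, steps, seed=0):
    """Sample a trajectory from a Markov reward process.

    P[s][s2]: transition probability from s to s2 (the Markov chain).
    R[s]:     reward collected in state s before transitioning.
    Returns (states, rewards).
    """
    rng = random.Random(seed)
    states, rewards = [start], []
    s = start
    for _ in range(steps):
        rewards.append(R[s])
        # The next state depends only on the current state (Markov property)
        s = rng.choices(range(len(P)), weights=P[s])[0]
        states.append(s)
    return states, rewards
```

With a deterministic chain such as `P = [[0, 1], [1, 0]]`, the trajectory simply alternates between the two states and collects their rewards in turn.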
diff --git a/Multigraph.md b/Multigraph.md
@@ -1,4 +1,4 @@
-:data-structures: :cs: 
+:data-structures: :cs: :discrete:
 # Multi-Graph
 
 Ch 4
diff --git a/Policy.md b/Policy.md
@@ -7,4 +7,4 @@ RL Ch 1
 
 **Definition:** A policy in machine learning is a function from the current state to the action an agent will take.
 
-Basically, this dictates what the agent will do in a given scenario.
+Basically, this dictates what the agent will do in a given scenario. A policy can also be stochastic (a probability distribution over actions), which is often used for exploration.
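A tabular policy can be stored as a distribution over actions for each state. A deterministic policy is the special case where one action has probability 1. A minimal sketch; the state and action names and the function `sample_action` are hypothetical.

```python
import random

def sample_action(policy, state, rng):
    """Draw an action from a tabular stochastic policy pi(a | s)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights)[0]

# Hypothetical example: deterministic in "home",
# stochastic (some exploration) in "junction".
policy = {
    "home": {"go": 1.0, "stay": 0.0},
    "junction": {"left": 0.8, "right": 0.2},
}
```

Sampling repeatedly in `"junction"` should give `"left"` roughly 80% of the time. In `"home"`, the result is always `"go"`.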
diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -1,4 +1,4 @@
-:ml: :index:
+:ml: :index: :rl:
 # Reinforcement Learning
 
 Reinforcement Learning Index
@@ -19,7 +19,7 @@ DeepMind UCL Lectures
 
 L1
 * [CreditAssignmentProblem](CreditAssignmentProblem.md)
-* [ImitationLearning](ImitationLearning.md) (separate)
+* [ImitationLearning](ImitationLearning.md) 
 * [MarkovAssumption](MarkovAssumption.md)
 * [PartiallyObservableMarkovDecisionProcess](PartiallyObservableMarkovDecisionProcess.md)
 * [ModelFree](ModelFree.md)
@@ -27,5 +27,16 @@ L1
 * [Evaluation](Evaluation.md)
 
 L2
+* [MarkovDecisionProcesses](MarkovDecisionProcesses.md)
+* [MarkovAssumption](MarkovAssumption.md) - Also referred to as Markov Property
 * [DiscountFactor](DiscountFactor.md)
 * [MarkovRewardProcess](MarkovRewardProcess.md)
+* [MarkovProcess](MarkovProcess.md)
+* [Return](Return.md)
+* [Policy](Policy.md)
+* State-ValueFunction
+* Action-ValueFunction
+* BellmanEquation
+* ControlTheory (lookup)
+* StateTransitionMatrix
+* OptimalControl (lookup)
diff --git a/Return.md b/Return.md
@@ -0,0 +1,8 @@
+:rl: :ml:
+# Return
+
+L2
+
+## Notes
+
+**Definition:** The return is the sum of future rewards from a given time step, with each reward discounted by the discount factor according to how far in the future it is.
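In the usual notation the return is $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$. For a finite list of rewards it can be computed backwards with $G = R + \gamma G_{\text{next}}$. A minimal sketch; the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...

    rewards: rewards received after time t, in order.
    gamma:   discount factor in [0, 1].
    """
    g = 0.0
    # Work backwards so each step is G = R + gamma * G_next
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, rewards `[1, 1, 1]` with `gamma = 0.5` give `1 + 0.5 + 0.25 = 1.75`. With `gamma = 0` only the first reward counts, and with `gamma = 1` the return is the plain sum.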
