notes


commit df6e3b8782c131ec64360c5a8cc3b23aefd894cf
parent cbb41d56fd32d89cc0f9e920150039251d5fb5ed
Author: Andrew <andrewlaack1@gmail.com>
Date:   Wed, 30 Oct 2024 20:39:07 -0500

Took some notes

Diffstat:
M DiscreteMath.md          |  8 +++++++-
A EquivalenceRelation.md   |  8 ++++++++
M ReinforcementLearning.md | 33 +++++++++++++++++++++++----------
A Representative.md        | 10 ++++++++++
M index.md                 |  1 +
5 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/DiscreteMath.md b/DiscreteMath.md
@@ -175,4 +175,10 @@ Unit 9.4 (Closures of Relations)
  - [Closure](Closure.md)
 
 Unit 9.5 (Equivalence Relations)
- - EquivalenceRelation
+ - [EquivalenceRelation](EquivalenceRelation.md)
+ - [EquivalenceClass](EquivalenceClass.md) ([a] notation)
+ - [Representative](Representative.md)
+ - [Partition](Partition.md)
+
+Unit 9.6 (Partial Orderings)
+ -

diff --git a/EquivalenceRelation.md b/EquivalenceRelation.md
@@ -0,0 +1,8 @@
+:discrete:
+# Equivalence Relation
+
+Ch 9.5
+
+## Notes
+
+**Definition:** An equivalence relation is a relation that is reflexive (xRx), symmetric (xRy -> yRx), and transitive (xRy and yRz -> xRz).

diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -1,16 +1,29 @@
-:ml:
+:ml: :index:
 # Reinforcement Learning
 
-ML L1
+Reinforcement Learning Index
 
-## Notes
+Reinforcement Learning: An Introduction (Sutton & Barto)
 
-**Definition:** Reinforce good behavior and punish bad behavior to get closer to goals.
+Chapter 1 (Introduction)
+* MarkovDecisionProcesses
+* Exploit
+* Explore
+* Policy
+* RewardSignal
+* ValueFunction
+* Model
+* EvolutionaryMethods (learn not interacting)
 
-If there is not an 'optimal' way to do something, you train the system to do stuff that works. Think like a dog, you let the dog do what it wants and then reinforce good behavior and punish bad behavior.
+Stanford Lectures
 
-This would probably be what you would want to create a chess model.
-
-In reinforcement learning the learning system is called an agent. This agent can observe the environment and perform actions to be rewarded.
-
-The strategy followed by the agent is called the 'policy'.
+L1
+* CreditAssignmentProblem
+* ImitationLearning (separate)
+* MarkovAssumption
+* MDP
+* POMDP
+* ModelFree
+* Bandits
+* Evaluation
+* Control

diff --git a/Representative.md b/Representative.md
@@ -0,0 +1,10 @@
+:discrete:
+# Representative
+
+Ch 9.5
+
+## Notes
+
+**Definition:** A representative is any element of an equivalence class chosen to describe the class.
+
+Often we use the least positive residual for this (think in the case of mod equivalence classes).

diff --git a/index.md b/index.md
@@ -28,6 +28,7 @@ This is the index for my main note classifications. I will maintain this as a ho
 [[Physics.md]]
 [[Assembly.md]]
 [[Vocabulary.md]]
+[[ReinforcementLearning.md]]
 
 ## Things to Learn More About
 
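
Reviewer sketch (not part of the commit): the definitions added in EquivalenceRelation.md and Representative.md can be checked by brute force on a small finite set. The helper names below (`is_equivalence`, `congruent_mod`, `representative`, `classes`) are hypothetical and chosen only for illustration. The sketch assumes the note's "least positive residual" means the least nonnegative residue, which is what Python's `%` returns for a positive modulus.

```python
from itertools import product


def is_equivalence(elements, related):
    """Brute-force check that `related` is reflexive, symmetric, and transitive."""
    elements = list(elements)
    # Reflexive: xRx for every x.
    reflexive = all(related(x, x) for x in elements)
    # Symmetric: xRy implies yRx.
    symmetric = all(
        related(y, x)
        for x, y in product(elements, repeat=2)
        if related(x, y)
    )
    # Transitive: xRy and yRz imply xRz.
    transitive = all(
        related(x, z)
        for x, y, z in product(elements, repeat=3)
        if related(x, y) and related(y, z)
    )
    return reflexive and symmetric and transitive


def congruent_mod(n):
    """Return the relation 'x is congruent to y modulo n' as a function."""
    return lambda x, y: (x - y) % n == 0


def representative(x, n):
    """Pick the residue in [0, n) as the representative of x's class mod n."""
    return x % n


def classes(elements, n):
    """Group elements into equivalence classes keyed by their representative."""
    result = {}
    for x in elements:
        result.setdefault(representative(x, n), []).append(x)
    return result


if __name__ == "__main__":
    print(is_equivalence(range(-6, 7), congruent_mod(3)))
    print(classes(range(6), 3))
```

Congruence mod 3 passes all three checks, while a relation such as `x <= y` fails the symmetry check, so it is not an equivalence relation. Grouping by representative also produces the partition that the new Partition entry in DiscreteMath.md refers to.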