notes


commit 704f29bb22a5ea3ebd5f11c55172f0178f241672
parent 5d3850ee3d06254acdae4ca2cbd1429f2618dc7b
Author: Andrew <andrewlaack1@gmail.com>
Date:   Sun, 10 Nov 2024 10:46:59 -0600

Took some notes on RL

Diffstat:
A BellmanEquation.md         | 10 ++++++++++
A DynamicProgramming.md      | 13 +++++++++++++
A OptimalSubstructure.md     |  8 ++++++++
A OverlappingSubproblems.md  |  8 ++++++++
M ReinforcementLearning.md   | 12 ++++++------
5 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/BellmanEquation.md b/BellmanEquation.md
@@ -0,0 +1,10 @@
+:rl: :ml:
+# Bellman Equation
+
+L2
+
+## Notes
+
+**Definition:** The Bellman equation expresses value recursively: the value of acting optimally right now equals the reward of the best current choice plus the (discounted) value of the situation that choice leads to.
+
+This is intuitive and simple to understand, but it is the basis for doing dynamic programming: the recursive split it describes is exactly the optimal substructure that DP relies on.

diff --git a/DynamicProgramming.md b/DynamicProgramming.md
@@ -0,0 +1,13 @@
+:rl: :ml: :algorithms:
+# Dynamic Programming
+
+L3
+
+## Notes
+
+**Definition:** Dynamic programming is the idea that we can break a problem down into subproblems, solve each subproblem once, and then combine their solutions to find the solution to the overall problem.
+
+A problem must have two properties for DP to apply:
+
+1. [OptimalSubstructure](OptimalSubstructure.md)
+2. [OverlappingSubproblems](OverlappingSubproblems.md)

diff --git a/OptimalSubstructure.md b/OptimalSubstructure.md
@@ -0,0 +1,8 @@
+:rl: :ml: :algorithms:
+# Optimal Substructure
+
+L3
+
+## Notes
+
+**Definition:** Optimal substructure is a property of a problem whereby an optimal solution to the overall problem can be constructed from optimal solutions to its subproblems.

diff --git a/OverlappingSubproblems.md b/OverlappingSubproblems.md
@@ -0,0 +1,8 @@
+:ml: :rl: :algorithms:
+# Overlapping Subproblems
+
+L3
+
+## Notes
+
+**Definition:** Overlapping subproblems is a property of a problem whereby the same subproblems recur again and again, so storing their solutions and reusing them is more efficient than recomputing them every time they appear.
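
For reference, the Bellman optimality equation is commonly written as below in MDP notation. The symbols are assumptions, not defined in these notes: $v_*(s)$ is the optimal value of state $s$, $\mathcal{R}_s^a$ the expected immediate reward for taking action $a$ in $s$, $\mathcal{P}_{ss'}^a$ the probability of moving from $s$ to $s'$ under $a$, and $\gamma$ the discount factor.

```latex
v_*(s) = \max_{a} \Big( \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_*(s') \Big)
```

The max picks the best current choice, and the sum is the discounted value of where that choice leads.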
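
A minimal sketch of how repeatedly applying the Bellman optimality backup yields optimal values, using value iteration (one standard DP method, swapped in here purely as an illustration). It assumes a hypothetical four-state deterministic chain with a reward of -1 per step; `step`, `value_iteration`, `STATES`, and `ACTIONS` are invented names, not from the notes.

```python
# Hypothetical 4-state deterministic chain MDP: states 0..3, state 3 is terminal.
# Every move costs a reward of -1 until the terminal state is reached.
GAMMA = 1.0  # undiscounted; converges here because the terminal state is reachable from every state
STATES = [0, 1, 2, 3]
TERMINAL = 3
ACTIONS = ["left", "right"]


def step(state: int, action: str) -> tuple[int, float]:
    """Deterministic transition for the hypothetical chain: returns (next_state, reward)."""
    if action == "left":
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, TERMINAL)
    return next_state, -1.0


def value_iteration(tol: float = 1e-9) -> dict[int, float]:
    """Sweep all states with the Bellman optimality backup until values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        new_values = dict(values)
        for s in STATES:
            if s == TERMINAL:
                continue  # the terminal state's value stays 0
            # Bellman backup: best immediate reward + discounted value of the next state.
            best = max(
                reward + GAMMA * values[next_s]
                for next_s, reward in (step(s, a) for a in ACTIONS)
            )
            delta = max(delta, abs(best - values[s]))
            new_values[s] = best
        values = new_values
        if delta < tol:
            return values


if __name__ == "__main__":
    print(value_iteration())  # {0: -3.0, 1: -2.0, 2: -1.0, 3: 0.0}
```

Each sweep reuses the previous sweep's value of every neighboring state instead of re-solving it, which is where the overlapping subproblems show up.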

diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -34,9 +34,9 @@ L2
 * [MarkovProcess](MarkovProcess.md)
 * [Return](Return.md)
 * [Policy](Policy.md)
-* State-ValueFunction
-* Action-ValueFunction
-* BellmanEquation
-* ControlTheory (lookup)
-* StateTransitionMatrix
-* OptimalControl (lookup)
+* [BellmanEquation](BellmanEquation.md)
+
+L3
+* [DynamicProgramming](DynamicProgramming.md)
+* [OptimalSubstructure](OptimalSubstructure.md)
+* [OverlappingSubproblems](OverlappingSubproblems.md)