commit ec19b8d3519c49d3407bbd91ec5e84d2d22afe41
parent ce90e5ac9a6e8ec49569e31a83d4650491c164e7
Author: Andrew <andrewlaack1@gmail.com>
Date: Sun, 17 Nov 2024 12:20:49 -0600
Did stuff
Diffstat:
7 files changed, 69 insertions(+), 1 deletion(-)
diff --git a/EligibilityTraces.md b/EligibilityTraces.md
@@ -0,0 +1,10 @@
+:rl: :ml:
+# Eligibility Traces
+
+L4
+
+## Notes
+
+**Definition:** Eligibility traces combine both the frequency and recency heuristics to solve the credit assignment problem.
+
+Every time we visit a state, we increase that state's eligibility trace, and the trace then decays over time. A higher value means the state is more strongly associated with the outcome; a lower value means it is less so. This accounts for frequency, because each visit adds to the trace, and for recency, because of the decay.
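The bump-then-decay behaviour described in EligibilityTraces.md can be sketched as an accumulating trace. This is a minimal illustration, not something taken from the notes: the names `update_traces`, `gamma`, and `lam`, the decay values, and the example states are all assumptions.

```python
from collections import defaultdict


def update_traces(traces, visited_state, gamma=0.9, lam=0.8):
    """Decay every existing trace, then bump the state just visited.

    gamma and lam are hypothetical decay parameters chosen for illustration.
    """
    for state in traces:
        traces[state] *= gamma * lam  # recency: older visits fade
    traces[visited_state] += 1.0      # frequency: each visit adds to the trace
    return traces


traces = defaultdict(float)
for state in ["bell", "bell", "bell", "light"]:
    update_traces(traces, state)
```

After this sequence the trace for `"bell"` is still larger than the trace for `"light"`, even though the light was seen last, because three visits outweigh one decay step with these particular parameters.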
diff --git a/FrequencyHeuristic.md b/FrequencyHeuristic.md
@@ -0,0 +1,12 @@
+:ml: :rl:
+# Frequency Heuristic
+
+L4
+
+## Notes
+
+**Definition:** The frequency heuristic is the idea that we assign credit for an outcome to the events that occurred most frequently before it.
+
+In RL, if we see 4 bells and 1 light and then receive a negative reward, the frequency heuristic attributes the negative reward to the bells, because they occurred most often.
+
+This is one solution to the credit assignment problem.
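The bells-and-light example in FrequencyHeuristic.md could be computed roughly as below. The proportional split is an assumption made for illustration; the note only says that the most frequent event gets the credit, and `frequency_credit` is a hypothetical helper name.

```python
from collections import Counter


def frequency_credit(events):
    """Split credit across events in proportion to how often each occurred."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


credit = frequency_credit(["bell", "bell", "bell", "bell", "light"])
most_blamed = max(credit, key=credit.get)  # event with the largest share
```

Here the bell receives the larger share of the credit, matching the reasoning in the note.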
diff --git a/MachineLearning.md b/MachineLearning.md
@@ -60,6 +60,16 @@ Ch 2 (Maths behind DL):
* [Bias](Bias.md)
* [Weight](Weight.md)
* Surface
+* GradientDescent
+* ForwardPass
+* BackwardPass
+* SGD
+* MiniBatchSGD
+* TrueSGD
+* BatchGradientDescent
+* Backpropagation
+* AutomaticDifferentiation
+*
ISL Python:
diff --git a/OffPolicyLearning.md b/OffPolicyLearning.md
@@ -0,0 +1,8 @@
+:rl: :ml:
+# Off Policy Learning
+
+L5
+
+## Notes
+
+**Definition:** Off policy learning means learning about one policy from experience generated by a different policy. It can be thought of as looking over someone else's shoulder to understand what will and will not result in high rewards.
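One common off-policy method is Q-learning, which the note in OffPolicyLearning.md does not name; it is used here only as an assumed example. The sketch below shows its update rule: the target uses the best next action, regardless of which action the behaviour policy actually takes next. The dictionary-based table, `alpha`, and `gamma` are illustrative choices.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q


# Hypothetical action values for a toy state "s1".
q_off = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
q_learning_update(q_off, "s0", "left", 1.0, "s1", ["left", "right"])
```

The transition passed in can come from any behaviour policy, since the target never depends on the action that policy chooses in `next_state`.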
diff --git a/OnPolicyLearning.md b/OnPolicyLearning.md
@@ -0,0 +1,10 @@
+:rl: :ml:
+# On Policy Learning
+
+L5
+
+## Notes
+
+**Definition:** On policy learning is learning about a policy from experience gathered by following that same policy.
+
+We sample actions from the policy whilst evaluating the policy.
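SARSA is one well-known on-policy method; the note in OnPolicyLearning.md does not mention it, so treat it as an assumed example. In this sketch the target uses `next_action`, the action actually sampled from the policy being evaluated. The dictionary-based table, `alpha`, and `gamma` are illustrative choices.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * Q(next_state, next_action)."""
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q


# Hypothetical action values for a toy state "s1".
q_on = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
sarsa_update(q_on, "s0", "left", 1.0, "s1", "left")
```

Because the policy chose `"left"` in `"s1"`, the update uses that action's value rather than the larger value of `"right"`.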
diff --git a/RecencyHeuristic.md b/RecencyHeuristic.md
@@ -0,0 +1,10 @@
+:rl: :ml:
+# Recency Heuristic
+
+L4
+
+## Notes
+
+**Definition:** The recency heuristic is a solution to the credit assignment problem in which we assign credit for a reward or punishment to the most recent state(s).
+
+This contrasts with the [FrequencyHeuristic](FrequencyHeuristic.md), which assigns credit to whatever happened most often in the lead-up to the reward signal.
diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -46,6 +46,14 @@ L4
* [MonteCarloLearning](MonteCarloLearning.md)
* [IncrementalMean](IncrementalMean.md)
* [TemporalDifferenceLearning](TemporalDifferenceLearning.md)
+* [FrequencyHeuristic](FrequencyHeuristic.md)
+* [RecencyHeuristic](RecencyHeuristic.md)
+* [EligibilityTraces](EligibilityTraces.md)
L5
-
+* [ModelFree](ModelFree.md)
+* [MonteCarloLearning](MonteCarloLearning.md)
+* [TemporalDifferenceLearning](TemporalDifferenceLearning.md)
+* [OnPolicyLearning](OnPolicyLearning.md)
+* [OffPolicyLearning](OffPolicyLearning.md)
+* EpsilonGreedy