notes

commit 7974bb6da9e46e6f4f5d2ec6a13142e4c06a1b53
parent 12bc1baf33b3b5dd40575abaa8f7b2e09d34f34e
Author: Andrew <andrewlaack1@gmail.com>
Date:   Wed, 13 Nov 2024 08:45:36 -0600

Took some RL notes

Diffstat:
A  Episode.md                    |  8 ++++++++
A  Episodic.md                   |  8 ++++++++
A  IncrementalMean.md            | 46 ++++++++++++++++++++++++++++++++++++++++++++++
A  MonteCarloLearning.md         | 12 ++++++++++++
M  ReinforcementLearning.md      |  9 +++++----
A  TemporalDifferenceLearning.md |  8 ++++++++
6 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/Episode.md b/Episode.md
@@ -0,0 +1,8 @@
+:rl: :ml:
+# Episode
+
+L4
+
+## Notes
+
+**Definition:** An episode in RL is one complete run of a policy, from a start state to a terminal state.
diff --git a/Episodic.md b/Episodic.md
@@ -0,0 +1,8 @@
+:ml: :rl:
+# Episodic
+
+L4
+
+## Notes
+
+**Definition:** Episodic, with respect to RL, means that the agent's experience is split into episodes that terminate, as opposed to non-episodic (continuing) tasks, which go on forever.
diff --git a/IncrementalMean.md b/IncrementalMean.md
@@ -0,0 +1,46 @@
+:ml: :rl:
+# Incremental Mean
+
+L4
+
+## Notes
+
+**Definition:** Incremental mean is a way of computing a mean in which each new sample updates the current mean directly, without re-summing all prior samples.
+
+This is often used with Monte Carlo learning, where the empirical mean return of a state is not computed by summing all returns and dividing by the number of visits, but is instead updated each time the state is visited, using only the change the new return makes.
+
+With incremental mean, all we need to know is the prior mean, the current sample, and k, the total number of samples including the current one. We could multiply the prior mean by k - 1, add the current sample, and divide by k, but that rebuilds the running sum on every update. Instead, we calculate the incremental mean by adding 1/k * (current sample - prior mean) to the prior mean; in Monte Carlo learning the current sample is the return.
+
+Here is a simple Python implementation that compares the incremental mean with the standard mean:
+
+
+```python
+
+import numpy as np
+arr = np.random.rand(10)
+
+# compute the mean the standard way: sum every sample, then divide by the count
+def stdMean(priors):
+    mean = 0
+    for i in priors:
+        mean += i
+    mean = mean/len(priors)
+    return mean
+
+# compute the incremental mean: move the prior mean 1/k of the way toward the new sample
+def incMeanCalc(priorMean, k, current):
+    return priorMean + (1/k * (current - priorMean))
+
+incMean = 0
+
+for k in range(len(arr)):
+    # standard mean over the first k + 1 samples, recomputed from scratch
+    normMean = stdMean(arr[:k + 1])
+
+    # incremental mean, updated using only the new sample arr[k]
+    incMean = incMeanCalc(incMean, k + 1, arr[k])
+    # the two printed values should agree up to floating-point error
+    print(incMean)
+    print(normMean)
+
+```
diff --git a/MonteCarloLearning.md b/MonteCarloLearning.md
@@ -0,0 +1,12 @@
+:rl: :ml:
+# Monte Carlo Learning
+
+L4
+
+## Notes
+
+**Definition:** Monte Carlo learning is a learning method that learns from complete episodes: the value of a state is estimated by averaging the returns observed after visiting it, and these estimates are used to improve policies.
+
+First Visit - In first-visit Monte Carlo learning, we increment the counter for a state (and add its return) only on the first visit to that state in a given episode.
+
+Every Visit - Every-visit Monte Carlo learning increments the counter for a state (and adds its return) every time the state is visited, including repeat visits within the same episode.
diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -43,7 +43,8 @@ L3
 L4
 
 * [ModelFree](ModelFree.md)
-* Episodic
-* Episode
-* MonteCarloLearning
-* TemporalDifferenceLearning
+* [Episodic](Episodic.md)
+* [Episode](Episode.md)
+* [MonteCarloLearning](MonteCarloLearning.md)
+* [IncrementalMean](IncrementalMean.md)
+* [TemporalDifferenceLearning](TemporalDifferenceLearning.md)
diff --git a/TemporalDifferenceLearning.md b/TemporalDifferenceLearning.md
@@ -0,0 +1,8 @@
+:ml: :rl:
+#
+
+
+
+## Notes
+
+**Definition:**
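The Monte Carlo notes above (first-visit vs. every-visit counting) and the incremental mean fit together in a few lines of Python. Since k * mean_k = (k - 1) * mean_(k-1) + x_k, dividing by k gives mean_k = mean_(k-1) + (1/k) * (x_k - mean_(k-1)), which is the update used below with the return G_t as the sample. This is a minimal sketch of Monte Carlo prediction (estimating V(s) for whatever policy generated the episodes), not the policy-improvement step; the episode format (a list of (state, reward) pairs), the discount factor gamma, and the names returns_per_step and mc_prediction are all assumptions made for illustration.

```python
from collections import defaultdict


def returns_per_step(episode, gamma=1.0):
    """Return the discounted return G_t for every step t of one episode.

    Assumed (hypothetical) episode format: a list of (state, reward) pairs,
    where reward is the reward received after leaving that state.
    """
    returns = [0.0] * len(episode)
    g = 0.0
    # walk backwards so that G_t = r_t + gamma * G_(t+1)
    for t in range(len(episode) - 1, -1, -1):
        _, reward = episode[t]
        g = reward + gamma * g
        returns[t] = g
    return returns


def mc_prediction(episodes, first_visit=True, gamma=1.0):
    """Estimate V(s) as the mean return observed after visiting s."""
    value = defaultdict(float)  # V(s), kept as an incremental mean
    count = defaultdict(int)    # N(s), the visit counter from the notes
    for episode in episodes:
        returns = returns_per_step(episode, gamma)
        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # first-visit: ignore repeat visits in this episode
            seen.add(state)
            count[state] += 1
            # incremental mean: V(s) <- V(s) + 1/N(s) * (G_t - V(s))
            value[state] += (returns[t] - value[state]) / count[state]
    return dict(value)


# two made-up episodes; state A is visited twice in the first one
episodes = [
    [("A", 1), ("B", 0), ("A", 2)],
    [("B", 3)],
]
print(mc_prediction(episodes, first_visit=True))
print(mc_prediction(episodes, first_visit=False))
```

With these two sample episodes, state A is visited twice in the first episode with returns 3 and 2, so first-visit keeps only the 3 while every-visit averages both to 2.5; B is visited once per episode, so both variants give 2.5.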