commit 0625a452e62ad787bc0c5374e35fd767334346c9
parent d27daa652ce245bac427d0d040769aac4e022414
Author: Andrew <andrewlaack1@gmail.com>
Date: Wed, 5 Jun 2024 13:58:33 -0500
Not many notes, but read stats book
Diffstat:
6 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/DensityEstimation.md b/DensityEstimation.md
@@ -0,0 +1,10 @@
+:stats: :ml:
+# Density Estimation
+
+Stats D3
+
+## Notes
+
+**Definition:** Density estimation is the process of estimating the probability density function of the distribution that a dataset was drawn from.
+
+This can be thought of as a smoothed histogram without the bins. A common form of this is kernel density estimation (KDE). The reason a KDE can be better than a histogram is that it has no binning, and binning can make data appear inaccurate depending on the cut points and bin widths.
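As a rough illustration of the KDE idea in the note above, here is a minimal sketch of a Gaussian kernel density estimate using only the standard library. The function name `gaussian_kde`, the toy data, and the bandwidth of 0.5 are all assumptions made for this example, not something taken from the notes or the stats book.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(samples)
    # Normalizing constant so the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, each centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical toy data; the bandwidth is an arbitrary illustrative choice.
data = [1.0, 1.2, 1.9, 2.1, 2.3, 4.0]
density = gaussian_kde(data, bandwidth=0.5)

# The estimate is higher near the cluster around 2 than in the gap near 3.
print(density(2.0), density(3.0))
```

There are no cut points here, but the bandwidth plays a role similar to bin width: a smaller value gives a spikier estimate and a larger value gives a smoother one.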
diff --git a/ExploratoryDataAnalysis.md b/ExploratoryDataAnalysis.md
@@ -0,0 +1,8 @@
+:stats:
+# Exploratory Data Analysis (EDA)
+
+Stats D3
+
+## Notes
+
+**Definition:** Exploratory data analysis is the process of exploring a dataset to find patterns and to create models/statistics/visualizations.
diff --git a/GradientDescent.md b/GradientDescent.md
@@ -17,4 +17,6 @@ For a simple implementation of gradient descent using a [[LearningRate.md]] for
When using gradient descent for linear regression one must calculate the partial derivative for each variable and then determine if it is positive or negative and move in the correct direction.
-Another thing, batch gradient descent is calculating the descents of all variables every time based on all samples given.
+Another thing, batch gradient descent calculates each descent step based on all of the samples given. An alternative to this is stochastic gradient descent, which is much lighter and faster per step because each step only uses the gradient from a single random sample in the dataset. This allows for out-of-core learning, where the entire dataset does not need to be loaded in memory at any given time.
+
+Another type of gradient descent is mini-batch gradient descent, which sits between batch and stochastic gradient descent. This form of GD draws small random batches of samples and performs a descent step on each batch. This is basically stochastic, but with a few more samples each time instead of a single random sample.
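The three variants described above differ only in how many samples feed each step, which a single sketch can show. The example below assumes a one-variable linear regression fit by mean squared error; the function name `minibatch_gd`, the data, and the learning rate are hypothetical choices for illustration.

```python
import random

def minibatch_gd(xs, ys, batch_size, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average the gradient over only the samples in this batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2 * x + 1 for x in xs]

w_batch, b_batch = minibatch_gd(xs, ys, batch_size=len(xs))  # batch GD
w_mini, b_mini = minibatch_gd(xs, ys, batch_size=2)          # mini-batch GD
w_sgd, b_sgd = minibatch_gd(xs, ys, batch_size=1)            # stochastic GD
```

Setting `batch_size` to the dataset size gives batch gradient descent and setting it to 1 gives stochastic gradient descent, so mini-batch is just the middle of the same dial.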
diff --git a/LearningRate.md b/LearningRate.md
@@ -13,3 +13,5 @@ See [[GradientDescentCode.md]] and [[GradientDescent.md]] for an example of when
Additionally, learning rate in a higher level sense, with regard to online learning, is how quickly a model will adapt to new data.
These constants that affect learning rate are called "hyperparameters" which are defined as constants prior to model training that are not built into the model.
+
+Another related term is the learning schedule. This describes how the learning rate changes over the course of training. In the case of [[GradientDescent.md]] this would be how much the learning rate decreases over time as you narrow in on an optimum.
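To make the learning schedule idea concrete, here is a small sketch that minimizes a simple quadratic with a learning rate that shrinks at every step. The decay formula, its constants, and the function names are assumptions picked for illustration; many other schedules exist.

```python
def learning_schedule(step, initial_rate=0.4, decay=0.05):
    """Hypothetical schedule: the learning rate shrinks as the step count grows."""
    return initial_rate / (1.0 + decay * step)

def descend(start, steps=200):
    """Minimize f(x) = (x - 3) ** 2 using the decaying learning rate above."""
    x = start
    for step in range(steps):
        gradient = 2.0 * (x - 3.0)  # derivative of (x - 3) ** 2
        x -= learning_schedule(step) * gradient
    return x

x_min = descend(start=10.0)
print(x_min)
```

Early steps are large and cover distance quickly, while later steps are small so the search settles near the minimum instead of bouncing around it.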
diff --git a/Quantile.md b/Quantile.md
@@ -0,0 +1,12 @@
+:stats:
+# Quantile
+
+Stats D3
+
+## Notes
+
+**Definition:** Quantiles are cut points that divide a sorted dataset into groups that each contain an equal share of the observations.
+
+Examples are the median, which splits the data into two halves, quartiles, which split it into 4 groups, quintiles (5), deciles (10), and percentiles (100).
+
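+Another thing to pay attention to is the interquartile range (IQR), which is the distance between the first quartile (25th percentile) and the third quartile (75th percentile). More generally, an interquantile range is the distance between any two chosen quantiles.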
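The quantiles and interquartile range described above can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up data; the `inclusive` method is one of several conventions for interpolating cut points, so other tools may report slightly different values.

```python
import statistics

# Hypothetical toy dataset, already sorted for readability.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The second quartile is the median.
median = statistics.median(data)

# Interquartile range: distance between the third and first quartiles.
iqr = q3 - q1

# n=10 returns the nine cut points that split the data into deciles.
deciles = statistics.quantiles(data, n=10, method="inclusive")

print(q1, q2, q3, iqr)
```

Note that splitting into `n` groups always yields `n - 1` cut points, which is why quartiles give three values and deciles give nine.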
diff --git a/Statistics.md b/Statistics.md
@@ -23,4 +23,7 @@ Links to Stats Notes
[[MarginalProbabilities.md]]
[[Covariance.md]]
[[Correlation.md]]
+[[Quantile.md]]
+[[ExploratoryDataAnalysis.md]]
+[[DensityEstimation.md]]