commit 96f02ace3bd43a4b39b0a50b7a3f4f9aca469cde
parent ace9cfa24cf3a5265cb3139363bd61d0a1c27069
Author: Andrew <andrewlaack1@gmail.com>
Date:   Tue,  4 Jun 2024 18:39:39 -0500

Took some notes about ml stuf

Diffstat:
A Accuracy.md  | 10 ++++++++++
A ConditionalProbabilities.md  | 10 ++++++++++
A Correlation.md  | 10 ++++++++++
A Covariance.md  | 10 ++++++++++
M CrossValidation.md  | 2 +-
A DecisionThreshold.md  | 10 ++++++++++
A HarmonicMean.md  | 20 ++++++++++++++++++++
A JointProbability.md  | 10 ++++++++++
M MachineLearning.md  | 8 ++++++++
A MarginalProbabilities.md  | 10 ++++++++++
A MulticlassClassifier.md  | 8 ++++++++
M NormalDistribution.md  | 2 --
A OneVersusAll.md  | 12 ++++++++++++
A OneVersusOne.md  | 14 ++++++++++++++
A ROC.md  | 10 ++++++++++
A Recall.md  | 1 +
A StandardDeviation.md  | 11 +++++++++++
M Statistics.md  | 6 ++++++
A Variance.md  | 16 ++++++++++++++++

19 files changed, 177 insertions(+), 3 deletions(-)
diff --git a/Accuracy.md b/Accuracy.md
@@ -0,0 +1,10 @@
+:ml:
+# Accuracy
+
+ML D2
+
+## Notes
+
+**Definition:** Accuracy in machine learning describes the overall correctness of a model. 
+
+This metric is the fraction of predictions that match their labels, usually expressed as a percentage.
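
A minimal sketch of the accuracy metric described in this note, assuming predictions and labels arrive as equal-length Python lists. The function name `accuracy` and the toy data are illustrative, not part of the original notes.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled example is required")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical toy data: 3 of the 4 predictions match their labels.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```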
diff --git a/ConditionalProbabilities.md b/ConditionalProbabilities.md
@@ -0,0 +1,10 @@
+:stats:
+# Conditional Probabilities
+
+Stats D2
+
+## Notes
+
+**Definition:** A conditional probability is the probability of an event given that some condition holds.
+
+For example, suppose there is an 80% chance that a voter is in favor of something given that the voter is a Republican. This is a conditional probability where the condition is being a Republican and the probability is 80%.
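
A minimal sketch of estimating a conditional probability from raw records: restrict the data to the rows that satisfy the condition, then measure how often the event occurs among them. The helper `conditional_probability` and the voter records are hypothetical and chosen only so the result matches the 80% figure above.

```python
def conditional_probability(records, event, condition):
    """Estimate P(event | condition) as a relative frequency."""
    matching = [r for r in records if condition(r)]
    if not matching:
        raise ValueError("no record satisfies the condition")
    return sum(1 for r in matching if event(r)) / len(matching)


# Hypothetical records: (party, in_favor).
voters = (
    [("republican", True)] * 4
    + [("republican", False)] * 1
    + [("democrat", True)] * 2
    + [("democrat", False)] * 3
)

p = conditional_probability(
    voters,
    event=lambda v: v[1],                     # voter is in favor
    condition=lambda v: v[0] == "republican",  # voter is a Republican
)
print(p)  # 0.8
```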
diff --git a/Correlation.md b/Correlation.md
@@ -0,0 +1,10 @@
+:stats:
+# Correlation
+
+Stats D2
+
+## Notes
+
+**Definition:** Correlation is the strength and direction of the linear relationship between two variables. Its value is bounded between -1 and 1, where 0 indicates no linear correlation, 1 a perfect positive linear relationship, and -1 a perfect negative linear relationship.
+
+See [[CorrelationCoefficient.md]] for an applied example.
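
A minimal sketch of one common correlation measure, the Pearson correlation coefficient, assuming that is the measure this note has in mind. The function name and sample data are illustrative.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return numerator / denominator


print(pearson_correlation([1, 2, 3], [2, 4, 6]))  # perfectly positive
print(pearson_correlation([1, 2, 3], [6, 4, 2]))  # perfectly negative
```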
diff --git a/Covariance.md b/Covariance.md
@@ -0,0 +1,10 @@
+:stats:
+# Covariance 
+
+Stats D2
+
+## Notes
+
+**Definition:** Covariance measures how two different variables vary together. A positive value indicates that higher values of one variable are associated with higher values of the other; a negative value indicates that higher values of one are associated with lower values of the other.
+
+Unlike correlation, covariance has no bounds on its range, and its magnitude depends on the scale of the variables.
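
A minimal sketch of covariance, assuming the population form (dividing by the number of samples). It also shows why covariance is unbounded: rescaling one variable rescales the covariance by the same factor. The function name and data are illustrative.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(covariance(xs, ys))                     # 2.5
print(covariance([10 * x for x in xs], ys))   # 25.0, ten times larger
```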
diff --git a/CrossValidation.md b/CrossValidation.md
@@ -7,4 +7,4 @@ ML CH3
 
 **Definition:** Cross validation is the process of creating a subset of your data and then training the model on some subset of said data.
 
-A common form of this is k-fold cross-validation. This creates k-folds (subsets) and trains the model on each set. It then checks the accuracy of each model by using the other folds as validation sets. This helps to ward off overfitting.  
+A common form of this is k-fold cross-validation. This splits the data into k folds (subsets). In each of k rounds, one fold is held out as the validation set and the model is trained on the remaining k-1 folds. The model's accuracy is then measured on the held-out fold, which was not used in training.
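
A minimal sketch of how the k folds can be formed, assuming the data is addressed by index and not shuffled (real pipelines usually shuffle first). The generator `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


for training, validation in k_fold_indices(6, 3):
    print("train on", training, "validate on", validation)
```

Each index appears in exactly one validation fold, so every sample is used for validation once and for training k-1 times.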
diff --git a/DecisionThreshold.md b/DecisionThreshold.md
@@ -0,0 +1,10 @@
+:ml: :classification:
+# Decision Threshold
+
+ML CH3
+
+## Notes
+
+**Definition:** In classical classification, a decision threshold is the cutoff on a model's score: inputs scoring above it are classified one way and inputs scoring below it another way.
+
+Raising the threshold generally increases precision, because inputs the model is less sure about are no longer classified as part of the set. In turn, it decreases recall, because more true members fall below the threshold and become false negatives.
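
A minimal sketch of the precision/recall trade-off as the threshold moves, assuming the classifier emits a numeric score per input. The scores and labels are made up, and returning a precision of 1.0 when nothing is predicted positive is a convention chosen for this sketch.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when inputs scoring >= threshold are positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention for this sketch
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical scores and true labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [False, False, True, True, False, True]

print(precision_recall_at(scores, labels, 0.3))  # lower threshold: full recall
print(precision_recall_at(scores, labels, 0.7))  # higher threshold: full precision
```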
diff --git a/HarmonicMean.md b/HarmonicMean.md
@@ -0,0 +1,20 @@
+:ml:
+# Harmonic Mean
+
+ML D2
+
+## Notes
+
+**Definition:** The harmonic mean of precision and recall, known as the F1 score, is a metric used to describe the performance of a model in a single value.
+
+Basically, this is a combination of [[Precision.md]] and [[Recall.md]]. 
+
+The harmonic mean favors models with similarly good values for both recall and precision, which can be good in certain cases. There are, however, many cases where precision, recall, or accuracy alone may be more important.
+
+
+Formula:
+
+
+F_1 = 2 * (p * r) / (p+r)
+
+Where p = [[Precision.md]] and r = [[Recall.md]]
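
A minimal sketch of the formula above, assuming precision and recall are already available as numbers in [0, 1]. Returning 0.0 when both are zero is a convention chosen here to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention for this sketch
    return 2 * (precision * recall) / (precision + recall)


print(f1_score(0.5, 0.5))  # balanced values: equals the arithmetic mean
print(f1_score(0.9, 0.1))  # unbalanced values: well below the arithmetic mean of 0.5
```

The second call illustrates why the harmonic mean favors models whose precision and recall are similar: one weak value pulls the score down sharply.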
diff --git a/JointProbability.md b/JointProbability.md
@@ -0,0 +1,10 @@
+:stats:
+# Joint Probability
+
+Stats D2
+
+## Notes
+
+**Definition:** A joint probability is the probability of multiple events occurring together.
+
+For example, suppose 48% of voters are both Democrats and in favor of the bill. This is the joint probability of any given voter being both a Democrat and in favor of the bill.
diff --git a/MachineLearning.md b/MachineLearning.md
@@ -87,6 +87,14 @@ Concepts:
 [[CrossValidation.md]]
 [[Precision.md]]
 [[TruePositiveRate.md]]
+[[Recall.md]]
+[[HarmonicMean.md]]
+[[Accuracy.md]]
+[[DecisionThreshold.md]]
+[[ROC.md]]
+[[MulticlassClassifier.md]]
+[[OneVersusAll.md]]
+[[OneVersusOne.md]]
 
 To do:
 
diff --git a/MarginalProbabilities.md b/MarginalProbabilities.md
@@ -0,0 +1,10 @@
+:stats:
+# Marginal Probabilities
+
+Stats D2
+
+## Notes
+
+**Definition:** Marginal probabilities are probabilities of a single event that are not conditional upon any other event.
+
+This is in contrast with [[JointProbability.md]], which are "and" probabilities, and [[ConditionalProbabilities.md]], which are "given" probabilities.
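
A minimal sketch relating marginal, joint, and conditional probabilities through a table of counts, using the identity P(A | B) = P(A and B) / P(B). The counts are hypothetical, picked so that 48% of voters are Democrats in favor and 80% of Republicans are in favor.

```python
from fractions import Fraction

# Hypothetical counts of voters by (party, in_favor).
counts = {
    ("democrat", True): 48,
    ("democrat", False): 12,
    ("republican", True): 32,
    ("republican", False): 8,
}
total = sum(counts.values())
parties = ("democrat", "republican")


def joint(party, in_favor):
    """P(party and in_favor)."""
    return Fraction(counts[(party, in_favor)], total)


def marginal_party(party):
    """P(party), summing the joint probabilities over in_favor."""
    return sum(joint(party, f) for f in (True, False))


def marginal_favor(in_favor):
    """P(in_favor), summing the joint probabilities over party."""
    return sum(joint(p, in_favor) for p in parties)


def favor_given_party(in_favor, party):
    """P(in_favor | party) = P(party and in_favor) / P(party)."""
    return joint(party, in_favor) / marginal_party(party)


print(joint("democrat", True))                # 12/25, i.e. 48%
print(marginal_favor(True))                   # 4/5
print(favor_given_party(True, "republican"))  # 4/5
```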
diff --git a/MulticlassClassifier.md b/MulticlassClassifier.md
@@ -0,0 +1,8 @@
+:ml:
+# Multiclass Classifier
+
+ML D2
+
+## Notes
+
+**Definition:** A multiclass classifier is a classifier that classifies items into more than two classes (not binary classification).
diff --git a/NormalDistribution.md b/NormalDistribution.md
@@ -6,5 +6,3 @@ Stats D1
 ## Notes
 
 **Definition:** A normal distribution is a unimodal one in which most observations cluster around the mound while fewer and fewer observations are farther away. 
-
-
diff --git a/OneVersusAll.md b/OneVersusAll.md
@@ -0,0 +1,12 @@
+:ml:
+# One Versus All (OvA) or One Versus Rest (OvR)
+
+ML D2
+
+## Notes
+
+**Definition:** One versus all is a strategy that uses a set of binary classifiers, one per class, each of which outputs a score (such as a probability) for its class. The class with the highest score is then selected as the output.
+
+Think of this as a series of SVC or SGD classifiers that each output some likelihood that the current input is part of a particular class. You send the input into each model, and the class whose model outputs the highest score is the class predicted for the input.
+
+See also [[OneVersusOne.md]] for another strategy to put together models to do classification.
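
A minimal sketch of the selection step in one versus all, assuming each per-class model is exposed as a function that returns a score for an input. The scorers here are hypothetical stand-ins (negative distance to a class center on a number line), not trained SVC or SGD models.

```python
def one_versus_all_predict(scorers, x):
    """Return the label whose binary scorer gives x the highest score."""
    return max(scorers, key=lambda label: scorers[label](x))


def make_scorer(center):
    """Hypothetical scorer: higher when x is closer to the class center."""
    return lambda x: -abs(x - center)


centers = {"a": 0.0, "b": 2.0, "c": 4.0}
scorers = {label: make_scorer(center) for label, center in centers.items()}

print(one_versus_all_predict(scorers, 1.7))  # b
```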
diff --git a/OneVersusOne.md b/OneVersusOne.md
@@ -0,0 +1,14 @@
+:ml:
+# One Versus One (OvO)
+
+ML D2
+
+## Notes
+
+**Definition:** A one versus one classification strategy trains a binary classifier for every pair of classes, each of which outputs the probability of an input being part of one class or the other.
+
+Basically, you train each model to distinguish between one class and another. It outputs the probability of one class over the other. The class that wins the most of these pairwise comparisons is the class predicted for the input (in theory).
+
+As such, one must train N * (N-1)/2 classifiers, which can be a lot depending on how many classes there are. In the case of the digits 0-9 (MNIST) this comes out to 45 models. On the flip side, each model does not need to be trained on the entire set, only on the subset containing the two classes being compared.
+
+See also [[OneVersusAll.md]] for another strategy that builds a multiclass classifier from binary classifiers. The main reason OvO can be better than OvA is that some models are slow to train on larger datasets, so training each model on only a subset, albeit training more models, can be faster. This is especially true for support vector machine classification models. In most cases, however, OvA is preferred.
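
A minimal sketch of the pairwise count and the voting step, assuming each pairwise model is a function that returns the label it prefers. The pairwise classifiers are hypothetical stand-ins that pick the nearer class center, and ties in the vote are broken arbitrarily by `Counter.most_common`.

```python
from collections import Counter
from itertools import combinations


def count_pairwise_classifiers(n_classes):
    """Number of classifiers one versus one needs: N * (N - 1) / 2."""
    return n_classes * (n_classes - 1) // 2


def make_pairwise_classifier(label_a, label_b, centers):
    """Hypothetical binary model: votes for the class with the nearer center."""
    def classify(x):
        closer_to_a = abs(x - centers[label_a]) <= abs(x - centers[label_b])
        return label_a if closer_to_a else label_b
    return classify


def one_versus_one_predict(classifiers, x):
    """Let every pairwise classifier vote and return the most voted label."""
    votes = Counter(classify(x) for classify in classifiers)
    return votes.most_common(1)[0][0]


centers = {"a": 0.0, "b": 2.0, "c": 4.0}
classifiers = [
    make_pairwise_classifier(a, b, centers) for a, b in combinations(centers, 2)
]

print(count_pairwise_classifiers(10))             # 45
print(one_versus_one_predict(classifiers, 3.6))   # c
```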
diff --git a/ROC.md b/ROC.md
@@ -0,0 +1,10 @@
+:ml:
+# Receiver Operating Characteristic (ROC)
+
+ML D3
+
+## Notes
+
+**Definition:** The ROC curve plots the rate of true positives for a dataset against the rate of false positives as the decision threshold changes.
+
+This type of graph is used to show threshold information for binary classification models.
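
A minimal sketch of how the points of an ROC curve can be computed, assuming a binary classifier that outputs a score per input and treating every distinct score as a candidate threshold. The function `roc_points` and the data are illustrative.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores and true labels.
scores = [0.9, 0.8, 0.65, 0.4, 0.35, 0.1]
labels = [True, True, False, False, True, False]

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```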
diff --git a/Recall.md b/Recall.md
@@ -0,0 +1 @@
+See [[TruePositiveRate.md]]
diff --git a/StandardDeviation.md b/StandardDeviation.md
@@ -0,0 +1,11 @@
+:stats:
+# Standard Deviation
+
+Stats D2
+
+## Notes
+
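+**Definition:** The standard deviation is the square root of the average squared difference between each value in a dataset and the mean of the dataset. It describes how far values typically lie from the mean.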
+
+
+See also [[Variance.md]], which is the square of the standard deviation.
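
A minimal sketch computing the population standard deviation as the square root of the variance, alongside the mean absolute deviation for comparison; the two generally differ. It uses Python's `statistics` module, and the sample values are made up.

```python
import math
import statistics


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(statistics.pvariance(values))


def mean_absolute_deviation(values):
    """Average absolute difference between each value and the mean."""
    mean = statistics.mean(values)
    return sum(abs(v - mean) for v in values) / len(values)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5

print(population_std(values))           # 2.0
print(mean_absolute_deviation(values))  # 1.5
```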
diff --git a/Statistics.md b/Statistics.md
@@ -17,4 +17,10 @@ Links to Stats Notes
 [[PoissonDistribution.md]]
 [[ExponentialDistribution.md]]
 [[NormalDistribution.md]]
+[[Variance.md]]
+[[ConditionalProbabilities.md]]
+[[JointProbability.md]]
+[[MarginalProbabilities.md]]
+[[Covariance.md]]
+[[Correlation.md]]
 
diff --git a/Variance.md b/Variance.md
@@ -0,0 +1,16 @@
+:stats:
+# Variance
+
+Stats D2
+
+## Notes
+
+**Definition:** The variance of a set of samples is the average squared difference between each value and the mean.
+
+This can be shown as follows for X:
+
+Var(X) = |X|^-1 * sum((x - mean)^2)
+
+As shown above, find the difference between each value and the mean, square it so that it is non-negative, and then sum the squared values. We then average the sum by multiplying by 1 over the cardinality of X.
+
+If we take the square root of the variance, we then have the [[StandardDeviation.md]].
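
A minimal sketch of the formula above, dividing by the cardinality of X as the note does (the population form). The function name and data are illustrative.

```python
def variance(values):
    """Average squared difference between each value and the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty dataset is undefined")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


print(variance([1, 2, 3, 4, 5]))  # 2.0
```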
