Completed this crap - notes - Unnamed repository; edit this file 'description' to name the repository.

commit c157c455eb08ab8a4a086f9201f75cbf802aaf97
parent 008d8ae437adc2962adae34e117491c060f9490e
Author: Andrew <andrewlaack1@gmail.com>
Date:   Mon, 10 Jun 2024 09:55:39 -0500

Completed this crap

Diffstat:
A AdaBoost.md  | 10 ++++++++++
A BayesianInference.md  | 10 ++++++++++
A Boosting.md  | 16 ++++++++++++++++
A ExtraTrees.md  | 10 ++++++++++
A GradientBoosting.md  | 11 +++++++++++
A HistogramBasedGradientBoosting.md  | 12 ++++++++++++
M MachineLearning.md  | 12 ++++++++++++
A ManifoldLearning.md  | 11 +++++++++++
M OutOfBag.md  | 2 +-
A PCA.md  | 12 ++++++++++++
A Projection.md  | 10 ++++++++++
M RandomForest.md  | 2 +-
A RandomPatches.md  | 12 ++++++++++++
A RandomSubspaces.md  | 8 ++++++++
A Stacking.md  | 14 ++++++++++++++
M Statistics.md  | 1 +
A Subspace.md  | 10 ++++++++++
M Variance.md  | 2 ++

18 files changed, 163 insertions(+), 2 deletions(-)
diff --git a/AdaBoost.md b/AdaBoost.md
@@ -0,0 +1,10 @@
+:ml:
+# AdaBoost (adaptive boosting)
+
+ML D5
+
+## Notes
+
+**Definition:** AdaBoost is a boosting algorithm that increases the weight of training instances that the prior model underfit (misclassified).
+
+In AdaBoost, each predictor gets a model weight based on its overall weighted accuracy, and each instance weight is then updated based on whether the model predicted that instance correctly. Predictors that are wrong more often get a lower weight, while instances that are misclassified get a higher weight so that future models are pushed to fix them.
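A minimal sketch of one round of the weight updates described above, assuming the common formulation where the predictor weight is `learning_rate * log((1 - error) / error)`. The function name and the toy numbers are made up for illustration, and the sketch assumes the weighted error is strictly between 0 and 1.

```python
import math

def adaboost_round(weights, correct, learning_rate=1.0):
    """One AdaBoost weight update (illustrative helper, not a library API).

    weights: current instance weights
    correct: list of bools, True where the current predictor was right
    Returns (predictor_weight, new_instance_weights).
    """
    # Weighted error rate of the current predictor (assumed to be in (0, 1)).
    error = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    # Accurate predictors get a large weight; near coin-flip predictors get ~0.
    alpha = learning_rate * math.log((1 - error) / error)
    # Boost only the misclassified instances, then renormalize to sum to 1.
    boosted = [w if ok else w * math.exp(alpha) for w, ok in zip(weights, correct)]
    total = sum(boosted)
    return alpha, [w / total for w in boosted]

# Four equally weighted instances; the predictor misses only the last one.
weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
print(alpha, new_weights)
```

After this round the missed instance carries half of the total weight, so the next predictor is strongly encouraged to get it right.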
diff --git a/BayesianInference.md b/BayesianInference.md
@@ -0,0 +1,10 @@
+:stats:
+# Bayesian Inference
+
+Stats D5
+
+## Notes
+
+**Definition:** Bayesian inference is the principle that p(something) can be updated based on prior beliefs and new evidence that make p(something) more or less likely, thus factoring them into the probability.
+
+This is basically using known state (what was believed before plus what has been observed since) to update probability values.
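As a rough illustration of updating a probability from state, here is Bayes' rule applied twice in a row. The spam-filter numbers are invented, the function name is hypothetical, and the second update assumes the two observations are independent given the hypothesis.

```python
def bayes_update(prior, likelihood, false_positive_rate):
    """Posterior P(H | E) from prior P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Made-up numbers: 1% of emails are spam; a filter flags 90% of spam
# and 5% of non-spam.
posterior = bayes_update(prior=0.01, likelihood=0.9, false_positive_rate=0.05)

# The posterior becomes the prior for the next observation: this is the
# "state" that carries forward between updates.
posterior_after_second_flag = bayes_update(posterior, 0.9, 0.05)

print(posterior, posterior_after_second_flag)
```

A single flag only raises the probability of spam to about 15%, because spam is rare to begin with; a second flag on top of that pushes it much higher.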
diff --git a/Boosting.md b/Boosting.md
@@ -0,0 +1,16 @@
+:ml:
+# Boosting
+
+ML D5
+
+## Notes
+
+**Definition:** Boosting is the process of combining several weak learners into one strong learner.
+
+The idea of this is to sequentially train predictors to correct the output of prior models.
+
+AdaBoost, short for adaptive boosting, is a popular boosting algorithm.
+
+Gradient boosting is another popular boosting algorithm.
+
+The main difference between boosting and most voting classifier implementations is that boosting is purely sequential: each predictor depends on the ones before it. It also uses weak learners, such as shallow decision trees, to make predictions. Additionally, and this is where the name comes from, the importance of training examples is boosted so that later models focus on improving the misclassified data.
diff --git a/ExtraTrees.md b/ExtraTrees.md
@@ -0,0 +1,10 @@
+:ml:
+# Extra Trees (Extremely randomized trees)
+
+ML D5
+
+## Notes
+
+**Definition:** Extra trees are decision trees that incorporate extra randomness by randomizing splitting thresholds instead of using Gini impurity or information gain to determine them.
+
+Basically, each node selects a random feature and then a random threshold within the range of values that feature takes at that node, and splits on it. This adds a lot of randomness and greatly reduces training time because the optimal split at each node does not need to be calculated.
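A small sketch of the fully random split described in this note, using only the standard library. The function name and the toy rows are made up; library implementations may differ (scikit-learn, for example, draws one random threshold per candidate feature and keeps the best of those).

```python
import random

def random_split(rows, rng):
    """Pick a random feature and a random threshold for the rows at a node.

    rows: equal-length feature lists for the instances reaching this node.
    Returns (feature_index, threshold, left_rows, right_rows).
    """
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    # Threshold drawn uniformly between the min and max seen at this node;
    # no impurity measure is evaluated to choose it.
    threshold = rng.uniform(min(values), max(values))
    left = [row for row in rows if row[feature] <= threshold]
    right = [row for row in rows if row[feature] > threshold]
    return feature, threshold, left, right

rng = random.Random(0)  # seeded only so the sketch is repeatable
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
feature, threshold, left, right = random_split(rows, rng)
print(feature, threshold, len(left), len(right))
```

The cost per node is a single pass over the rows, compared with scanning every candidate threshold of every feature when searching for the best split.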
diff --git a/GradientBoosting.md b/GradientBoosting.md
@@ -0,0 +1,11 @@
+# Gradient Boosting
+
+ML D5
+
+## Notes
+
+**Definition:** Gradient boosting sequentially adds predictors to an ensemble, fitting each new model to the residual errors of its predecessor rather than to reweighted instances as AdaBoost does.
+
+Residual errors are simply the differences between the expected and predicted values. Each new model tries to predict the errors left by the ensemble so far, and its output is added to the ensemble's prediction as a correction. No instance weights are updated, which is what distinguishes gradient boosting from AdaBoost.
+
+Gradient boosting generally uses somewhat stronger learners than AdaBoost (for example trees a few levels deep rather than decision stumps), as this works better with the approach.
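To make the residual-fitting idea concrete, here is a minimal sketch for squared error on 1-D inputs. Depth-1 regression trees (stumps) are used only to keep the code short, and all names and data are invented for illustration.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree on 1-D inputs by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=3, learning_rate=1.0):
    """Each new stump is fit to the residuals of the ensemble built so far."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.0, 3.1, 5.0, 5.2]
for rounds in (0, 1, 3):
    print(rounds, mse(gradient_boost(xs, ys, n_rounds=rounds), xs, ys))
```

Training error drops with each added stump because every stump only has to explain what the earlier ones left over.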
diff --git a/HistogramBasedGradientBoosting.md b/HistogramBasedGradientBoosting.md
@@ -0,0 +1,12 @@
+:ml:
+# Histogram Based Gradient Boosting (HGB)
+
+ML D5
+
+## Notes
+
+**Definition:** Histogram-based gradient boosting is an implementation of gradient boosting that bins the input features.
+
+The usual way of doing this is to replace each feature value with the integer index of the bin it falls into.
+
+On large datasets this can be hundreds of times faster to train than regular gradient boosting, at the cost of some precision. With that said, binning also acts as a regularizer that helps reduce overfitting.
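A rough sketch of the binning step, assuming equal-width bins for simplicity (libraries may choose bin edges differently, for example from quantiles). The helper names and values are hypothetical.

```python
def make_bin_edges(values, n_bins):
    """Equal-width bin edges over the observed range (a simplifying assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def to_bin(value, edges):
    """Integer bin index: the number of edges the value is greater than or equal to."""
    return sum(value >= edge for edge in edges)

values = [0.1, 1.0, 3.0, 5.5, 8.0, 10.1]
edges = make_bin_edges(values, n_bins=4)
bins = [to_bin(v, edges) for v in values]
print(edges, bins)
```

With the feature reduced to 4 integer bins, a split search only has to consider 3 candidate thresholds no matter how many distinct raw values there were, which is where the speedup comes from. Nearby values such as 0.1 and 1.0 become indistinguishable, which is the loss of precision.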
diff --git a/MachineLearning.md b/MachineLearning.md
@@ -112,6 +112,18 @@ Concepts:
 [[Bias.md]]
 [[Variance.md]]
 [[OutOfBag.md]]
+[[RandomPatches.md]]
+[[RandomSubspaces.md]]
+[[ExtraTrees.md]]
+[[Boosting.md]]
+[[AdaBoost.md]]
+[[GradientBoosting.md]]
+[[HistogramBasedGradientBoosting.md]]
+[[Stacking.md]]
+[[Projection.md]]
+[[Subspace.md]]
+[[ManifoldLearning.md]]
+[[PCA.md]]
 
 To do:
 
diff --git a/ManifoldLearning.md b/ManifoldLearning.md
@@ -0,0 +1,11 @@
+# Manifold Learning
+
+ML D5
+
+## Notes
+
+**Definition:** Manifold learning is the process of modeling the lower-dimensional manifold on which higher-dimensional data lies, for example unrolling a 2D surface that is embedded in 3D space.
+
+Manifolds are lower-dimensional shapes embedded in a higher-dimensional space, such that the data's relevant attributes (like which points are neighbors) are still maintained. This can be thought of like UV unwrapping.
+
+This is often used when plain projection would squash separate layers of the data onto nearby values, mixing points that are actually far apart along the manifold.
diff --git a/OutOfBag.md b/OutOfBag.md
@@ -1,5 +1,5 @@
 :ml:
-# Out of Bag
+# Out of Bag (OOB)
 
 ML D5
 
diff --git a/PCA.md b/PCA.md
@@ -0,0 +1,12 @@
+:ml:
+# PCA (Principal Component Analysis)
+
+ML D5
+
+## Notes
+
+**Definition:** PCA is a dimensionality reduction algorithm that finds a hyperplane that lies close to the data and then projects the data onto it.
+
+The goal of this algorithm is to preserve maximum variance, so that the projected values stay as spread out as possible.
+
+Described as a cost function, this means minimizing the mean squared distance between the original data points and their projections.
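The two descriptions above (keep the most variance, minimize the mean squared distance) pick the same axis. A small sketch for the 2-D case, using the closed-form angle of the largest-variance direction; the function name and points are made up for illustration.

```python
import math

def pca_2d(points):
    """First principal component of 2-D points via the closed-form 2x2 case.

    Returns (axis, kept_variance, mean_squared_distance, total_variance).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the direction along which the variance is largest.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis = (math.cos(theta), math.sin(theta))
    # 1-D coordinate of each centered point along that axis.
    coords = [x * axis[0] + y * axis[1] for x, y in centered]
    kept_variance = sum(c * c for c in coords) / n
    # Mean squared distance from each point to its projection on the axis.
    msd = sum((x - c * axis[0]) ** 2 + (y - c * axis[1]) ** 2
              for (x, y), c in zip(centered, coords)) / n
    return axis, kept_variance, msd, sxx + syy

points = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8)]
axis, kept, msd, total = pca_2d(points)
print(axis, kept, msd, total)
```

Since the kept variance and the mean squared distance always add up to the total variance, maximizing one is the same as minimizing the other.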
diff --git a/Projection.md b/Projection.md
@@ -0,0 +1,10 @@
+:ml:
+# Projection
+
+ML D5
+
+## Notes
+
+**Definition:** Projection is the process of moving an element from higher dimensional space to a lower dimensional space. 
+
+Projection is often used to reduce the dimensionality of datasets, as data becomes more sparse in higher dimensions.
diff --git a/RandomForest.md b/RandomForest.md
@@ -9,4 +9,4 @@ ML D4
 
 This uses a wisdom of the crowd philosophy where most likely the aggregated sum of many answers is better than one expert answer.
 
-
+Random forests are normally trained with [[Bagging.md]] and sometimes with [[Pasting.md]]. 
diff --git a/RandomPatches.md b/RandomPatches.md
@@ -0,0 +1,12 @@
+:ml:
+# Random Patches (Method)
+
+ML D5
+
+## Notes
+
+**Definition:** The random patches method samples training instances with bagging (or sometimes pasting) and also selects a random subset of features for each predictor.
+
+Each predictor therefore sees both a random subset of samples and a random subset of features. This reduces variance but increases bias.
+
+This is useful for high-dimensional inputs like images, where training models on every feature takes a long time.
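A minimal sketch of the sampling only (no model training), using the standard library. The function and parameter names are illustrative choices, not a library API.

```python
import random

def random_patch(n_instances, n_features, max_samples, max_features, rng, bootstrap=True):
    """Indices of the instances and features one predictor would be trained on."""
    if bootstrap:
        # Bagging: sample instances with replacement.
        rows = [rng.randrange(n_instances) for _ in range(max_samples)]
    else:
        # Pasting: sample instances without replacement.
        rows = rng.sample(range(n_instances), max_samples)
    # Features are sampled without replacement in this sketch.
    cols = sorted(rng.sample(range(n_features), max_features))
    return rows, cols

rng = random.Random(42)  # seeded only so the sketch is repeatable
bag_rows, bag_cols = random_patch(100, 784, max_samples=60, max_features=50, rng=rng)
paste_rows, paste_cols = random_patch(100, 784, 60, 50, rng, bootstrap=False)
print(len(set(bag_rows)), len(set(paste_rows)), bag_cols[:5])
```

Each predictor in the ensemble would get its own call, so it trains on a different patch of the data. Keeping every row and sampling only the columns gives the random subspaces variant.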
diff --git a/RandomSubspaces.md b/RandomSubspaces.md
@@ -0,0 +1,8 @@
+:ml:
+# Random Subspaces Method
+
+ML D5
+
+## Notes
+
+**Definition:** The random subspaces method is similar to [[RandomPatches.md]] except it keeps all training instances and only samples features.
diff --git a/Stacking.md b/Stacking.md
@@ -0,0 +1,14 @@
+:ml:
+# Stacking
+
+ML D5
+
+## Notes
+
+**Definition:** Stacking is the idea of training a dedicated model to aggregate the predictions of an ensemble of predictive models.
+
+This is in contrast with soft and hard voting, which combine the predictors' outputs using a simple fixed calculation.
+
+The model that makes the final prediction is called a blender or meta learner, and through the process of blending it gives an output.
+
+A good way to train the blender is on the base models' out-of-sample predictions, meaning predictions for instances those models did not see during training.
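One way to get out-of-sample predictions for every instance is to hold out one fold at a time. The sketch below builds the blender's training features that way; the two base models are deliberately trivial and hypothetical, and the blender itself is left out.

```python
def out_of_fold_predictions(xs, ys, fit_fns, n_folds=3):
    """Build the blender's training set from out-of-sample base predictions.

    fit_fns: functions taking (train_xs, train_ys) and returning a predictor.
    Returns one row per instance with each base model's prediction for it,
    made by a copy of that model that never saw the instance.
    """
    n = len(xs)
    meta_features = [[None] * len(fit_fns) for _ in range(n)]
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        for j, fit in enumerate(fit_fns):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                meta_features[i][j] = model(xs[i])
    return meta_features

def fit_mean(train_xs, train_ys):
    """Base model 1: always predicts the training mean."""
    mean = sum(train_ys) / len(train_ys)
    return lambda x: mean

def fit_nearest(train_xs, train_ys):
    """Base model 2: predicts the target of the nearest training input."""
    def predict(x):
        nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
        return train_ys[nearest]
    return predict

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
meta = out_of_fold_predictions(xs, ys, [fit_mean, fit_nearest])
print(meta)
```

The blender would then be trained with `meta` as its inputs and `ys` as its targets, so it learns how much to trust each base model.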
diff --git a/Statistics.md b/Statistics.md
@@ -32,3 +32,4 @@ Links to Stats Notes
 [[Boxplots.md]]
 [[Crosstabulation.md]]
 [[MosaicPlot.md]]
+[[BayesianInference.md]]
diff --git a/Subspace.md b/Subspace.md
@@ -0,0 +1,10 @@
+:ml:
+# Subspace
+
+ML D5
+
+## Notes
+
+**Definition:** A subspace is a lower-dimensional space contained within a higher-dimensional space.
+
+Often we find that many higher-dimensional points all reside in or near the same lower-dimensional subspace, which is the basis for [[Projection.md]].
diff --git a/Variance.md b/Variance.md
@@ -20,3 +20,5 @@ If we take the square root of the variance we then have the [[StandardDeviation.
 **Definition:** Variance is error caused by an oversensitive model (sensitive to variance/outliers).
 
 These models are likely to overfit training data.
+
+Variance can be thought of as a model's susceptibility to large changes in its predictions when the training data changes. Cross-validation helps reveal this: scores that differ widely between folds suggest high variance.
