commit 5afcd29c8707a7b9549827c1ae109e38d627b67f
parent e1ab383509ea71e9b938862a8f85bb86112a1574
Author: Andrew <andrewlaack1@gmail.com>
Date: Tue, 11 Jun 2024 11:06:34 -0500
Added notes about categorical cross entropy, wide and deep nns, and some other dl stuff
Diffstat:
11 files changed, 116 insertions(+), 4 deletions(-)
diff --git a/Backpropagation.md b/Backpropagation.md
@@ -1 +1,18 @@
-:todo:
+:ml:
+# Backpropagation
+
+ML D6
+
+## Notes
+
+**Definition:** Backpropagation combines reverse-mode autodiff with gradient descent to iteratively improve a model: given inputs and their expected outputs, it follows the gradient of the loss with respect to each [[Weight.md]] and [[Bias.md]].
+
+When using backpropagation we process the training data in mini-batches. Generally we go through the entire dataset multiple times, and each of these passes is called an epoch. For each mini-batch we first compute the outputs of the first layer for every instance, then pass those to the second layer, and so on until reaching the output layer. This is the forward pass. All intermediate values must be preserved because the backward pass needs them.
+
+Once the forward pass is completed we evaluate a loss function to measure the output error.
+
+Next, we compute how much each bias and connection weight contributed to this error, working our way backwards from the output layer to the input layer. This is done using the chain rule.
+
+Lastly, using these error gradients, we do a gradient descent step to tweak the connection weights and biases.
+
+When doing backpropagation we should replace the MLP's step function with a function whose derivative is not 0 everywhere (ReLU, sigmoid, etc.) so that gradient descent steps can be made. This function is referred to as the activation function.
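
The forward pass, loss, backward pass, and gradient descent step described in these notes can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the toy data, the layer sizes, the MSE loss, the learning rate, and the names `forward`, `mse`, and `train_step` are all assumptions made for the example, and the whole toy dataset is treated as a single mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical): 8 instances, 3 features, 1 target.
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))

# One hidden layer with 4 ReLU units, then a linear output layer.
W1, b1 = rng.normal(scale=0.5, size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)


def forward(X):
    """Forward pass; keeps the intermediate values for the backward pass."""
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)  # ReLU activation
    y_hat = a1 @ W2 + b2
    return z1, a1, y_hat


def mse(y_hat, y):
    """Mean squared error over the mini-batch."""
    return float(np.mean((y_hat - y) ** 2))


def train_step(X, y, lr=0.05):
    """One forward pass, backward pass, and gradient descent step."""
    global W1, b1, W2, b2
    z1, a1, y_hat = forward(X)
    loss = mse(y_hat, y)

    # Backward pass: chain rule from the output layer back to the input layer.
    d_y_hat = 2.0 * (y_hat - y) / y.size  # dLoss/dy_hat
    dW2 = a1.T @ d_y_hat
    db2 = d_y_hat.sum(axis=0)
    d_a1 = d_y_hat @ W2.T
    d_z1 = d_a1 * (z1 > 0)  # ReLU derivative is 1 where z1 > 0, else 0
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Gradient descent step on every weight and bias.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
    return loss


# Each call here plays the role of one epoch over the single mini-batch.
losses = [train_step(X, y) for _ in range(50)]
```

Note that `z1` and `a1` from the forward pass are reused in the backward pass, which is why the intermediate values have to be kept.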
diff --git a/Bias.md b/Bias.md
@@ -1,12 +1,19 @@
-:ml:
+:ml: :stats:
# Bias
ML D5
## Notes
+### Stats
+
**Definition:** Bias is a generalization error caused by incorrect assumptions such as assuming data is linear when it is not.
High bias models are likely to underfit training data.
See also [[Variance.md]]
+
+
+### ANNs
+
+**Definition:** Biases in ANNs are learned constant terms added to the weighted sum of inputs of each perceptron (neuron). They can be thought of as the y-intercepts of linear equations.
diff --git a/CategoricalCrossEntropy.md b/CategoricalCrossEntropy.md
@@ -0,0 +1,14 @@
+:ml:
+# Categorical Cross Entropy
+
+ML D6
+
+## Notes
+
+**Definition:** Categorical cross entropy is a loss calculation used for classification algorithms.
+
+Categorical cross entropy is calculated by summing y_i log(p_i) over all classes i and multiplying by -1, where y_i is the expected classification (1 if the instance belongs to class i, 0 otherwise) and p_i is the probability the model outputs for class i.
+
+In essence, this is the negative sum of the logs of the probabilities the model assigns to the true classes. All other classes are ignored: if another class has a 0.8 probability output, its log is multiplied by 0 and so has no direct effect on the categorical cross entropy.
+
+Cross entropy measures how far the estimated probability distribution is from the true probability distribution. It can be stated more formally, but every formulation uses logarithms.
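
The calculation above can be sketched in plain Python for a single instance. The function name `categorical_cross_entropy`, the `eps` guard against log(0), and the example values are assumptions chosen for illustration.

```python
import math


def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Return -sum_i y_i * log(p_i) for one instance.

    y_true is a one-hot list (1 for the true class, 0 otherwise) and
    p_pred holds the model's probability for each class. eps is an
    assumed guard that avoids taking log(0).
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))


y_true = [0, 1, 0]
p_pred = [0.2, 0.7, 0.1]
loss = categorical_cross_entropy(y_true, p_pred)  # equals -log(0.7)

# Shifting probability between the two wrong classes leaves the loss unchanged,
# because their logs are multiplied by 0.
same_loss = categorical_cross_entropy(y_true, [0.1, 0.7, 0.2])
```

Only the probability assigned to the true class (0.7 here) enters the result, so the loss is about 0.357 in both cases.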
diff --git a/DeepLearning.md b/DeepLearning.md
@@ -1 +0,0 @@
-:todo:
diff --git a/GaussianMixtureModels.md b/GaussianMixtureModels.md
@@ -0,0 +1,8 @@
+:ml:
+# Gaussian Mixture Models
+
+ML D5
+
+## Notes
+
+**Definition:** Gaussian mixture models (GMMs) are probabilistic models that assume the instances were generated from a mixture of several Gaussian distributions, where each distribution forms its own cluster.
diff --git a/MLP.md b/MLP.md
@@ -0,0 +1,13 @@
+# MLP(s)
+
+ML D6
+
+## Notes
+
+**Definition:** Multilayer perceptrons (MLPs) are a form of feedforward deep neural network in which each layer's outputs feed forward into the next layer of perceptrons until reaching the output layer.
+
+MLPs can do regression and classification tasks. For regression we need one output neuron for each value we would like to predict. We can also apply an activation function to these outputs (the default is none) to bound the output range.
+
+For classification tasks we dedicate one output neuron to each class. These output neurons can use a sigmoid activation function, which gives the probability of membership in each class independently. To get outputs that sum to 1 (wanted in multiclass classification where only one class is expected) we instead apply a softmax function across the whole output layer.
+
+For classification tasks with neural networks we generally want to minimize cross entropy rather than MSE, which is the usual loss for regression. Cross entropy measures the difference between the predicted distribution and the true distribution. The same loss is used for logistic regression.
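
The difference between per-neuron sigmoid outputs and a softmax over the output layer can be sketched in plain Python. The raw output values in `logits` and the helper names are assumptions for the example.

```python
import math


def sigmoid(z):
    """Squash one raw output into (0, 1), independently of the others."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turn all raw outputs of a layer into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of a three-class output layer.
logits = [2.0, 1.0, -1.0]

independent = [sigmoid(z) for z in logits]  # each class judged on its own
exclusive = softmax(logits)                 # classes compete; sums to 1
```

With sigmoid outputs the three probabilities do not have to sum to 1, which suits cases where several classes can be true at once. Softmax couples the outputs so that exactly one class is expected.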
diff --git a/MachineLearning.md b/MachineLearning.md
@@ -129,4 +129,10 @@ Concepts:
[[Affinity.md]]
[[Segmentation.md]]
[[DBSCAN.md]]
-
+[[GaussianMixtureModels.md]]
+[[NeuralNetworks.md]]
+[[Perceptrons.md]]
+[[Backpropagation.md]]
+[[MLP.md]]
+[[WideAndDeepNN.md]]
+[[CategoricalCrossEntropy.md]]
diff --git a/NeuralNetworks.md b/NeuralNetworks.md
@@ -0,0 +1,10 @@
+:ml:
+# (Artificial) Neural Networks (ANNs)
+
+ML D5
+
+## Notes
+
+**Definition:** Artificial neural networks are machine learning models that mimic biological neurons to complete some task.
+
+By default the output layer has no activation function. A ReLU activation can be used on the output layer to force the output to be non-negative. Alternatively, we can use softplus, a smooth approximation of ReLU, whose output is always strictly positive.
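
A small sketch of the two output activations mentioned above, in plain Python. The overflow-safe form of softplus used here is one common way to write log(1 + exp(z)) and is an implementation choice for this example.

```python
import math


def relu(z):
    """ReLU: 0 for negative inputs, z otherwise."""
    return max(0.0, z)


def softplus(z):
    """Softplus: log(1 + exp(z)), a smooth approximation of ReLU.

    Written as max(z, 0) + log1p(exp(-|z|)) to avoid overflow for large z.
    """
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


relu_out = [relu(z) for z in (-2.0, 0.0, 3.0)]
softplus_out = [softplus(z) for z in (-2.0, 0.0, 3.0)]
```

ReLU returns exactly 0 for negative inputs, while softplus stays slightly above 0 and approaches ReLU as the input grows.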
diff --git a/Perceptrons.md b/Perceptrons.md
@@ -0,0 +1,18 @@
+:ml:
+# Perceptrons
+
+ML D5
+
+## Notes
+
+**Definition:** Perceptrons are an artificial neural network architecture based on threshold logic units (TLUs), also called linear threshold units (LTUs).
+
+The inputs and outputs of these neurons are numbers and each input is associated with a weight.
+
+The neuron accepts inputs and computes a linear function of them (a weighted sum plus a bias). It then applies a step function to get the output.
+
+A perceptron is a single-layer neural network in which every input is connected to every neuron. These neurons form the output layer.
+
+If multiple layers of perceptrons are stacked, the result is called an MLP (multilayer perceptron).
+
+MLPs are a type of deep neural network, but deep neural networks also include things like recurrent neural networks, whereas MLPs are only feedforward.
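
A single threshold logic unit can be sketched in a few lines of plain Python. The weights and bias below are hand-picked (not learned) so that the unit behaves like a logical AND, and the names `step`, `tlu`, `AND_WEIGHTS`, and `AND_BIAS` are assumptions for the example.

```python
def step(z):
    """Step function: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def tlu(inputs, weights, bias):
    """Threshold logic unit: weighted sum of the inputs, then a step function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


# Hand-picked parameters (an assumption) that make the unit compute AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

outputs = [tlu([a, b], AND_WEIGHTS, AND_BIAS) for a in (0, 1) for b in (0, 1)]
# outputs is [0, 0, 0, 1] for inputs (0,0), (0,1), (1,0), (1,1)
```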
diff --git a/Weight.md b/Weight.md
@@ -0,0 +1,8 @@
+:ml:
+# Weight (ANNs)
+
+ML D6
+
+## Notes
+
+**Definition:** Weights in ANNs are numerical values that represent the strength of connections between neurons (perceptrons).
diff --git a/WideAndDeepNN.md b/WideAndDeepNN.md
@@ -0,0 +1,12 @@
+:ml:
+# Wide and Deep Neural Network
+
+ML D6
+
+## Notes
+
+**Definition:** Wide and deep neural networks are a model architecture in which some or all inputs are connected directly to the output layer (the short path) while also passing through the hidden layers of the network (the long path).
+
+By using a wide and deep neural network we avoid distorting simple relationships by forcing them through the long path, because some inputs are fed directly into the output layer.
+
+To do this, we place a concatenation layer before the output layer to combine the long and short paths.
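
The forward pass of such a network can be sketched in NumPy. The layer sizes, the random untrained weights, and the helper names `dense` and `wide_and_deep` are assumptions made for illustration. This is not a specific library's API.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, W, b, activation=None):
    """Fully connected layer with an optional ReLU activation."""
    z = x @ W + b
    return np.maximum(z, 0.0) if activation == "relu" else z


n_features, n_hidden = 5, 8

# Randomly initialized, untrained parameters (for illustration only).
W1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_hidden)), np.zeros(n_hidden)
# The output layer sees the raw inputs and the deep features side by side.
W_out, b_out = rng.normal(size=(n_features + n_hidden, 1)), np.zeros(1)


def wide_and_deep(x):
    """Forward pass with a short (wide) path and a long (deep) path."""
    deep = dense(dense(x, W1, b1, "relu"), W2, b2, "relu")  # long path
    combined = np.concatenate([x, deep], axis=1)  # concatenation layer
    return dense(combined, W_out, b_out)  # output layer, no activation


X = rng.normal(size=(4, n_features))
y_hat = wide_and_deep(X)  # shape (4, 1)
```

Because the raw inputs are concatenated with the deep features, the output layer has direct weights on every input as well as on the hidden representation.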