Updated links - notes

commit 54e5cff2850ea7046491c0bc572edfaa549407cf
parent a372eef695d5c8f90804d9e0ffe715f2c49a2b53
Author: AndrewLockVI <andrewlaack1@gmail.com>
Date:   Tue,  4 Feb 2025 07:40:19 -0600

Updated links

Diffstat:
M docs/AbstractDataType.md  | 2 +-
M docs/Adam.md  | 4 +---
M docs/Algorithms.md  | 2 +-
M docs/AngleBetweenVectors.md  | 2 +-
M docs/Animation.md  | 8 ++++----
M docs/AnimationController.md  | 2 +-
M docs/Armature.md  | 2 +-
M docs/Backpropagation.md  | 2 +-
M docs/BarrierSynchronization.md  | 4 ++--
R docs/BayesTheroem.md -> docs/BayesTheorem.md  | 0 
M docs/Bias.md  | 2 +-
M docs/Biconditional.md  | 2 +-
M docs/Bijective.md  | 4 +---
M docs/BitSteering.md  | 4 ++--
M docs/Blender.md  | 22 +++++++++-------------
M docs/BreadthFirstSearch.md  | 4 ++--
M docs/BulkSynchronousProcessing.md  | 2 +-
M docs/CNN.md  | 2 +-
M docs/Calculus.md  | 20 ++++++++++----------
R docs/CentralLimitTheroem.md -> docs/CentralLimitTheorem.md  | 0 
M docs/CircularDoublyLinkedList.md  | 5 ++---
M docs/CircularLinkedList.md  | 3 +--
M docs/ClassificationProblem.md  | 2 +-
M docs/Clip.md  | 2 +-
M docs/Closure.md  | 2 +-
M docs/ClusteringAlgorithms.md  | 6 ++----
M docs/Codomain.md  | 2 +-
M docs/Complement.md  | 2 +-
A docs/ConditionalProbabilityTheorem.md  | 11 +++++++++++
D docs/ConditionalProbabilityTheroem.md  | 13 -------------
M docs/ContinuousProbability.md  | 2 +-
M docs/Contrapositive.md  | 2 +-
M docs/Correlation.md  | 2 +-
M docs/CounterExample.md  | 2 +-
M docs/Covariance.md  | 2 +-
M docs/CramersRule.md  | 2 +-
M docs/Crosstabulation.md  | 2 +-
M docs/DRAM.md  | 6 +++---
M docs/DRAMBanks.md  | 2 +-
M docs/DRAMCell.md  | 4 +---
M docs/DRAMChips.md  | 2 +-
M docs/DRAMRefresh.md  | 4 ++--
M docs/DRAMRowHammer.md  | 2 +-
M docs/DataFlow.md  | 7 +++----
M docs/Degree.md  | 4 +---
M docs/DemorgansLaw.md  | 2 +-
M docs/DepthFirstSearch.md  | 6 ++----
M docs/Determinant.md  | 2 +-
M docs/DiscreteUniformLaw.md  | 2 +-
M docs/DisturbanceErrors.md  | 6 ++----
M docs/ExplodingGradients.md  | 4 ++--
M docs/ExponentialDistribution.md  | 4 +---
M docs/FeatureScaling.md  | 2 +-
M docs/ForwardThoughts.md  | 8 +++-----
M docs/Frequency.md  | 2 +-
M docs/FundamentalTheoremOfArithmetic.md  | 4 +---
M docs/GameLoop.md  | 2 +-
M docs/GameObject.md  | 2 +-
M docs/GaussianElimination.md  | 4 ++--
M docs/GradientClipping.md  | 2 +-
M docs/GradientDescent.md  | 8 ++++----
M docs/GradientDescentCode.md  | 2 +-
M docs/Graphs.md  | 4 +---
M docs/HarmonicMean.md  | 4 ++--
M docs/Homogeneous.md  | 4 +---
M docs/Hyperparameter.md  | 2 +-
M docs/Hyperplane.md  | 4 +---
M docs/Hypervolume.md  | 2 +-
M docs/ISA.md  | 8 ++++----
M docs/Image.md  | 2 +-
M docs/Incremental.md  | 4 +---
M docs/Induction.md  | 8 ++++----
M docs/Inertia.md  | 2 +-
M docs/Inference.md  | 2 +-
M docs/Inhomogeneous.md  | 4 +---
M docs/Instruction.md  | 4 ++--
M docs/InverseTransformation.md  | 4 ++--
M docs/KMeans.md  | 4 +---
M docs/Kernel.md  | 2 +-
M docs/KeyframeAnimation.md  | 2 +-
M docs/LabelEncoding.md  | 6 ++----
M docs/LawOfLargeNumbers.md  | 4 +---
M docs/LearningRate.md  | 4 ++--
M docs/LinearIndependence.md  | 2 +-
M docs/LinearRegression.md  | 4 ++--
M docs/LinkedLists.md  | 10 ++++------
M docs/LocalScale.md  | 2 +-
M docs/LogisticRegression.md  | 2 +-
M docs/LoopInvariant.md  | 2 +-
M docs/MAE.md  | 2 +-
M docs/MarkovChains.md  | 4 ++--
M docs/MarkovProcess.md  | 4 +---
M docs/MatrixMultiplication.md  | 2 +-
M docs/Memory.md  | 14 +++++++-------
M docs/MemoryManagement.md  | 2 +-
M docs/MergeSort.md  | 2 +-
M docs/Mesh.md  | 4 ++--
M docs/MicroArchitecture.md  | 6 +++---
M docs/MinMaxScaling.md  | 2 +-
M docs/MixedRandomVariable.md  | 4 ++--
M docs/ModelBasedLearning.md  | 2 +-
M docs/MonoBehaviour.md  | 2 +-
M docs/Movement.md  | 6 +++---
M docs/MultilabelClassification.md  | 2 --
M docs/NaryOperations.md  | 2 --
M docs/NoveltyDetection.md  | 2 +-
M docs/Nullity.md  | 4 +---
M docs/OfflineLearning.md  | 2 +-
M docs/OneHotEncoding.md  | 2 +-
M docs/OneVersusAll.md  | 2 +-
M docs/OneVersusOne.md  | 2 +-
M docs/OnlineLearning.md  | 4 ++--
M docs/Opcode.md  | 4 ++--
M docs/Operands.md  | 4 ++--
M docs/Optimizer.md  | 8 ++++----
M docs/OrdinaryLeastSquares.md  | 2 +-
M docs/OrthogonalComplement.md  | 2 +-
M docs/OutOfOrderExecution.md  | 2 +-
M docs/Overfitting.md  | 2 +-
R docs/Oversmooothing.md -> docs/Oversmoothing.md  | 0 
M docs/PartialDerivative.md  | 2 +-
M docs/Partition.md  | 2 +-
M docs/Pipelining.md  | 2 +-
M docs/PlaneToPlaneDistance.md  | 4 ++--
M docs/PoissonProcess.md  | 4 ++--
M docs/Pole.md  | 2 +-
M docs/Postcondition.md  | 2 +-
M docs/Prediction.md  | 2 +-
M docs/Preimage.md  | 2 +-
M docs/PretrainedModels.md  | 2 +-
M docs/Probability.md  | 4 ++--
M docs/ProbabilityDensityFunctions.md  | 2 +-
M docs/ProbabilityLaw.md  | 4 +---
M docs/ProbabilityMassFunction.md  | 4 ++--
M docs/ProgrammerVisibleState.md  | 4 ++--
M docs/Quaternions.md  | 2 +-
M docs/Queue.md  | 4 ++--
M docs/RMSE.md  | 2 +-
M docs/RandomExperiment.md  | 2 +-
M docs/RandomForest.md  | 4 ++--
M docs/RandomSubspaces.md  | 2 +-
M docs/RandomVariables.md  | 2 +-
M docs/RegressionProblem.md  | 6 +++---
M docs/RelativeFrequency.md  | 4 ++--
M docs/Rotate.md  | 4 ++--
M docs/Rotation.md  | 2 +-
M docs/RowBuffer.md  | 2 +-
M docs/RuleLearning.md  | 2 +-
M docs/RuleOfSarrus.md  | 2 +-
M docs/Scheduling.md  | 2 +-
M docs/SelfSupervisedLearning.md  | 2 +-
M docs/SinglyLinkedList.md  | 4 ++--
M docs/SkeletalAnimation.md  | 6 +++---
M docs/SmallestCounterExample.md  | 2 +-
M docs/Stack.md  | 4 ++--
M docs/StandardDeviation.md  | 2 +-
M docs/Standardization.md  | 6 +++---
M docs/StatisticsAndProbability.md  | 217 ++++++++++++++++++++++++++++++++++++++++++++++---------------------------------
M docs/StochasticAlgorithm.md  | 2 +-
M docs/StrongInduction.md  | 9 ++-------
M docs/Subspace.md  | 2 +-
M docs/SurfaceRepresentation.md  | 2 +-
M docs/TargetEncoding.md  | 4 ++--
M docs/Texture.md  | 6 +++---
M docs/TextureMaps.md  | 2 +-
R docs/TotalProbabilityTheroem.md -> docs/TotalProbabilityTheorem.md  | 0 
M docs/Tractable.md  | 2 +-
M docs/Transform.md  | 4 ++--
M docs/Transformations.md  | 2 +-
M docs/Translate.md  | 4 ++--
M docs/Tree.md  | 6 +++---
M docs/Triangulation.md  | 2 +-
M docs/TwosComplement.md  | 2 +-
M docs/Underfitting.md  | 2 +-
M docs/Universe.md  | 2 +-
M docs/UnstableGradients.md  | 2 +-
M docs/UnsupervisedLearning.md  | 6 ++----
M docs/UnsupervisedPretraining.md  | 2 +-
M docs/VanishingGradients.md  | 4 ++--
M docs/Variance.md  | 2 +-
M docs/Vector3.md  | 2 +-
M docs/VisualizationAlgorithm.md  | 2 +-
M docs/VonNeumannModel.md  | 2 +-
M docs/WellOrdered.md  | 2 +-

184 files changed, 413 insertions(+), 440 deletions(-)
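
Note: most hunks in this commit rewrite wiki-style links such as `[[Variance.md]]` into standard Markdown links such as `[Variance](Variance.md)`. The sketch below shows one way that rewrite could be scripted. It is an illustration inferred from the hunks, not the author's actual tooling; the names `WIKILINK` and `convert_wikilinks` are hypothetical. It uses the file stem as the link label, whereas several labels in the commit were spaced out by hand (for example `Keyframe Animation`).

```python
import re

# Matches wiki-style links of the form [[Name.md]]. The pattern is an
# assumption inferred from the hunks in this commit.
WIKILINK = re.compile(r"\[\[([^\[\]|]+?)\.md\]\]")


def convert_wikilinks(text: str) -> str:
    """Rewrite [[Name.md]] as [Name](Name.md), using the file stem as the label."""
    return WIKILINK.sub(lambda m: f"[{m.group(1)}]({m.group(1)}.md)", text)


if __name__ == "__main__":
    print(convert_wikilinks("See also [[Variance.md]]"))
```

Links that were already in Markdown form are left untouched, since the pattern only matches the double-bracket syntax.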
diff --git a/docs/AbstractDataType.md b/docs/AbstractDataType.md
@@ -4,6 +4,6 @@ CS 202 L14
 
 
 
-**Definition:** An ADT is a datatype that specifies it's interfaces but not implementation. This is similar to the relationship between an [[ISA.md]] and [[MicroArchitecture.md]].
+**Definition:** An ADT is a datatype that specifies it's interfaces but not implementation. This is similar to the relationship between an [ISA](ISA.md) and [MicroArchitecture](MicroArchitecture.md)
 
 These are a focus of CS 303 and include things such as [Stack](Stack.md) and [Queue](Queue.md).
diff --git a/docs/Adam.md b/docs/Adam.md
@@ -2,10 +2,8 @@
 
 ML P587
 
-
-
 **Definition:** Adam combines momentum with RMSProp to calculate gradients based on momentum and historical gradients.
 
 This is the best in most cases.
 
-There are variants of adam as well such as AdaMax (generally worse), Nadam (uses [[NAG.md]] idea for calculating in direction of momentum and generally outperforms adam), AdamW (regularized with weight decay).
+There are variants of adam as well such as AdaMax (generally worse), Nadam (uses [NAG](NAG.md) idea for calculating in direction of momentum and generally outperforms adam), AdamW (regularized with weight decay).
diff --git a/docs/Algorithms.md b/docs/Algorithms.md
@@ -1,6 +1,6 @@
 # Algorithms Index
 
-This is an index for links to notes taken about algorithms. These are CS related algorithms and not related to machine learning (see [[MachineLearning.md]] for that).
+This is an index for links to notes taken about algorithms. These are CS related algorithms and not related to machine learning (see [Machine Learning](MachineLearning.md) for that).
 
 ## Links
 
diff --git a/docs/AngleBetweenVectors.md b/docs/AngleBetweenVectors.md
@@ -8,6 +8,6 @@ Khan
 
 ## Calculation
 
-1. Find magnitude of both vectors ([[DistanceCalculation.md]]) 
+1. Find magnitude of both vectors ([Distance Calculation](DistanceCalculation.md)) 
 2. Take dot product divided by lengths of vectors to find cosine of the angle (solve)
 	- cos(theta) = (u dot v)/(||u||||v||)
diff --git a/docs/Animation.md b/docs/Animation.md
@@ -6,16 +6,16 @@ CG W13 L3
 
 **Definition:** Animation is the process of making still images appear as continuous movement.
 
-Unity uses [[Clip.md]] for simple animations. These are pre-defined and repetitive in nature. Think of repeated falling animations and not rigidbody falling animations which would not be predefined. 
+Unity uses [Clip](Clip.md) for simple animations. These are pre-defined and repetitive in nature. Think of repeated falling animations and not rigidbody falling animations which would not be predefined. 
 
 These clips can now be stored as a 3d model with parameters that are adjusted instead of images or anything of the sort. 
 
-Unity uses the [[Animation.md]] class as well as [[AnimationController.md]] to control animations.
+Unity uses the [Animation](Animation.md) class as well as [Animation Controller](AnimationController.md) to control animations.
 
 Unity Class:
 
-There is also a unity class which is a component named animation. This class describes a specific animation. This is called by the [[AnimationController.md]]
+There is also a unity class which is a component named animation. This class describes a specific animation. This is called by the [Animation Controller](AnimationController.md).
 
 Blender Stuff:
 
-See [[KeyframeAnimation.md]] for animating in blender.
+See [Keyframe Animation](KeyframeAnimation.md) for animating in blender.
diff --git a/docs/AnimationController.md b/docs/AnimationController.md
@@ -7,6 +7,6 @@ CG W13 L3
 **Definition:** An animation controller is a finite state machine that can be represented as a graph where the verticies are states and the edges are transitions between states. Note that this is a directed graph.
 
 
-See [[Animation.md]] for individual animation class.
+See [Animation](Animation.md) for individual animation class.
 
 This is an observer architecture where it observes things and calls secondary actions namely animation classes.
diff --git a/docs/Armature.md b/docs/Armature.md
@@ -7,4 +7,4 @@
 
 An armature also has a default pose (generally t pose), which is the state of all it's bones when imported (transform). 
 
-See [[SkeletalAnimation.md]] for more.
+See [Skeletal Animation](SkeletalAnimation.md) for more.
diff --git a/docs/Backpropagation.md b/docs/Backpropagation.md
@@ -4,7 +4,7 @@ ML D6
 
 
 
-**Definition:** Backpropagation is the combination of reverse-mode autodiff and gradient descent to iteratively improve models based on expected outputs by given inputs by following the gradient for each [[Weight.md]] and [[Bias.md]].
+**Definition:** Backpropagation is the combination of reverse-mode autodiff and gradient descent to iteratively improve models based on expected outputs by given inputs by following the gradient for each [Weight](Weight.md) and [Bias](Bias.md).
 
 When using backpropogation we use many mini-batches. Generally we go through the entire dataset to train multiple times and these passes are called epochs. When using mini-batches we first find the values from the input layer for each input, then we go to the second layer, and so on until reaching the output layer. This is the forward pass stage. An important note is that all intermediate values must be preserved to ensure we can do the backward pass.
 
diff --git a/docs/BarrierSynchronization.md b/docs/BarrierSynchronization.md
@@ -3,7 +3,7 @@ Computer Architecture L2
 
 
 
-**Definition:** This is a way to block all execution until all inputs are ready. This can be thought of as thread syncing and is closely related to [[DataFlow.md]] execution.
+**Definition:** This is a way to block all execution until all inputs are ready. This can be thought of as thread syncing and is closely related to [Data Flow](DataFlow.md) execution.
 
 
 in1  in2  in3
@@ -14,4 +14,4 @@ out1 out2 out3
 
 in this image, all in's need to be ready before out's are assigned. 
 
-See [[BulkSynchronousProcessing.md]] for more of the same information. Bulk synchronous processing is the idea of processing lots of things in parallel before moving on.  
+See [Bulk Synchronous Processing](BulkSynchronousProcessing.md) for more of the same information. Bulk synchronous processing is the idea of processing lots of things in parallel before moving on.  
diff --git a/docs/BayesTheroem.md b/docs/BayesTheorem.md
diff --git a/docs/Bias.md b/docs/Bias.md
@@ -10,7 +10,7 @@ ML D5
 
 High bias models are likely to underfit training data.
 
-See also [[Variance.md]]
+See also [Variance](Variance.md)
 
 
 ### ANNs
diff --git a/docs/Biconditional.md b/docs/Biconditional.md
@@ -4,7 +4,7 @@
 
 
 
-**Definition:** The biconditional is the [[Connectives.md]] that states the antecedent and consequent have the same truth values.
+**Definition:** The biconditional is the [Connectives](Connectives.md) that states the antecedent and consequent have the same truth values.
 
 $p \iff q$ this can be stated as p iff q, if and only if p then q, or some other way.
 
diff --git a/docs/Bijective.md b/docs/Bijective.md
@@ -2,9 +2,7 @@
 
 L2
 
-
-
-**Definition:** For a function to be bijective it must be both [[Surjective.md]] and [[Injective.md]].
+**Definition:** For a function to be bijective it must be both [Surjective](Surjective.md) and [Injective](Injective.md).
 
 This means that each value in the domain maps to a unique value in the codomain (Injective) and each value in the codomain is mapped to at least once (Surjective).
 
diff --git a/docs/BitSteering.md b/docs/BitSteering.md
@@ -6,6 +6,6 @@ CA L3
 
 **Definition:** This is a bit in an instruction that determines how later bits are interpreted. 
 
-A good example of this is an [[Opcode.md]]
+A good example of this is an [Opcode](Opcode.md).
 
-There are also other examples including Alpha's ([[ISA.md]]) ADD instruction which allows for permutations of the ADD instruction based on a bit passed to it as part of the instruction.
+There are also other examples including Alpha's ([ISA](ISA.md)) ADD instruction which allows for permutations of the ADD instruction based on a bit passed to it as part of the instruction.
diff --git a/docs/Blender.md b/docs/Blender.md
@@ -2,22 +2,18 @@
 
 CS331 W12 L3
 
-
-
-The default file format is FBX (Filmbox) which can be imported into [[Unity.md]].
+The default file format is FBX (Filmbox) which can be imported into [Unity](Unity.md)
 
 ## Links
 
-[[Mesh.md]]
-[[Pole.md]]
-[[UVMaps.md]]
-[[Animation.md]]
-[[KeyframeAnimation.md]]
-[[SkeletalAnimation.md]]
-[[BlenderShortcuts.md]]
+- [Mesh](Mesh.md)
+- [Pole](Pole.md)
+- [UVMaps](UVMaps.md)
+- [Animation](Animation.md)
+- [Keyframe Animation](KeyframeAnimation.md)
+- [Skeletal Animation](SkeletalAnimation.md)
+- [BlenderShortcuts](BlenderShortcuts.md)
 
 ## To Do
 
-[[Seam.md]]
-
-
+- [Seam](Seam.md)
diff --git a/docs/BreadthFirstSearch.md b/docs/BreadthFirstSearch.md
@@ -4,6 +4,6 @@ CS 202 L14
 
 
 
-**Definition:** Search algorithm that moves its way outward from the root node. This is different than [[DepthFirstSearch.md]] as it does not go all the way down and then search but instead moves away from the root.
+**Definition:** Search algorithm that moves its way outward from the root node. This is different than [DepthFirstSearch](DepthFirstSearch.md) as it does not go all the way down and then search but instead moves away from the root.
 
-This uses a [[Queue.md]] to search.
+This uses a [Queue](Queue.md) to search.
diff --git a/docs/BulkSynchronousProcessing.md b/docs/BulkSynchronousProcessing.md
@@ -4,7 +4,7 @@ CA L2
 
 
 
-**Definition:** Completing parallel processing and then using [[BarrierSynchronization.md]] to join together threads of execution. 
+**Definition:** Completing parallel processing and then using [BarrierSynchronization](BarrierSynchronization.md) to join together threads of execution. 
 
 This is called bulk because it can be done all concurrently while also having synchronization in the form of a thread join. 
 
diff --git a/docs/CNN.md b/docs/CNN.md
@@ -12,7 +12,7 @@ CNNs are good for image detection because they retain information about pixels a
 
 When using CNNs it is a good idea to start at around 32 filters (or higher) and increase (often double) the number of filters as the layers progress. 
 
-Additionally, don't forget to add [[MaxPooling.md]] to ensure features are compressed and complexity of the model is minimized.
+Additionally, don't forget to add [MaxPooling](MaxPooling.md) to ensure features are compressed and complexity of the model is minimized.
 
 ### Typical Form
 
diff --git a/docs/Calculus.md b/docs/Calculus.md
@@ -6,21 +6,21 @@
 Calc 2 (Leonard):
 
 L1:
-	- [[NaturalLog.md]]
-	- [[ProductRule.md]]
-	- [[ChainRule.md]]
-	- [[LogarithmicDifferentiation.md]]
+	- [Natural Log](NaturalLog.md)
+	- [Product Rule](ProductRule.md)
+	- [Chain Rule](ChainRule.md)
+	- [Logarithmic Differentiation](LogarithmicDifferentiation.md)
 
 L2:
-	- [[InverseFunction.md]]
-	- [[Injective.md]]
-	- [[Surjective.md]]
-	- [[Bijective.md]]
+	- [Inverse Function](InverseFunction.md)
+	- [Injective](Injective.md)
+	- [Surjective](Surjective.md)
+	- [Bijective](Bijective.md)
 
 Khan Calc 2:
 
 Unit 1:
-	- [[FundamentalTheroemofCalculus.md]]
+	- [FundamentalTheroemofCalculus](FundamentalTheroemofCalculus.md)
 
 Unit 2:
 	- [usubstitution](usubstitution.md)
@@ -29,7 +29,7 @@ Unit 2:
 Calculus Early Transcendentals JS:
 
 Section 2.8:
-	- [[Jerk.md]]
+	- [Jerk](Jerk.md)
 
 ## Known Integrals
 
diff --git a/docs/CentralLimitTheroem.md b/docs/CentralLimitTheorem.md
diff --git a/docs/CircularDoublyLinkedList.md b/docs/CircularDoublyLinkedList.md
@@ -1,8 +1,7 @@
+# Circular Doubly Linked List
 
 CS202 L14
 
-
-
 **Definition:** This is a doubly linked list where the last pointer points to the first and the first pointer of the first element points to the last.
 
-Can be used wherever [[CircularLinkedList.md]]s are used and are better when bi-directional movement is required. I am having trouble thinking of when this would ever be useful.
+Can be used wherever [CircularLinkedList](CircularLinkedList.md) are used and are better when bi-directional movement is required. I am having trouble thinking of when this would ever be useful.
diff --git a/docs/CircularLinkedList.md b/docs/CircularLinkedList.md
@@ -1,8 +1,7 @@
+# Circular Linked List
 
 CS202 L14
 
-
-
 **Definition:** This is a singly linked list where the last node points back to the first node. 
 
 This could be useful when implementing OS based threads as you would need to cycle through threads of execution when one thread gets blocked. There are not many other uses for this datastructure beyond this. 
diff --git a/docs/ClassificationProblem.md b/docs/ClassificationProblem.md
@@ -8,4 +8,4 @@ ML 1
 
 In other words, if there is a finite set of possible outcomes, it is a classification problem. Oftentimes this manifests as yes/no, but also could include much larger sets of possible values. 
 
-The alternative to this would be a [[RegressionProblem.md]] where the output is a continuous set of values. 
+The alternative to this would be a [RegressionProblem](RegressionProblem.md) where the output is a continuous set of values. 
diff --git a/docs/Clip.md b/docs/Clip.md
@@ -8,4 +8,4 @@ CG W13 L3
 
 This can be thought of in a similar way to tiling where the start and end should be the same and then repeated over and over again.
 
-See [[Animation.md]] for more.
+See [Animation](Animation.md) for more.
diff --git a/docs/Closure.md b/docs/Closure.md
@@ -6,7 +6,7 @@ Khan
 
 **Definition:** Closure means that performing some arbitrary operation (pick one, but not necessarily all) on any member of a set will result in another element of a set. 
 
-In the context of subspaces, we have closure under scalar multiplication and vector addition because these operations on any element of the [[LinearSubspace.md]] set results in another element of the set (by definition).
+In the context of subspaces, we have closure under scalar multiplication and vector addition because these operations on any element of the [LinearSubspace](LinearSubspace.md) set results in another element of the set (by definition).
 
 ## Discrete Math
 
diff --git a/docs/ClusteringAlgorithms.md b/docs/ClusteringAlgorithms.md
@@ -2,14 +2,12 @@
 
 ML L1
 
-
-
 **Definition:** An algorithm that groups data together with other like items. 
 
 Think google news group related stories amongst other things. 
 
-This is often done via [[UnsupervisedLearning.md]]
+This is often done via [UnsupervisedLearning](UnsupervisedLearning.md)
 
-An important distinction between this and [[ClassificationProblem.md]] is that clustering algorithms don't know the groupings before hand and are unsupervised. This means they know certain samples are similar, but does not have a term to describe said membership. 
+An important distinction between this and [ClassificationProblem](ClassificationProblem.md) is that clustering algorithms don't know the groupings before hand and are unsupervised. This means they know certain samples are similar, but does not have a term to describe said membership. 
 
 Clustering algorithms can also be hierarchical where they have groupings and then subgroupings as well. 
diff --git a/docs/Codomain.md b/docs/Codomain.md
@@ -6,7 +6,7 @@ Khan
 
 **Definition:** The codomain of a function is a set that contains all possible mappings from the domain of inputs to outputs. This set can also contain values that are not mapped to from the domain by the function.
 
-See [[Range.md]] for only the subset of the codomain that is mapped to.
+See [Range](Range.md) for only the subset of the codomain that is mapped to.
 
 Defined formally, we can have any codomain C(f) that fulfills the following where D is the domain of the function f:
 
diff --git a/docs/Complement.md b/docs/Complement.md
@@ -6,4 +6,4 @@ L1
 
 **Definition:** The complement of a set is the set of all elements not in the original set, but in the consideration space (often sample space).
 
-There are technically two types of complements the absolute and relative complements. Generally we are talking about the relative complement which is the set defined as the difference between the superset and the subset. The absolute complement uses the U set ([[UniversalSet.md]]) as the superset. 
+There are technically two types of complements the absolute and relative complements. Generally we are talking about the relative complement which is the set defined as the difference between the superset and the subset. The absolute complement uses the U set ([UniversalSet](UniversalSet.md)) as the superset. 
diff --git a/docs/ConditionalProbabilityTheorem.md b/docs/ConditionalProbabilityTheorem.md
@@ -0,0 +1,11 @@
+# Conditional Probability Threoem 
+
+L2
+
+**Definition:** Conditional probability theroem is $P(A|B) = \frac{P(A \cap B)}{P(B)}$.
+
+This theroem is used to find the probability of some outcome given another piece of information. This is also referred to as [ConditionalProbabilities](ConditionalProbabilities.md)
+
+## Intuition
+
+The probability that A is true given B is the same as the probability of A and B divided by the overall probability of B.
diff --git a/docs/ConditionalProbabilityTheroem.md b/docs/ConditionalProbabilityTheroem.md
@@ -1,13 +0,0 @@
-# Conditional Probability Threoem 
-
-L2
-
-
-
-**Definition:** Conditional probability theroem is $P(A|B) = \frac{P(A \cap B)}{P(B)}$.
-
-This theroem is used to find the probability of some outcome given another piece of information. This is also referred to as [[ConditionalProbabilities.md]].
-
-## Intuition
-
-The probability that A is true given B is the same as the probability of A and B divided by the overall probability of B.
diff --git a/docs/ContinuousProbability.md b/docs/ContinuousProbability.md
@@ -8,4 +8,4 @@ Stats Ch1
 
 This is often defined by intervals either finite or infinite.
 
-To graph continuous probabilities we often use density (kde) graphs to show probability of any given input lasting an amount of time. These are referred to as [[ProbabilityDensityFunctions.md]] of pdfs. While histograms fill a similar role, they are not considered a pdf because they use bins instead of continuity.
+To graph continuous probabilities we often use density (kde) graphs to show probability of any given input lasting an amount of time. These are referred to as [ProbabilityDensityFunctions](ProbabilityDensityFunctions.md) of pdfs. While histograms fill a similar role, they are not considered a pdf because they use bins instead of continuity.
diff --git a/docs/Contrapositive.md b/docs/Contrapositive.md
@@ -6,6 +6,6 @@ Throughout TB - U1.7.2 Discrete TB
 
 **Definition:** To prove an if then statement with contrapositive we assume the then statement is false. Following from here we then prove the if part must also be true for the then to be false. So it follows that if the first is true then the second is also true because the second is never true when the first is false. 
 
-This is of the form $\neg q \to \neg p$ where we switch the statements and negate both. To just negate both we [[Inverse.md]] it.
+This is of the form $\neg q \to \neg p$ where we switch the statements and negate both. To just negate both we [Inverse](Inverse.md) it.
 
 This always has the same truth value as the original.
diff --git a/docs/Correlation.md b/docs/Correlation.md
@@ -6,4 +6,4 @@ Stats D2
 
 **Definition:** Correlation is the strength and direction relationship between two variables. This value is bounded between -1 and 1 where 0 is no correlation, 1 is pure positive linear relationship, and -1 is a pure negative linear relationship.
 
-See [[CorrelationCoefficient.md]] for an applied example.
+See [CorrelationCoefficient](CorrelationCoefficient.md) for an applied example.
diff --git a/docs/CounterExample.md b/docs/CounterExample.md
@@ -4,4 +4,4 @@ Abstract Math Proof Technique
 
 
 
-**Definition:** Counter example proofs are similar to [[DirectProof.md]], but instead of assuming that they are true you assume they are false. From this assumption you then need to show that this is in some way fallacious.  
+**Definition:** Counter example proofs are similar to [DirectProof](DirectProof.md), but instead of assuming that they are true you assume they are false. From this assumption you then need to show that this is in some way fallacious.  
diff --git a/docs/Covariance.md b/docs/Covariance.md
@@ -10,4 +10,4 @@ There are also no bounds for the range of covariance unlike correlation.
 
 Cov(X,X) = Var(X) // keep in mind this is the squared unit like variance.
 
-See [[CorrelationCoefficient.md]] for normalized version of this value.
+See [CorrelationCoefficient](CorrelationCoefficient.md) for normalized version of this value.
diff --git a/docs/CramersRule.md b/docs/CramersRule.md
@@ -4,7 +4,7 @@
 
 
 
-**Definition:** Cramer's rule is an alternative to [[GaussianElimination.md]] for solving systems of equations.
+**Definition:** Cramer's rule is an alternative to [GaussianElimination](GaussianElimination.md) for solving systems of equations.
 
 While slower and generally worse, it is novel.
 
diff --git a/docs/Crosstabulation.md b/docs/Crosstabulation.md
@@ -14,4 +14,4 @@ Admittance  	Male	Female
 		Rejected 1493  1278
 
 
-This data can be shown using a [[MosaicPlot.md]] for graphical viewing with sized boxes.
+This data can be shown using a [MosaicPlot](MosaicPlot.md) for graphical viewing with sized boxes.
diff --git a/docs/DRAM.md b/docs/DRAM.md
@@ -1,12 +1,12 @@
 # DRAM
 
-DRAM is what we think of as RAM. See [[Memory.md]] for other links.
+DRAM is what we think of as RAM. See [Memory](Memory.md) for other links.
 
 
 
 
-[[DRAMBanks.md]] are a 2d matrix of [[DRAMCell.md]] and it is accessed by rows. When the processor wants a row, it activates the row, sends it to the [[RowBuffer.md]], and then sends the data out. Subsequent accesses of a different column are very fast because the row is already in a buffer. This can be thought of cached rows.
+[DRAMBanks](DRAMBanks.md) are a 2d matrix of [[DRAMCell.md]] and it is accessed by rows. When the processor wants a row, it activates the row, sends it to the [[RowBuffer.md]], and then sends the data out. Subsequent accesses of a different column are very fast because the row is already in a buffer. This can be thought of cached rows.
 
 One optimization done is to prioritize memory requests associated with memory that is already buffered to decrease context switching. This causes issues with multiple applications because it will prioritize applications that use localized memory more often. You can also create programs that take advantage of this to deny memory from other applications. On the flip side, if you are simply using an oldest request scheduling algorithm then random access requests will take more time and thus if one application uses more of them it will get more time than the other application. 
 
-[[DRAMChips.md]] are the larger DRAM unit that includes both the Banks and associated circuitry.  
+[DRAMChips](DRAMChips.md) are the larger DRAM unit that includes both the Banks and associated circuitry.  
diff --git a/docs/DRAMBanks.md b/docs/DRAMBanks.md
@@ -2,4 +2,4 @@
 
 
 
-**Definition:** 2d bank of [[DRAMCell.md]] that is accessed by a row at a time rows may be around 8kb in size.  
+**Definition:** 2d bank of [DRAMCell](DRAMCell.md) that is accessed by a row at a time rows may be around 8kb in size.  
diff --git a/docs/DRAMCell.md b/docs/DRAMCell.md
@@ -1,7 +1,5 @@
 # DRAM Cell
 
-
-
 A DRAM Cell is the cell used to store one bit of information. It is made of a capacitor and an access transistor. The data is stored in the charge of the capacitor. 
 
-The access transistor is how you are able to query them. Since the access transistor is not perfect nor is the transistor they leak energy over time. As such they need to be refreshed over time using [[DRAMRefresh.md]].   
+The access transistor is how you are able to query them. Since the access transistor is not perfect nor is the transistor they leak energy over time. As such they need to be refreshed over time using [DRAMRefresh](DRAMRefresh.md).
diff --git a/docs/DRAMChips.md b/docs/DRAMChips.md
@@ -2,4 +2,4 @@
 
 
 
-DRAM Chips are the chips that contain the [[DRAMBanks.md]] along with associated circuitry. There are many chips (I think normally 8) that make up a RAM module. 
+DRAM Chips are the chips that contain the [DRAMBanks](DRAMBanks.md) along with associated circuitry. There are many chips (I think normally 8) that make up a RAM module. 
diff --git a/docs/DRAMRefresh.md b/docs/DRAMRefresh.md
@@ -2,10 +2,10 @@
 
 
 
-This is the process of refreshing the energy stored in a [[DRAMCell.md]]'s capacitor so that losses in energy over time do not cause loss of data (bitrot). 
+This is the process of refreshing the energy stored in a [DRAMCell](DRAMCell.md)'s capacitor so that losses in energy over time do not cause loss of data (bitrot). 
 
 Currently, as of 2015, refreshes are required every 64ms. This costs electricity, can cause blocking issues, and as there is scaling these computations become slower and more power consuming. As an example, with 64gb DRAM refreshes can take up to 46% of time while 4gb is about 8%
 
 There is little coordination between the OS and the memory controller and the memory controller can't store information about what memory is allocated. This means that instead of just refreshing allocated memory, all memory is refreshed at the given frequency.
 
-This process is ran every 64ms and while most [[DRAMCell.md]]'s can go much longer than 64ms the LCD is 64ms. This is a bar for manufacturing that causes bad RAM to be thrown away. The memory controller could probably be smarter about this, but this is not done. These thoughts about how to optimize these things are RAIDR which is Retention aware intelligent DRAM refresh particularly thing thinking about [[BloomFilter.md]] usage to track which cells need more frequent refreshes. This can reduce refreshes by 74.6% at the cost of 1.25kb of memory with 8gb chips. 
+This process is run every 64ms, and while most [DRAMCell](DRAMCell.md)s can go much longer than 64ms, the lowest common denominator (LCD) is 64ms. This is a bar for manufacturing that causes bad RAM to be thrown away. The memory controller could probably be smarter about this, but this is not done. One proposal for optimizing this is RAIDR (Retention-Aware Intelligent DRAM Refresh), which uses a [BloomFilter](BloomFilter.md) to track which cells need more frequent refreshes. This can reduce refreshes by 74.6% at the cost of 1.25kb of memory with 8gb chips. 
diff --git a/docs/DRAMRowHammer.md b/docs/DRAMRowHammer.md
@@ -3,4 +3,4 @@ Computer Architecture L1
 
 
 
-See [[DisturbanceErrors.md]] for more information as it describes this vulnerability. 
+See [DisturbanceErrors](DisturbanceErrors.md) for more information as it describes this vulnerability. 
diff --git a/docs/DataFlow.md b/docs/DataFlow.md
@@ -1,14 +1,13 @@
+# Data Flow
 
 Computer Architecture L2
 
-
-
 **Definition:** This is a theory of computation that stipulates execution of code should be on a dependence basis instead of in order. If one instruction is dependent upon another that has not been executed it should not be executed, but if all dependencies have been executed then the code can be executed, if chosen to.
 
-This model is in contrast with [[VonNeumannModel.md]] where everything is sequential. 
+This model is in contrast with [VonNeumannModel](VonNeumannModel.md) where everything is sequential. 
 
 Data flow can be easily visualized as a graph.
 
 This paradigm requires a differently designed processor. 
 
-This paradigm is also, sort of, implemented via [[OutOfOrderExecution.md]]
+This paradigm is also, sort of, implemented via [OutOfOrderExecution](OutOfOrderExecution.md)
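Reviewer note: the dependence-based execution rule described in DataFlow.md can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dataflow_order` and the instruction names in `deps` are hypothetical and do not come from the notes, and a real data flow machine fires ready instructions in hardware rather than in a loop.

```python
def dataflow_order(deps):
    """Return one valid firing order for instructions.

    deps maps an instruction name to the set of instruction names
    it depends on (hypothetical representation for illustration).
    """
    done = set()
    order = []
    pending = dict(deps)
    while pending:
        # An instruction is ready once all of its dependencies have executed.
        ready = sorted(name for name, needs in pending.items() if needs <= done)
        if not ready:
            raise ValueError("cyclic dependency: nothing can fire")
        for name in ready:
            order.append(name)
            done.add(name)
            del pending[name]
    return order


deps = {
    "load_a": set(),
    "load_b": set(),
    "add": {"load_a", "load_b"},
    "store": {"add"},
}
print(dataflow_order(deps))  # ['load_a', 'load_b', 'add', 'store']
```

Both loads are ready at the same time, which is the parallelism a sequential Von Neumann ordering would not expose.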
diff --git a/docs/Degree.md b/docs/Degree.md
@@ -2,6 +2,4 @@
 
 CG W13 L2
 
-
-
-**Definition:** Degree is a term used to describe the number of edges meeting a [[Vertex.md]].
+**Definition:** Degree is a term used to describe the number of edges meeting a [Vertex](Vertex.md).
diff --git a/docs/DemorgansLaw.md b/docs/DemorgansLaw.md
@@ -40,4 +40,4 @@ This is basically the distributive property of boolean logic whereby we flip the
 
 #### For Quantifiers
 
-See [[Quantifiers.md]] section on negation which describes the distribution of a negation when quantifiers are involved.
+See [Quantifiers](Quantifiers.md) section on negation which describes the distribution of a negation when quantifiers are involved.
diff --git a/docs/DepthFirstSearch.md b/docs/DepthFirstSearch.md
@@ -2,10 +2,8 @@
 
 CS202 L14
 
-
-
 **Definition:** Searching algorithm that traverses until reaching a leaf node then going back by one and doing the same on the other subtree.
 
-This normally uses the call [[Stack.md]] to search.
+This normally uses the call [Stack](Stack.md) to search.
 
-Also see [[BreadthFirstSearch.md]]
+Also see [BreadthFirstSearch](BreadthFirstSearch.md)
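Reviewer note: a minimal recursive sketch of the search described in DepthFirstSearch.md, relying on the call stack as the note says. The `tree` dictionary and the name `depth_first` are assumptions made for illustration only.

```python
def depth_first(tree, node, visit_order=None):
    """Return nodes in the order a recursive depth-first search visits them."""
    if visit_order is None:
        visit_order = []
    visit_order.append(node)
    # Each recursive call pushes a frame on the call stack; returning from
    # a leaf "goes back by one" and continues with the next subtree.
    for child in tree.get(node, []):
        depth_first(tree, child, visit_order)
    return visit_order


# Hypothetical tree: A has children B and C, B has D and E, C has F.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
print(depth_first(tree, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F']
```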
diff --git a/docs/Determinant.md b/docs/Determinant.md
@@ -4,7 +4,7 @@ CS331 - Linear Algebra - Khan U2
 
 
 
-**Definition:** The determinant is the scaling factor of some area (or volume in 3d space) from before to after a linear transformation. Note that this is only useful in 3d and 2d as the notion of volume in higher dimensions ([[Hypervolume.md]]) is a bit abstract.
+**Definition:** The determinant is the scaling factor of some area (or volume in 3d space) from before to after a linear transformation. Note that this is only useful in 3d and 2d as the notion of volume in higher dimensions ([Hypervolume](Hypervolume.md)) is a bit abstract.
 
 This value can be negative if the space has been flipped. In 3d space, this means the volume after the transformation is in left hand space if it was before in right hand space.
 
diff --git a/docs/DiscreteUniformLaw.md b/docs/DiscreteUniformLaw.md
@@ -4,4 +4,4 @@ L1
 
 
 
-**Definition:** The discrete uniform law states that if all outcomes in a [[SampleSpace.md]] are equally probable then P(A) where A is a set is the same as |A| / |Omega| where Omega is the entire sample space.
+**Definition:** The discrete uniform law states that if all outcomes in a [SampleSpace](SampleSpace.md) are equally probable then P(A) where A is a set is the same as |A| / |Omega| where Omega is the entire sample space.
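Reviewer note: the rule P(A) = |A| / |Omega| can be checked with a small Python sketch. The function name `uniform_probability` and the die example are assumptions for illustration, not part of the notes.

```python
from fractions import Fraction


def uniform_probability(event, sample_space):
    """P(A) = |A| / |Omega| when every outcome is equally likely."""
    omega = set(sample_space)
    # Only outcomes that are actually in the sample space count toward A.
    a = set(event) & omega
    return Fraction(len(a), len(omega))


die = range(1, 7)      # Omega for one roll of a fair six-sided die
even = {2, 4, 6}       # the event A
print(uniform_probability(even, die))  # 1/2
```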
diff --git a/docs/DisturbanceErrors.md b/docs/DisturbanceErrors.md
@@ -1,7 +1,5 @@
 # Disturbance Errors
 
+Also referred to as [DRAMRowHammer](DRAMRowHammer.md).
 
-
-Also referred to as [[DRAMRowHammer.md]]
-
-These are caused by frequent accesses of a given row. When a row is moved to the [[RowBuffer.md]] there is a (precharge) high charge applied to it and a low charge applied to the one being moved out of the buffer. This activation over and over to the same row can cause errors in adjacent rows because of how close together dram rows are. This increases the rate of charge leakage in adjacent rows. This issue has been resolved in flash by a controller that stores error correcting codes and checks over and over. There are still issues with this memory, but ecc resolves this issue when needed just more expensive. 
+These are caused by frequent accesses of a given row. When a row is moved to the [RowBuffer](RowBuffer.md) there is a (precharge) high charge applied to it and a low charge applied to the one being moved out of the buffer. Activating the same row over and over can cause errors in adjacent rows because of how close together DRAM rows are. This increases the rate of charge leakage in adjacent rows. This issue has been resolved in flash by a controller that stores error correcting codes and checks them repeatedly. There are still issues with this memory, but ECC resolves them when needed; it is just more expensive. 
diff --git a/docs/ExplodingGradients.md b/docs/ExplodingGradients.md
@@ -6,7 +6,7 @@ ML 550
 
 **Definition:** Exploding gradients is a problem with training neural networks where lower levels have very high gradients and thus the gradient steps diverge from a proper solution.
 
-This is the opposite of [[VanishingGradients.md]]
+This is the opposite of [VanishingGradients](VanishingGradients.md)
 
 This often occurs for recurrent neural networks. 
 
@@ -14,4 +14,4 @@ This often occurs for recurrent neural networks.
 
 Use ReLU and better weight initialization (not gaussian distribution with std deviation of 1).
 
-See [[UnstableGradients.md]] for more.
+See [UnstableGradients](UnstableGradients.md) for more.
diff --git a/docs/ExponentialDistribution.md b/docs/ExponentialDistribution.md
@@ -2,8 +2,6 @@
 
 Stats D1
 
-
-
 **Definition:** An exponential distribution is one that is decreasing at a decreasing pace. Specifically, it can be stated in some form of lambda^-x where there may be constants or other things involved, but we find that as x increases, y decreases at a decreasing rate. 
 
-This is often used to show the probability of time between random things happening which is similar in some ways to [[PoissonDistribution.md]].
+This is often used to show the probability of time between random things happening which is similar in some ways to [PoissonDistribution](PoissonDistribution.md).
diff --git a/docs/FeatureScaling.md b/docs/FeatureScaling.md
@@ -8,4 +8,4 @@ ML CH2
 
 Feature scaling is important because machine learning algorithms don't do well when you have lots of vectors that use vastly different scales of values.
 
-There are two types of feature scaling namely [[MinMaxScaling.md]] and [[Standardization.md]]
+There are two types of feature scaling, namely [MinMaxScaling](MinMaxScaling.md) and [Standardization](Standardization.md).
diff --git a/docs/ForwardThoughts.md b/docs/ForwardThoughts.md
@@ -2,8 +2,6 @@
 
 Things that could be possible and necessary for future development
 
-
-
 There will need to be architecture capable of allowing higher levels of computation. We need to consider future scaling. 
 
 In the future architecture may need to enable:
@@ -12,12 +10,12 @@ In the future architecture may need to enable:
 2. Virtual Reality
 3. Personalized Genomics/Medicine
 
-Solutions to [[DRAMRowHammer.md]].
+Solutions to [DRAMRowHammer](DRAMRowHammer.md)
 
 Alternatives to Von Neumann Model:
 
 1. Multi Processors
 	- Each processor is Von Neumann, but there is parallel processing which is not
-2. [[DataFlow.md]]
-3. [[BulkSynchronousProcessing.md]]
+2. [DataFlow](DataFlow.md)
+3. [BulkSynchronousProcessing](BulkSynchronousProcessing.md)
 	- This is the common way of doing synchronous processing while using Von Neumann constrained processors
diff --git a/docs/Frequency.md b/docs/Frequency.md
@@ -6,4 +6,4 @@ Ch 1.1
 
 **Definition:** Frequency describes the number of occurrences of a given outcome from the trials of a random experiment.
 
-Frequency is often confused with [[RelativeFrequency.md]] and [[Probability.md]] but they are different terms as the others desribe relative likelihood of an event.
+Frequency is often confused with [RelativeFrequency](RelativeFrequency.md) and [Probability](Probability.md), but they are different terms, as the others describe the relative likelihood of an event.
diff --git a/docs/FundamentalTheoremOfArithmetic.md b/docs/FundamentalTheoremOfArithmetic.md
@@ -1,8 +1,6 @@
 # The Fundamental Theorem of Arithmetic
 
-Abstract Math 10.4. Can be proven through [[StrongInduction.md]]
-
-
+Abstract Math 10.4. Can be proven through [StrongInduction](StrongInduction.md)
 
 **Definition:** Any integer greater than 1 has a unique prime factorization. 
 
diff --git a/docs/GameLoop.md b/docs/GameLoop.md
@@ -10,4 +10,4 @@ This is the same idea as animation which is giving motion to still images.
 
 When using the game loop you can call Time.deltaTime to get the seconds between the previous and current frame as a float. This can be used to achieve frame rate independence. 
 
-See [[MonoBehaviour.md]] for more about update method.
+See [MonoBehaviour](MonoBehaviour.md) for more about update method.
diff --git a/docs/GameObject.md b/docs/GameObject.md
@@ -6,7 +6,7 @@ CS 331 W12 L3
 
 **Definition:** This is the data type of objects in the game. This is a broad class that has some built in functionallity. 
 
-A common way to move an object forward using it's [[Vector3.md]] is as follows:
+A common way to move an object forward using its [Vector3](Vector3.md) is as follows:
 ```csharp
 	float speed = 2; //default forward speed
 	bool moveForward = Input.GetKey("up");
diff --git a/docs/GaussianElimination.md b/docs/GaussianElimination.md
@@ -4,8 +4,8 @@ Khan U1
 
 
 
-**Definition:** Gaussian elimination is the process of simplifying a system of equations to [[ReducedRowEchelonForm.md]] to solve the system.
+**Definition:** Gaussian elimination is the process of simplifying a system of equations to [ReducedRowEchelonForm](ReducedRowEchelonForm.md) to solve the system.
 
 Basically, we perform row operations on an augmented matrix to get RREF. We then find the values of the x, y, and z components and that is our solution.
 
-See also [[CramersRule.md]]
+See also [CramersRule](CramersRule.md)
diff --git a/docs/GradientClipping.md b/docs/GradientClipping.md
@@ -6,7 +6,7 @@ ML P569
 
 **Definition:** Gradient clipping is the process of clipping gradients during backpropagation so they never exceed some threshold.
 
-This is another technique used to resolve issues relating to [[ExplodingGradients.md]] particularly for RNNs where batch normalization does not work.
+This is another technique used to resolve issues relating to [ExplodingGradients](ExplodingGradients.md) particularly for RNNs where batch normalization does not work.
 
 There are two ways to do gradient clipping: either with a threshold cutoff or with vector scaling. With vector scaling we retain the direction of the vector and reduce the largest value to 1 (if greater than 1) while scaling all other features proportionally. More commonly, we simply truncate values, so if we have [100, .1] with a threshold of (-1,1) we would then clip the vector to [1, .1].
 
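Reviewer note: the two clipping styles described in GradientClipping.md can be sketched as below. This is a plain-Python illustration, not any library's API; the names `clip_by_value` and `clip_by_scaling` are hypothetical, and the scaling variant follows the note's description (scale by the largest absolute component), whereas ML libraries commonly scale by the vector's L2 norm instead.

```python
def clip_by_value(grad, threshold=1.0):
    """Truncate each component independently to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grad]


def clip_by_scaling(grad, threshold=1.0):
    """Rescale the whole vector so its largest absolute component is at
    most threshold. The direction of the vector is preserved."""
    largest = max(abs(g) for g in grad)
    if largest <= threshold:
        return list(grad)
    return [g / largest * threshold for g in grad]


grad = [100.0, 0.1]
print(clip_by_value(grad))    # first component becomes 1.0, second stays 0.1
print(clip_by_scaling(grad))  # both components shrink by the same factor
```

Truncation changes the direction of the gradient (the ratio between components goes from 1000:1 to 10:1), while scaling keeps the ratio and only shortens the step.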
diff --git a/docs/GradientDescent.md b/docs/GradientDescent.md
@@ -4,15 +4,15 @@ ML L2
 
 
 
-**Definition:** Gradient Descent is an algorithm used to find a 'near' optimal approach to the given problem. This is used with [[LinearRegression.md]] to optimize the function by selecting a set of parameters $\theta$ and then repeatedly finding the direction that results in the fastest movement towards a cost function's value nearest to 0. This will find a local optimum. With linear regression however there will not be local optimum but only global.
+**Definition:** Gradient Descent is an algorithm used to find a 'near' optimal approach to the given problem. This is used with [LinearRegression](LinearRegression.md) to optimize the function by selecting a set of parameters $\theta$ and then repeatedly finding the direction that results in the fastest movement towards a cost function's value nearest to 0. This will find a local optimum. With linear regression however there will not be local optimum but only global.
 
-General idea is to start with some $\theta$ (parameters) and keep changing it to reduce J($\theta$). (Find J in [[LinearRegression.md]])
+General idea is to start with some $\theta$ (parameters) and keep changing it to reduce J($\theta$). (Find J in [LinearRegression](LinearRegression.md))
 
 More specifically, you pick a starting point, see what direction you should go to get closer to 0 the fastest. You then repeat this algorithm. It's not perfect, but it's fast.
 
-This is a common algorithm used for [[LinearRegression.md]] when there are lots of features or lots of samples (too big for memory) which would cause the formula for linear regression to be too slow.
+This is a common algorithm used for [LinearRegression](LinearRegression.md) when there are lots of features or lots of samples (too big for memory) which would cause the formula for linear regression to be too slow.
 
-For a simple implementation of gradient descent using a [[LearningRate.md]] for third degree polynomials see [[GradientDescentCode.md]].
+For a simple implementation of gradient descent using a [LearningRate](LearningRate.md) for third degree polynomials see [GradientDescentCode](GradientDescentCode.md).
 
 When using gradient descent for linear regression one must calculate the partial derivative for each variable and then determine if it is positive or negative and move in the correct direction. 
 
diff --git a/docs/GradientDescentCode.md b/docs/GradientDescentCode.md
@@ -1,6 +1,6 @@
 # Gradient Descent Implementation 
 
-This approach implements a [[LearningRate.md]] parameter to narrow in upon a local minimum of the given third degree polynomial.
+This approach implements a [LearningRate](LearningRate.md) parameter to narrow in upon a local minimum of the given third degree polynomial.
 
 ## Code
 ```python
diff --git a/docs/Graphs.md b/docs/Graphs.md
@@ -2,10 +2,8 @@
 
 Abstract Math 10.2. 
 
-
-
 **Definition:** A graph is a configuration consisting of vertices and edges. 
 
 **Cycle:** A cycle in a graph is a set of vertices such that traversal can be done back to itself.
 
-Also see [[Tree.md]]
+Also see [Tree](Tree.md).
diff --git a/docs/HarmonicMean.md b/docs/HarmonicMean.md
@@ -6,7 +6,7 @@ ML D2
 
 **Definition:** The harmonic mean is a metric used to describe the accuracy of a model. This value is representative of the precision and recall of a model.
 
-Basically, this is a combination of [[Precision.md]] and recall
+Basically, this is a combination of [Precision](Precision.md) and recall
 
 The harmonic mean favors models with similarly good values for both recall and precision which can be good in certain cases. There are however many cases where precision, recall, or accuracy may be more important.
 
@@ -16,4 +16,4 @@ Formula:
 
 F_1 = 2 * (p * r) / (p+r)
 
-Where p = [[Precision.md]] and r = recall
+Where p = [Precision](Precision.md) and r = recall
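Reviewer note: the F_1 formula in HarmonicMean.md translates directly to code. The function name `f1_score` and the zero-division guard are assumptions added for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2 * (p * r) / (p + r)."""
    if precision + recall == 0:
        # Assumption: report 0.0 instead of dividing by zero.
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


print(f1_score(0.5, 0.5))  # 0.5
print(f1_score(0.9, 0.1))  # much lower than the arithmetic mean of 0.5
```

The second call shows why the harmonic mean favors models whose precision and recall are similar: one weak value drags the score down.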
diff --git a/docs/Homogeneous.md b/docs/Homogeneous.md
@@ -2,11 +2,9 @@
 
 Khan U2
 
-
-
 **Definition:** In linear algebra a homogeneous solution is one where the right side of the system is the zero vector. 
 
-See also [[Inhomogeneous.md]]
+See also [Inhomogeneous](Inhomogeneous.md)
 
 ## CS
 
diff --git a/docs/Hyperparameter.md b/docs/Hyperparameter.md
@@ -6,4 +6,4 @@ ML CH2
 
 **Definition:** A hyperparameter in ML is a parameter that is defined prior to training that is not influenced by samples.
 
-Examples of hyperparmeters are [[LearningRate.md]] and m in the case of calculating weighted means. More about this can be seen here [[TargetEncoding.md]]
+Examples of hyperparameters are [LearningRate](LearningRate.md) and m in the case of calculating weighted means. More about this can be seen in [TargetEncoding](TargetEncoding.md).
diff --git a/docs/Hyperplane.md b/docs/Hyperplane.md
@@ -2,6 +2,4 @@
 
 Khan U2
 
-
-
-**Definition:** A hyperplane is a 3-dimensional or higher subspace with dimensionality that is one less than the [[AmbientSpace.md]].
+**Definition:** A hyperplane is a 3-dimensional or higher subspace with dimensionality that is one less than the [AmbientSpace](AmbientSpace.md).
diff --git a/docs/Hypervolume.md b/docs/Hypervolume.md
@@ -4,4 +4,4 @@ Khan U2
 
 
 
-**Definition:** Hypervolume much like [[Hyperplane.md]] is volume in dimensions higher than 3.
+**Definition:** Hypervolume much like [Hyperplane](Hyperplane.md) is volume in dimensions higher than 3.
diff --git a/docs/ISA.md b/docs/ISA.md
@@ -6,17 +6,17 @@ Computer Architecture L(2,3)
 
 **Definition:** The design of the interconnection between hardware and software to create a functional computing system. 
 
-This is the agreed upon interface between os/vm/higher level things and lower level [[MicroArchitecture.md]]. This information is necessary to know for the OS developer.
+This is the agreed upon interface between os/vm/higher level things and lower level [MicroArchitecture](MicroArchitecture.md). This information is necessary to know for the OS developer.
 
 The ISA also includes register things and sometimes the CPU frequency/voltage.
 
-[[Pipelining.md]] is generally not part of the ISA on newer systems.
+[Pipelining](Pipelining.md) is generally not part of the ISA on newer systems.
 
 Some ISAs have additional room for un-implemented instructions that would allow for future expansion. 
 
 ---
 
-0-address machines (stack machines) are machines that only take [[Opcode.md]] but not [[Operands.md]]. 0-address takes up less space in code, everything is already on the stack, but it can be very slow and can't express all computations easily (consider order of operations).  
+0-address machines (stack machines) are machines that only take [Opcode](Opcode.md) but not [Operands](Operands.md). 0-address takes up less space in code, everything is already on the stack, but it can be very slow and can't express all computations easily (consider order of operations).  
 
 2-address machines are source + destination for operands. This does not preserve the value of the destination which requires copying overhead. x86 is 2-address.
 
@@ -26,7 +26,7 @@ Some ISAs have additional room for un-implemented instructions that would allow 
 
 The ISA also defines the supported datatypes. Some common ones include int, float, character. Sometimes they can include linked lists, stacks, queues, and strings. 
 
-With more/high level datatypes in the ISA we have smaller code, more cpu complexity, but simpler compilers. This basically means harder for [[MicroArchitecture.md]] development, but easier for compiler developer. 
+With more/high level datatypes in the ISA we have smaller code, more cpu complexity, but simpler compilers. This basically means harder for [MicroArchitecture](MicroArchitecture.md) development, but easier for compiler developer. 
 
 This ties into semantic gap which describes the difference between the ISA and what programmers are trying to do with respect to datatypes and opcodes. When there are more datatypes, the semantic gap is low. The inverse is also true.
 
diff --git a/docs/Image.md b/docs/Image.md
@@ -6,7 +6,7 @@ Khan U2
 
 **Definition:** The image of a function is the total set of all outputs of a given function (transformation for vectors).
 
-This is the same as [[Range.md]].
+This is the same as [Range](Range.md).
 
 Subsequently the preimage is the domain of the function with mappings to elements of the image.
 
diff --git a/docs/Incremental.md b/docs/Incremental.md
@@ -2,8 +2,6 @@
 
 CLRS 2.3
 
-
-
 **Definition:** Incremental algorithms are algorithms that solve the task in order (iteratively).
 
-An example of this is [[InsertionSort.md]].
+An example of this is [InsertionSort](InsertionSort.md).
diff --git a/docs/Induction.md b/docs/Induction.md
@@ -28,10 +28,10 @@ When using induction the common form is $S_k \implies S_{k+1}$, but it is equall
 
 When proving induction it is important to first state what the value of k+1 equates to. We then need to go from there to equate it to the other side of the statement. We should not assign the left and right together from the start because there would be nowhere to go from there instead do algebra to prove the statement is true.
 
-[[StrongInduction.md]] Is another type of induction. 
+[StrongInduction](StrongInduction.md) is another type of induction. 
 
-See also [[SmallestCounterExample.md]] for something similar to [[CounterExample.md]] of [[Induction.md]]
+See also [SmallestCounterExample](SmallestCounterExample.md) for something similar to [CounterExample](CounterExample.md) of [Induction](Induction.md).
 
-It is important to note that a set must be [[WellOrdered.md]] for it to be possible to prove by induction. 
+It is important to note that a set must be [WellOrdered](WellOrdered.md) for it to be possible to prove by induction. 
 
-Another interesting thing that relates to induction is [[FibonacciNumbers.md]] in the sense that they are entirely reliant upon previous calculations to determine the next value in the set. 
+Another interesting thing that relates to induction is [FibonacciNumbers](FibonacciNumbers.md) in the sense that they are entirely reliant upon previous calculations to determine the next value in the set. 
diff --git a/docs/Inertia.md b/docs/Inertia.md
@@ -6,4 +6,4 @@ ML D5
 
 **Definition:** Inertia in machine learning is the sum of the squared distances from instances to their closest centroid. 
 
-This is often used as a gauge for the accuracy of a [[KMeans.md]] model.
+This is often used as a gauge for the accuracy of a [KMeans](KMeans.md) model.
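Reviewer note: the definition of inertia in Inertia.md (sum of squared distances from instances to their closest centroid) can be sketched in plain Python. The function name `inertia` and the sample points are assumptions for illustration and do not reflect any particular library's implementation.

```python
def inertia(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance from p to each centroid; keep the smallest.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total


points = [(0, 0), (0, 2), (10, 0)]
centroids = [(0, 1), (10, 0)]
print(inertia(points, centroids))  # 2.0
```

Lower values mean points sit closer to their assigned centroid, which is why the best of several k-means runs is the one that minimizes this number.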
diff --git a/docs/Inference.md b/docs/Inference.md
@@ -6,4 +6,4 @@ Ch2
 
 **Definition:** Inference is the statistical process of finding relationships between data.
 
-This is not to be confused with [[Prediction.md]] which is the process of guessing an output.
+This is not to be confused with [Prediction](Prediction.md) which is the process of guessing an output.
diff --git a/docs/Inhomogeneous.md b/docs/Inhomogeneous.md
@@ -2,8 +2,6 @@
 
 Khan U2
 
-
-
 **Definition:** An inhomogeneous solution in linear algebra is a solution where the right side of the system of equations is not the zero vector.
 
-See also [[Homogeneous.md]]
+See also [Homogeneous](Homogeneous.md)
diff --git a/docs/Instruction.md b/docs/Instruction.md
@@ -6,7 +6,7 @@ CA L3
 
 **Definition:** An instruction is the most basic element of the hardware software interface which describes what to do and to who. 
 
-An instruction is made of two parts, the [[Opcode.md]] describes what to do, and the [[Operands.md]] describe to who. 
+An instruction is made of two parts, the [Opcode](Opcode.md) describes what to do, and the [Operands](Operands.md) describe to who. 
 
 There are also classes of instructions. These are the following 3:
 
@@ -18,4 +18,4 @@ There are also classes of instructions. These are the following 3:
     - Change sequence of instructions to execute
 
 
-See [[ISA.md]] for more about instruction sets. 
+See [ISA](ISA.md) for more about instruction sets. 
diff --git a/docs/InverseTransformation.md b/docs/InverseTransformation.md
@@ -6,7 +6,7 @@ Khan U2
 
 **Definition:** The inverse of a transformation is the transformation that undoes the original transformation for the entire domain codomain of the original transformation.
 
-This transformation must be [[Bijective.md]] otherwise there will be issues with mappings either there are outputs without inputs or there are outputs with multiple inputs.
+This transformation must be [Bijective](Bijective.md) otherwise there will be issues with mappings either there are outputs without inputs or there are outputs with multiple inputs.
 
 A transformation is invertible if and only if there exists an f^-1 such that f^-1 composed with f is I (identity function).
 
@@ -24,7 +24,7 @@ Let's assume they are not. We then find A(x) = B(x) for all x in R^n. This means
 
 To find this we know it needs to be bijective. When solving for RREF if there are instances in the R^m space, where R^m is the codomain, that are not mapped to (found by having a row of zeroes where we can't map to everything based on the combination) then the standard matrix is not invertible as it stands from R^n to R^m.
 
-As such, T is onto iff C(A) = R^m (columns span R^m). **We know this is only true when RREF has a pivot in each row. ([[Rank.md]] of the matrix = m)**
+As such, T is onto iff C(A) = R^m (columns span R^m). **We know this is only true when RREF has a pivot in each row. ([Rank](Rank.md) of the matrix = m)**
 
 For injectivity we test for one-to-one. To find this we need to make sure the rank of the matrix is equal to n where n is the number of columns.
 
diff --git a/docs/KMeans.md b/docs/KMeans.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** K-means clustering is a clustering algorithm that clusters data together by finding the mean distance from clusteroids and places said element into said cluster.
 
 Basic idea:
@@ -15,4 +13,4 @@ Basic idea:
 
 When using kmeans clustering it can, at times, find local optimum instead of global optimum. To help with this issue one thing that can be done is passing in a list of starting positions for centroids. 
 
-Another solution is to run the algorithm multiple times with different random starting positions. We then take the best solution which minimizes [[Inertia.md]].
+Another solution is to run the algorithm multiple times with different random starting positions. We then take the best solution which minimizes [Inertia](Inertia.md).
diff --git a/docs/Kernel.md b/docs/Kernel.md
@@ -8,4 +8,4 @@ Khan
 
 This is stated as ker(T), spoken as the kernel of T.
 
-This is similar to the [[NullSpace.md]] except it is specific to linear transformations.
+This is similar to the [NullSpace](NullSpace.md) except it is specific to linear transformations.
diff --git a/docs/KeyframeAnimation.md b/docs/KeyframeAnimation.md
@@ -22,7 +22,7 @@ Animators used to draw just a few "key" frames hence the term. The goals is to h
 
 Now, this is a bit different because of interpolation which makes it so we simply need to ensure correct motion between the keyframes instead of ensuring each position at each time. 
 
-See [[Animation.md]] which is related to this topic.
+See [Animation](Animation.md) which is related to this topic.
 
 ### Creation
 
diff --git a/docs/LabelEncoding.md b/docs/LabelEncoding.md
@@ -2,12 +2,10 @@
 
 ML CH2
 
-
-
 **Definition:** Label encoding is the process of encoding some arbitrary label as an arbitrary number. 
 
-This is often done when you have a string input to a neural network or linear regression model and there are too many options for the given feature to do [[OneHotEncoding.md]]. 
+This is often done when you have a string input to a neural network or linear regression model and there are too many options for the given feature to do [OneHotEncoding](OneHotEncoding.md).
 
 One issue with this is that the labels are arbitrary so if the model tries to use these numbers to predict higher being better or worse there will be issues. 
 
-See also [[TargetEncoding.md]] for another way to encode strings as numbers.
+See also [TargetEncoding](TargetEncoding.md) for another way to encode strings as numbers.
diff --git a/docs/LawOfLargeNumbers.md b/docs/LawOfLargeNumbers.md
@@ -2,8 +2,6 @@
 
 L19
 
-
-
 **Definition:** The average results from a large set of independent trials converges upon the true value.
 
-See also [[RegressionToTheMean.md]]
+See also [RegressionToTheMean](RegressionToTheMean.md)
diff --git a/docs/LearningRate.md b/docs/LearningRate.md
@@ -7,10 +7,10 @@ ML L2
 **Definition:** The learning rate is a constant used to narrow in upon some value based on it's distance from an expected value. The further away from the value, the larger the change for a parameter(s) will be.
 
 
-See [[GradientDescentCode.md]] and [[GradientDescent.md]] for an example of when a learning rate would be used and an implementation of it.
+See [GradientDescentCode](GradientDescentCode.md) and [GradientDescent](GradientDescent.md) for an example of when a learning rate would be used and an implementation of it.
 
 Additionally, learning rate in a higher level sense, with regard to online learning, is how quickly a model will adapt to new data.
 
 These constants that affect learning rate are called "hyperparameters" which are defined as constants prior to model training that are not built into the model.
 
-Another term is also the learning schedule. This is the rate at which the learning rate changes. In the case of [[GradientDescent.md]] this would be the amount it decreases over time as you narrow in on an optima.
+Another term is the learning schedule. This is the rate at which the learning rate changes. In the case of [GradientDescent](GradientDescent.md) this would be the amount it decreases over time as you narrow in on an optimum.
diff --git a/docs/LinearIndependence.md b/docs/LinearIndependence.md
@@ -28,7 +28,7 @@ If c_1*a + c_2*b = 0 is true for some constants c_1 and c_2 then we have depende
 
 ### Intuitive Definition
 
-Linear independence means each vector in a set of vectors (possibly matrix) adds something to the matrix such that the [[Span.md]] of the set of vectors is larger.
+Linear independence means each vector in a set of vectors (possibly matrix) adds something to the matrix such that the [Span](Span.md) of the set of vectors is larger.
 
 
 ### Solving
diff --git a/docs/LinearRegression.md b/docs/LinearRegression.md
@@ -25,8 +25,8 @@ Theta = (X transpose * X) ^ -1 * X transpose * y
 
 Where y is an m x 1 vector of target values and X is in some way related to inputs as a matrix with a column of ones for the intercept term... 
 
-This way of linear regression, the closed form way, is better when there are not a massive number of features, but if there are lots of features or the training instances aer too vast to fit into memory, then the [[GradientDescent.md]] way is better.
+This way of linear regression, the closed form way, is better when there are not a massive number of features, but if there are lots of features or the training instances are too vast to fit into memory, then the [GradientDescent](GradientDescent.md) way is better.
 
-See [[RidgeRegression.md]], [[LassoRegression.md]], and [[ElasticNetRegression.md]] for some ways to constrain linear models (decrease degrees of freedom to avoid overfitting).
+See [RidgeRegression](RidgeRegression.md), [[LassoRegression.md]], and [[ElasticNetRegression.md]] for some ways to constrain linear models (decrease degrees of freedom to avoid overfitting).
 
 As it relates to linear regression, it is good to add some regularization, and when we know only a few features matter, elastic net regression is good. Otherwise, in most cases, ridge regression is a good option when we don't think there are useless features.
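The closed form above can be sketched for the single-feature case, where X has a column of ones and one input column, so X transpose * X is only 2 x 2 and can be inverted by hand. This is a minimal sketch in plain Python; the function name and the sample data are assumptions made for illustration.

```python
def normal_equation_1d(xs, ys):
    """Solve Theta = (X^T X)^-1 X^T y for X = [1, x] (intercept plus one feature)."""
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # X^T X = [[m, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = m * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular (all x values are identical)")

    intercept = (sxx * sy - sx * sxy) / det
    slope = (m * sxy - sx * sy) / det
    return intercept, slope


# Hypothetical data lying exactly on y = 2x + 1.
intercept, slope = normal_equation_1d([0, 1, 2, 3], [1, 3, 5, 7])
```

With many features the matrix inverse becomes the expensive part, which is why the notes point to gradient descent for large problems.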
diff --git a/docs/LinkedLists.md b/docs/LinkedLists.md
@@ -2,8 +2,6 @@
 
 This is from CS 221 W11 Lecture 13. 
 
-
-
 **Definition:** A linked list is a list of items that are linked together using pointers. As such they are not in contiguous memory. 
 
 Inserting into and removing from linked lists is faster than arrays when resizing / defragmenting are at play. 
@@ -12,10 +10,10 @@ Inserting into and removing from linked lists is faster than arrays when resizin
 
 Acyclic Linked Lists:
 
-[[SinglyLinkedList.md]]
-[[DoublyLinkedList.md]]
+- [SinglyLinkedList](SinglyLinkedList.md)
+- [DoublyLinkedList](DoublyLinkedList.md)
 
 Cyclic Linked Lists:
 
-[[CircularLinkedList.md]]
-[[CircularDoublyLinkedList.md]]
+- [CircularLinkedList](CircularLinkedList.md)
+- [CircularDoublyLinkedList](CircularDoublyLinkedList.md)
diff --git a/docs/LocalScale.md b/docs/LocalScale.md
@@ -6,4 +6,4 @@ CS331 W12 L2
 
 Member of transform class that can be assigned. This affects the local scale of the GameObject.
 
-See [[Rotate.md]] for rotating based on local rotation and [[Translate.md]] for moving based on local coordinates. 
+See [Rotate](Rotate.md) for rotating based on local rotation and [[Translate.md]] for moving based on local coordinates. 
diff --git a/docs/LogisticRegression.md b/docs/LogisticRegression.md
@@ -14,4 +14,4 @@ An interesting thing about logistic regression is that the log loss function doe
 
 With the sigmoid function we define the decision boundary as the x-value for which greater values are true and lesser values are false. This position is at the 50% probability mark.
 
-See [[SoftmaxRegression.md]] for a generalization of logistic regression for multi-class classification without combining binary classifiers.
+See [SoftmaxRegression](SoftmaxRegression.md) for a generalization of logistic regression for multi-class classification without combining binary classifiers.
diff --git a/docs/LoopInvariant.md b/docs/LoopInvariant.md
@@ -6,7 +6,7 @@ CLRS 2.1
 
 **Definition:** A loop invariant is a condition that is true before and after each iteration of a loop.
 
-In the case of insertion sort the loop invariant is that [0 : p] is sorted where p is the number of prior iterations (prior elements sorted). See [[InsertionSort.md]] to understand this better.
+In the case of insertion sort the loop invariant is that [0 : p] is sorted where p is the number of prior iterations (prior elements sorted). See [InsertionSort](InsertionSort.md) to understand this better.
 
 Given that this must be true before and after every iteration, it must already hold before the first iteration, which can sometimes mean doing some setup work outside the loop itself to ensure proper iteration.
 
diff --git a/docs/MAE.md b/docs/MAE.md
@@ -6,7 +6,7 @@ ML CH2
 
 **Definition:** MAE, also known as average absolute deviation or mean absolute error, is an error metric used to describe the accuracy of a model by taking the absolute difference between the predicted and actual values of a set of samples and averaging those differences.
 
-This is sometimes used when there are many outliers which can greatly affect the [[RMSE.md]] error metric because of the way it weights deviations.
+This is sometimes used when there are many outliers which can greatly affect the [RMSE](RMSE.md) error metric because of the way it weights deviations.
 
 Implementation:
 
diff --git a/docs/MarkovChains.md b/docs/MarkovChains.md
@@ -10,7 +10,7 @@ Given that the state needs to have all relevant information, we need to choose o
 
 Anything that evolves with time can be described as a markov chain.
 
-These types of processes are not memoryless like [[BernoulliProcess.md]] or [[PoissonDistribution.md]].
+These types of processes are not memoryless like [BernoulliProcess](BernoulliProcess.md) or [[PoissonDistribution.md]].
 
 #### Markov Assumption
 
@@ -57,7 +57,7 @@ Alternatively, we can use recursive approach to find the probability of each tra
 
 The steady state of a Markov chain is the constant probability of some given state after an arbitrarily long period of time. This can be thought of as the limit as n approaches infinity. If there is no convergence then there is no steady state.
 
-In most cases we will reach a steady state but this might not happen in cases of [[PeriodicChain.md]] or reducibility, where there are two separate recurrent loops that cannot reach each other. The separate recurrent loops cause a non-steady state because steady states need to be initial condition agnostic.
+In most cases we will reach a steady state but this might not happen in cases of [PeriodicChain](PeriodicChain.md) or reducibility, where there are two separate recurrent loops that cannot reach each other. The separate recurrent loops cause a non-steady state because steady states need to be initial condition agnostic.
 
 #### Recurrent
 
diff --git a/docs/MarkovProcess.md b/docs/MarkovProcess.md
@@ -2,6 +2,4 @@
 
 Prob L16
 
-
-
-**Definition:** Markov processes are multiple trials of [[MarkovChains.md]].
+**Definition:** Markov processes are multiple trials of [MarkovChains](MarkovChains.md)
diff --git a/docs/MatrixMultiplication.md b/docs/MatrixMultiplication.md
@@ -14,4 +14,4 @@ Note: To multiply two matricies the number of columns in the first matrix must b
 
 AB is not equal to BA (in pretty much all cases). Often this is not even defined.
 
-See [[VectorMatrixMultipication.md]] for information about vector and matrix products.
+See [VectorMatrixMultipication](VectorMatrixMultipication.md) for information about vector and matrix products.
diff --git a/docs/Memory.md b/docs/Memory.md
@@ -8,10 +8,10 @@ Memory performance can affect compute speed of multiple applications running con
 
 ## Links
 
-[[DRAM.md]]
-[[DRAMChips.md]]
-[[DRAMCell.md]]
-[[RowBuffer.md]]
-[[DRAMBanks.md]]
-[[DRAMRefresh.md]]
-[[DisturbanceErrors.md]]
+- [DRAM](DRAM.md)
+- [DRAMChips](DRAMChips.md)
+- [DRAMCell](DRAMCell.md)
+- [RowBuffer](RowBuffer.md)
+- [DRAMBanks](DRAMBanks.md)
+- [DRAMRefresh](DRAMRefresh.md)
+- [DisturbanceErrors](DisturbanceErrors.md)
diff --git a/docs/MemoryManagement.md b/docs/MemoryManagement.md
@@ -41,4 +41,4 @@ Memory management in C++ is done using a few keywords shown below
 	cout << x;
 ```
 
-A few cases of memory management in action are [[SinglyLinkedList.md]] and [[DoublyLinkedList.md]] which both require memory management to ensure nodes in the heap are not lost after removing a node from the list. 
+A few cases of memory management in action are [SinglyLinkedList](SinglyLinkedList.md) and [[DoublyLinkedList.md]] which both require memory management to ensure nodes in the heap are not lost after removing a node from the list. 
diff --git a/docs/MergeSort.md b/docs/MergeSort.md
@@ -4,7 +4,7 @@ CLRS 2.3
 
 
 
-**Definition:** Merge sort is an algorithm that uses [[DivideAndConquer.md]] to sort a list in log linear (n log(n)) time.
+**Definition:** Merge sort is an algorithm that uses [DivideAndConquer](DivideAndConquer.md) to sort a list in log linear (n log(n)) time.
 
 Sample Implementation:
 
diff --git a/docs/Mesh.md b/docs/Mesh.md
@@ -4,11 +4,11 @@ CS 331 W11 L2
 
 
 
-**Definition:** A mesh is a representational grid of an object's surface used in [[SurfaceRepresentation.md]]
+**Definition:** A mesh is a representational grid of an object's surface used in [SurfaceRepresentation](SurfaceRepresentation.md)
 
 Think of a fishing net. We have straight lines that subdivide the point by calculating regular intervals and exact points at those intervals. This gives the illusion of continuous surfaces, but is actually a discrete set of points. 
 
-See [[Triangulation.md]] for implementation details.
+See [Triangulation](Triangulation.md) for implementation details.
 
 ### IN BLENDER
 
diff --git a/docs/MicroArchitecture.md b/docs/MicroArchitecture.md
@@ -8,8 +8,8 @@ Computer Architecture L2
 
 There are many micro architecture implementations of each ISA, but very few different ISAs because changes to ISAs break compatibility.
 
-This is anything in hardware not exposed to software. This includes speculative execution (preloading data), [[SuperScalar.md]], and [[OutOfOrderExecution.md]].
+This is anything in hardware not exposed to software. This includes speculative execution (preloading data), [SuperScalar](SuperScalar.md), and [[OutOfOrderExecution.md]].
 
-Most of the time [[Cache.md]] is not exposed to the programmer, but sometimes these things are. 
+Most of the time [Cache](Cache.md) is not exposed to the programmer, but sometimes these things are. 
 
-Microarchitecture can also set core frequency, but this is sometimes specified in the [[ISA.md]], so it is not always a microarchitectural detail.
+Microarchitecture can also set core frequency, but this is sometimes specified in the [ISA](ISA.md), so it is not always a microarchitectural detail.
diff --git a/docs/MinMaxScaling.md b/docs/MinMaxScaling.md
@@ -26,4 +26,4 @@ for i in df:
 df.describe()
 ```
 
-See [[FeatureScaling.md]] for more.
+See [FeatureScaling](FeatureScaling.md) for more.
diff --git a/docs/MixedRandomVariable.md b/docs/MixedRandomVariable.md
@@ -4,8 +4,8 @@ Prob L8
 
 
 
-**Definition:** A mixed random variable is a [[RandomVariables.md]] comprised of some continuous and discrete randomness. 
+**Definition:** A mixed random variable is a [RandomVariables](RandomVariables.md) comprised of some continuous and discrete randomness. 
 
 An example is a random variable where there is a 1/2 chance of flipping a coin (discrete) to get 1 dollar and a 1/2 chance of getting a random number of dollars between 0 and 1 (continuous). This is a tree where the first split is between coin flip and random value then there is another layer where you flip the coin or get the random amount of money.
 
-These types of variables can often be combined into a [[CumulativeDensityFunction.md]] to show the probability of getting a value or less than it. 
+These types of variables can often be combined into a [CumulativeDensityFunction](CumulativeDensityFunction.md) to show the probability of getting a value or less than it. 
diff --git a/docs/ModelBasedLearning.md b/docs/ModelBasedLearning.md
@@ -6,4 +6,4 @@ ML CH1
 
 **Definition:** Model based learning takes in inputs, does predictions, and gives an output. 
 
-This is different than [[InstanceBasedLearning.md]] because it tries to learn patterns instead of match them.
+This is different than [InstanceBasedLearning](InstanceBasedLearning.md) because it tries to learn patterns instead of match them.
diff --git a/docs/MonoBehaviour.md b/docs/MonoBehaviour.md
@@ -8,6 +8,6 @@ CS 331 W12 L3
 
 Each script contains code for a class that inherits from monobehaviour. 
 
-The update function is used to control the [[GameLoop.md]]
+The update function is used to control the [GameLoop](GameLoop.md)
 
 The start function is called a singular time when the object is instantiated. 
diff --git a/docs/Movement.md b/docs/Movement.md
@@ -26,8 +26,8 @@ if(rotateLeft){
 The issue with this is movement may not feel natural because there is no acceleration being applied to the object; you are just moving it by a certain amount. In essence, you are assigning a velocity to the object for the frames where the "up" key is pressed.
 
 
-See [[Input.md]] for more information about the Input class. 
+See [Input](Input.md) for more information about the Input class. 
 
-See [[Vector3.md]] for more information about positions.
+See [Vector3](Vector3.md) for more information about positions.
 
-See [[Quaternions.md]] for more about rotation/angles
+See [Quaternions](Quaternions.md) for more about rotation/angles
diff --git a/docs/MultilabelClassification.md b/docs/MultilabelClassification.md
@@ -2,8 +2,6 @@
 
 ML D2
 
-
-
 **Definition:** Multilabel classification is classification where there may be multiple binary outputs that are true.
 
 An example of this would be a human recognition model. Let's say we want to know if Bob, Jim, or Mary are in an image. If Bob and Jim are in the image the model should then return [true, true, false] or some sort of understandable output to denote such information.
diff --git a/docs/NaryOperations.md b/docs/NaryOperations.md
@@ -2,6 +2,4 @@
 
 SS
 
-
-
 **Definition:** N-ary operations is a general term for operations that take a finite and specific number of inputs, but don't fall into the category of unary, binary, ternary, or in some cases quaternary.
diff --git a/docs/NoveltyDetection.md b/docs/NoveltyDetection.md
@@ -6,4 +6,4 @@ ML CH1
 
 **Definition:** Novelty detection is used to detect new samples that appear different from other instances in the training set.
 
-This is similar to [[AnomalyDetection.md]].
+This is similar to [AnomalyDetection](AnomalyDetection.md)
diff --git a/docs/Nullity.md b/docs/Nullity.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
-**Definition:** The nullity of a matrix is the dimensionality of its [[NullSpace.md]].
+**Definition:** The nullity of a matrix is the dimensionality of its [NullSpace](NullSpace.md).
 
 The nullity of a matrix is equal to the number of non-pivot (free) variable columns.
diff --git a/docs/OfflineLearning.md b/docs/OfflineLearning.md
@@ -7,4 +7,4 @@ ML CH1
 
 Think of AlphaGo. It was trained to play Go, then the agent was sent out to enact the policy, not to learn more when playing real people.
 
-When models become obsolete due to inability to learn new information this is called model rot or data drift. This happens because data always changes but the model can't. The solution to this is either using an [[OnlineLearning.md]] model or retraining.
+When models become obsolete due to inability to learn new information this is called model rot or data drift. This happens because data always changes but the model can't. The solution to this is either using an [OnlineLearning](OnlineLearning.md) model or retraining.
diff --git a/docs/OneHotEncoding.md b/docs/OneHotEncoding.md
@@ -8,4 +8,4 @@ ML CH2
 
 An example of this is if you have a column that states the distance from the ocean. The options are island, 1 hour, and near ocean. These could be encoded as integers, but the issue is that these values are not representative of what the categories mean, so mapping this to a linear regression would cause issues because higher or lower does not necessarily mean better. As such, you would then add 1 hour, near ocean, and island as columns and then set booleans as true or false based on the distance string.
 
-See [[LabelEncoding.md]] for a simple way of encoding strings as numbers. This is useful when there are lots of options and the model knows the data is arbitrarily numbered.
+See [LabelEncoding](LabelEncoding.md) for a simple way of encoding strings as numbers. This is useful when there are lots of options and the model knows the data is arbitrarily numbered.
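The ocean-distance example can be sketched in plain Python as follows. This is a minimal sketch: the helper name and the sample column are hypothetical, and a real project would more likely use a library encoder.

```python
def one_hot_encode(values):
    """Turn a column of category strings into one boolean column per category."""
    categories = sorted(set(values))
    rows = [{category: value == category for category in categories}
            for value in values]
    return categories, rows


# Hypothetical "distance from ocean" column.
column = ["island", "1 hour", "near ocean", "island"]
categories, rows = one_hot_encode(column)
```

Each row has exactly one true entry, so no category is treated as numerically larger or smaller than another.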
diff --git a/docs/OneVersusAll.md b/docs/OneVersusAll.md
@@ -8,4 +8,4 @@ ML D2
 
 Think of this as a series of SVC or SGD classifiers that output some likelihood that the current input is part of a particular class. You then send the input into each model and whichever one outputs the highest probability is the class that the input belongs to. 
 
-See also [[OneVersusOne.md]] for another strategy to put together models to do classification.
+See also [OneVersusOne](OneVersusOne.md) for another strategy to put together models to do classification.
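A minimal sketch of the one-versus-all decision rule: each class has its own scorer, and the class whose scorer reports the highest value wins. The `scorers` mapping and its scoring functions are hypothetical stand-ins for trained binary classifiers.

```python
def predict_one_versus_all(scorers, sample):
    """Return the label whose binary scorer gives the highest score for `sample`."""
    return max(scorers, key=lambda label: scorers[label](sample))


# Hypothetical scorers: each one rates how strongly a number resembles its class.
scorers = {
    "small": lambda x: 1.0 - min(abs(x - 0) / 10, 1.0),
    "medium": lambda x: 1.0 - min(abs(x - 5) / 10, 1.0),
    "large": lambda x: 1.0 - min(abs(x - 10) / 10, 1.0),
}
label = predict_one_versus_all(scorers, 9)
```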
diff --git a/docs/OneVersusOne.md b/docs/OneVersusOne.md
@@ -10,4 +10,4 @@ Basically, you train a model to compare between one set and another. It outputs 
 
 As such, one must train N * (N-1)/2 classifiers which can be a lot depending on how many classes there are. In the case of 0-9 (mnist) this comes out to 45 models. On the flip side, given how the model works, each model does not need to be trained on the entire set only the subset containing the classes being compared. 
 
-See also [[OneVersusAll.md]] for another strategy regarding classification based on binary classifier chaining. The main reason OvO can be better than OvA is because some models are slow to train on larger datasets thus only training models on a subset, albeit training more models, can be faster. This is especially true for support vector machine classification models. In most cases however OvA is preferred.
+See also [OneVersusAll](OneVersusAll.md) for another strategy regarding classification based on binary classifier chaining. The main reason OvO can be better than OvA is because some models are slow to train on larger datasets thus only training models on a subset, albeit training more models, can be faster. This is especially true for support vector machine classification models. In most cases however OvA is preferred.
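The N * (N-1)/2 count can be checked by enumerating the class pairs directly. This is a small sketch using the standard library; the function name is an assumption for illustration.

```python
from itertools import combinations


def one_versus_one_pairs(classes):
    """List every unordered pair of classes; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))


# Digits 0-9, as in the MNIST example above.
pairs = one_versus_one_pairs(range(10))
classifier_count = len(pairs)
```

Each pair only needs the training samples belonging to its two classes, which is where the per-model savings come from.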
diff --git a/docs/OnlineLearning.md b/docs/OnlineLearning.md
@@ -5,10 +5,10 @@ ML CH1
 
 **Definition:** Online learning is the process of learning as a model is fed new data.
 
-This paradigm is in contrast with [[OfflineLearning.md]] also known as batch learning where all data is trained on at the start and then the learned behavior is acted upon in a static way in perpetuity. 
+This paradigm is in contrast with [OfflineLearning](OfflineLearning.md) also known as batch learning where all data is trained on at the start and then the learned behavior is acted upon in a static way in perpetuity. 
 
 When using online learning, you can use either individual samples to train on or mini-batches which are groupings of samples.
 
 This method can be used to train models on data where not all of the training data can fit in the machine's memory. This is referred to as out-of-core learning. When doing this, the algorithm loads in a mini-batch, runs a training step, and then repeats for all of the dataset. This may be confusing as the learning is done offline, but considering online learning more as incremental learning can help resolve this thought issue.
 
-The rate at which these models adapt to new information is called the [[LearningRate.md]]. When this is high they respond quickly to new data at the cost of losing old data faster. It is a balancing game. Counter to this, with a low learning rate we have more 'inertia' from old data in the set.
+The rate at which these models adapt to new information is called the [LearningRate](LearningRate.md) When this is high they respond quickly to new data at the cost of losing old data faster. It is a balancing game. Counter to this, with a low learning rate we have more 'inertia' from old data in the set.
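A minimal sketch of mini-batch processing and the effect of the learning rate, assuming a toy "model" that just tracks a running estimate of the data's mean. All names here are hypothetical, and the update rule is only an illustration of the idea, not a specific algorithm from the notes.

```python
def mini_batches(samples, batch_size):
    """Yield successive mini-batches so the full dataset never has to be in memory."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def online_mean(stream, batch_size, learning_rate):
    """Nudge an estimate toward each batch mean by a fraction `learning_rate`."""
    estimate = 0.0
    for batch in mini_batches(stream, batch_size):
        batch_mean = sum(batch) / len(batch)
        estimate += learning_rate * (batch_mean - estimate)
    return estimate


data = [10.0] * 100
fast = online_mean(data, batch_size=10, learning_rate=0.5)
slow = online_mean(data, batch_size=10, learning_rate=0.1)
```

With the higher rate the estimate reaches the new data's value quickly, while the lower rate keeps more 'inertia' from the starting estimate.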
diff --git a/docs/Opcode.md b/docs/Opcode.md
@@ -4,6 +4,6 @@ CA L3
 
 
 
-**Definition:** An opcode is the first part of an [[Instruction.md]] which describes what the instruction does. 
+**Definition:** An opcode is the first part of an [Instruction](Instruction.md) which describes what the instruction does. 
 
-This is a form of [[BitSteering.md]]
+This is a form of [BitSteering](BitSteering.md)
diff --git a/docs/Operands.md b/docs/Operands.md
@@ -4,6 +4,6 @@ CA L3
 
 
 
-**Definition:** Operands describe what an [[Instruction.md]] should be done to.
+**Definition:** Operands describe what an [Instruction](Instruction.md) should be done to.
 
-See [[Opcode.md]] for the other part of an instruction. 
+See [Opcode](Opcode.md) for the other part of an instruction. 
diff --git a/docs/Optimizer.md b/docs/Optimizer.md
@@ -8,7 +8,7 @@ ML P580
 
 Here is a list of common optimizers:
 
-[[Momentum.md]] - Gradient is acceleration
-[[NAG.md]] - Calculates momentum slightly ahead of current position
-[[AdaGrad.md]] - Good for simple quadratic problems
-[[Adam.md]] - Generally the best
+[Momentum](Momentum.md) - Gradient is acceleration
+[NAG](NAG.md) - Calculates momentum slightly ahead of current position
+[AdaGrad](AdaGrad.md) - Good for simple quadratic problems
+[Adam](Adam.md) - Generally the best
diff --git a/docs/OrdinaryLeastSquares.md b/docs/OrdinaryLeastSquares.md
@@ -6,4 +6,4 @@ ML CH2
 
 **Definition:** Ordinary least squares is a formula used to find the statistical line of best fit for some dataset where we are trying to minimize the square error. 
 
-When doing [[LinearRegression.md]] there are two common methods to find the line. One is OLS and the other is [[GradientDescent.md]]. 
+When doing [LinearRegression](LinearRegression.md) there are two common methods to find the line. One is OLS and the other is [[GradientDescent.md]]. 
diff --git a/docs/OrthogonalComplement.md b/docs/OrthogonalComplement.md
@@ -16,6 +16,6 @@ Every element of the nullspace is in the orthogonal complement and vice versa th
 
 ## Dimensionality
 
-For the arbitrary subspace V, we know dim(V) = k. As such, we also know for O which is the orthogonal complement, that dim(O) = n - k where R^n is the [[AmbientSpace.md]].
+For the arbitrary subspace V, we know dim(V) = k. As such, we also know for O which is the orthogonal complement, that dim(O) = n - k where R^n is the [AmbientSpace](AmbientSpace.md)
 
 This is given because we also know that the [Nullity](Nullity.md) + [Rank](Rank.md) = dim([Ambient Space](AmbientSpace.md)).
diff --git a/docs/OutOfOrderExecution.md b/docs/OutOfOrderExecution.md
@@ -6,4 +6,4 @@ Computer Architecture L2
 
 **Definition:** An optimization strategy that executes commands out of order to reduce the number of clock cycles/time taken to complete computations. This is complex as it can be hard to determine if a command relies upon another command that came in earlier.
 
-See [[DataFlow.md]] for more information about out of order/non-Von Neumann computation.
+See [DataFlow](DataFlow.md) for more information about out of order/non-Von Neumann computation.
diff --git a/docs/Overfitting.md b/docs/Overfitting.md
@@ -10,6 +10,6 @@ Generally, this is caused by having a complex model with lots of features but no
 
 When reducing the risk of overfitting by simplifying a model we call this regularization. Doing this we can either remove features or limit one or more degrees of freedom of the model. Let's assume we are doing linear regression: we can limit the m value (mx+b) to be within a certain range so while the model has two degrees of freedom still, it is simpler and thus, in some cases, more generalizable depending on the training samples and the inputs it runs inference on.
 
-Overfitting can be seen when you train on training data and find that the test set values have a high [[GeneralizationError.md]] meaning that the model is unable to generalize.
+Overfitting can be seen when you train on training data and find that the test set values have a high [GeneralizationError](GeneralizationError.md) meaning that the model is unable to generalize.
 
 Overfitting can be easily thought about as making your model too good at the training data which limits its ability to generalize.
diff --git a/docs/Oversmooothing.md b/docs/Oversmoothing.md
diff --git a/docs/PartialDerivative.md b/docs/PartialDerivative.md
@@ -6,4 +6,4 @@ ML D2
 
 **Definition:** The partial derivative is a derivative of a multivariate function with respect to a singular variable by considering the others as constants.
 
-Often this is used in [[GradientDescent.md]] to determine in what ways parameters need to change.
+Often this is used in [GradientDescent](GradientDescent.md) to determine in what ways parameters need to change.
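A minimal numerical sketch of a partial derivative, assuming a central-difference approximation: only one input is nudged while the others are held constant. The function name, step size, and example function are hypothetical choices for illustration.

```python
def partial_derivative(f, point, index, h=1e-6):
    """Approximate df/dx_index at `point`, holding every other variable fixed."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)


# Example: f(x, y) = x^2 * y + y, so df/dx = 2xy and df/dy = x^2 + 1.
def f(x, y):
    return x ** 2 * y + y


df_dx = partial_derivative(f, (3.0, 2.0), index=0)
df_dy = partial_derivative(f, (3.0, 2.0), index=1)
```

In gradient descent, one such value per parameter forms the gradient that says how each parameter should change.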
diff --git a/docs/Partition.md b/docs/Partition.md
@@ -7,5 +7,5 @@ AM W14 Reading
 
 Basically, a partition is the subsets of a set where all subsets together make the original set and all subsets are unique in their elements where any intersection between them is the null set. Keep in mind the partition is the combination of all of them not simply a singular one of the subsets which is where this diverges from the computational term "partition".
 
-This relates to [[EquivalenceClass.md]] as certain partitions are equivalence classes when considering equivalence relation sets. 
+This relates to [EquivalenceClass](EquivalenceClass.md) as certain partitions are equivalence classes when considering equivalence relation sets. 
 
diff --git a/docs/Pipelining.md b/docs/Pipelining.md
@@ -6,4 +6,4 @@ CA L3
 
 **Definition:** Pipelining is the use of CPU hardware such that more than one instruction is in execution at the same time.
 
-See [[OutOfOrderExecution.md]]
+See [OutOfOrderExecution](OutOfOrderExecution.md).
diff --git a/docs/PlaneToPlaneDistance.md b/docs/PlaneToPlaneDistance.md
@@ -4,7 +4,7 @@ Khan
 
 
 
-See [[DistanceToPlane.md]] for distance from plane to point. 
+See [DistanceToPlane](DistanceToPlane.md) for distance from plane to point. 
 
 This is only useful for planes that are parallel; otherwise they will intersect.
 
@@ -12,6 +12,6 @@ Steps:
 
 1. Find equation of both planes 
 2. Find representative point
-3. Take [[DistanceToPlane.md]] from the rep point to the other plane.
+3. Take [DistanceToPlane](DistanceToPlane.md) from the rep point to the other plane.
 
 This is true because all points will be the same distance from the other plane given that they are parallel. Otherwise, the planes must intersect and thus have a minimum distance of 0.
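The steps above can be sketched as follows, assuming both planes are written as normal . p = d with the same normal vector (which is what makes them parallel). The function names and the way the representative point is chosen are assumptions for this sketch.

```python
import math


def distance_to_plane(point, normal, d):
    """Distance from `point` to the plane normal . p = d."""
    dot = sum(n * p for n, p in zip(normal, point))
    return abs(dot - d) / math.sqrt(sum(n * n for n in normal))


def point_on_plane(normal, d):
    """Representative point: scale the normal so that normal . p = d holds."""
    norm_sq = sum(n * n for n in normal)
    return tuple(n * d / norm_sq for n in normal)


def parallel_plane_distance(normal, d1, d2):
    """Distance between the parallel planes normal . p = d1 and normal . p = d2."""
    rep = point_on_plane(normal, d1)
    return distance_to_plane(rep, normal, d2)


# Planes x + 2y + 2z = 3 and x + 2y + 2z = 12.
gap = parallel_plane_distance((1.0, 2.0, 2.0), 3.0, 12.0)
```

Any point on the first plane would give the same answer; scaling the normal is just a convenient way to get one.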
diff --git a/docs/PoissonProcess.md b/docs/PoissonProcess.md
@@ -4,7 +4,7 @@ Prob L14
 
 
 
-**Definition:** A Poisson process is a continuous time version of the [[BernoulliProcess.md]].
+**Definition:** A Poisson process is a continuous time version of the [BernoulliProcess](BernoulliProcess.md).
 
 A poisson process models continuous time with binary outcomes. Generally, we simply track when the true case occurs.
 
@@ -12,4 +12,4 @@ Poisson processes presuppose independence and homogenity of probability over tim
 
 ## See Also
 
-[[BernoulliProcess.md]] - Memoryless, discrete time, process of binary outcomes
+[BernoulliProcess](BernoulliProcess.md) - Memoryless, discrete time, process of binary outcomes
diff --git a/docs/Pole.md b/docs/Pole.md
@@ -4,7 +4,7 @@ CG W13 L2
 
 # Notes
 
-**Definition:** A [[Vertex.md]] of [[Degree.md]] 3,5,6,7,8,... This means no isolated vertices are allowed.
+**Definition:** A [Vertex](Vertex.md) of [[Degree.md]] 3,5,6,7,8,... This means no isolated vertices are allowed.
 
 Yes, that is a crap definition. Here is the real one not being taught by a moron:
 
diff --git a/docs/Postcondition.md b/docs/Postcondition.md
@@ -4,4 +4,4 @@ U 1.4.1
 
 
 
-**Definition:** Postconditions are the expected outputs of a function or program which are predicated upon the specified [[Preconditions.md]].
+**Definition:** Postconditions are the expected outputs of a function or program which are predicated upon the specified [Preconditions](Preconditions.md).
diff --git a/docs/Prediction.md b/docs/Prediction.md
@@ -6,4 +6,4 @@ Ch2
 
 **Definition:** Prediction is the process of predicting an output given a sample.
 
-This is different than [[Inference.md]] which is focused on understanding relationships between variables.
+This is different than [Inference](Inference.md) which is focused on understanding relationships between variables.
diff --git a/docs/Preimage.md b/docs/Preimage.md
@@ -16,4 +16,4 @@ To find the preimage of some image under T we need to find all input vectors a s
 
 If we specify the image as <1,2> and <0,0> then we need to find all <x_1,x_2> such that <x_1, x_2> x L.T. Matrix = <1,2> or <0,0>. 
 
-This final result can be found using [[ReducedRowEchelonForm.md]] of both augmented matrices created using the above information where the result is the computed values of all pivot variables put into matrices.
+This final result can be found using [ReducedRowEchelonForm](ReducedRowEchelonForm.md) of both augmented matrices created using the above information where the result is the computed values of all pivot variables put into matrices.
diff --git a/docs/PretrainedModels.md b/docs/PretrainedModels.md
@@ -6,7 +6,7 @@ ML P570
 
 **Definition:** Pretrained models are ML models that have been trained in the past and can be used for doing other things.
 
-Pretrained models often use [[TransferLearning.md]] because the goal with pretrained models is to use the existing model that has already been trained to work well with a new set of data. This often involves changing the model's top layers (training new ones for the specific task) while keeping the lower layers intact as they often do simple tasks like edge detection which are reusable.
+Pretrained models often use [TransferLearning](TransferLearning.md) because the goal with pretrained models is to use the existing model that has already been trained to work well with a new set of data. This often involves changing the model's top layers (training new ones for the specific task) while keeping the lower layers intact as they often do simple tasks like edge detection which are reusable.
 
 When doing this the layers that don't change are called the fixed weights while the ones that are changed are called the trainable weights.
 
diff --git a/docs/Probability.md b/docs/Probability.md
@@ -6,11 +6,11 @@ Stats CH1
 
 **Definition:** The probability is the likelihood of something happening, expressed as a number between 0 and 1 (or a percentage between 0% and 100%).
 
-Let X be a set and F a set of subsets of X. A probability on (X,F) is a function u : F -> [0,1]. This means each set in F is assigned a probability between 0 and 1. See [[SetFunction.md]] for more about the u (Greek letter mu) function.
+Let X be a set and F a set of subsets of X. A probability on (X,F) is a function u : F -> [0,1]. This means each set in F is assigned a probability between 0 and 1. See [SetFunction](SetFunction.md) for more about the u (Greek letter mu) function.
 
 The probability function must be a set function, but that is not sufficient. We also need for u(0) where 0 is the empty set to be equal to 0. We also need u(X) = 1 (totaling 100%), and if A and B are disjoint sets then u(A union B) = u(A) + u(B). This final part means the probability of the union of two different sets is equal to the sum of the probabilities of both sets individually. 
 
-When we have a domain that is finite we then state we have a [[DiscreteProbability.md]] whereas when we have an interval then the function is said to be a [[ContinuousProbability.md]].
+When we have a domain that is finite we then state we have a [DiscreteProbability](DiscreteProbability.md) whereas when we have an interval then the function is said to be a [[ContinuousProbability.md]].
 
 In practical terms, for u(X), X is the set of outcomes that are possible and the function returns the probability of said outcome.
 
diff --git a/docs/ProbabilityDensityFunctions.md b/docs/ProbabilityDensityFunctions.md
@@ -4,7 +4,7 @@ Stats ch1
 
 
 
-**Definition:** A probability density function shows the probability of outcomes for [[ContinuousProbability.md]] problems.
+**Definition:** A probability density function shows the probability of outcomes for [ContinuousProbability](ContinuousProbability.md) problems.
 
 **Important:** PDFs are for continuous random variables whereas PMFs are for discrete.
 
diff --git a/docs/ProbabilityLaw.md b/docs/ProbabilityLaw.md
@@ -2,12 +2,10 @@
 
 L1
 
-
-
 **Definition:** The probability law assigns some set A (event) a nonnegative P(A) that describes the likelihood of the elements of A.
 
 The probability law specifies the likelihood of the input given the sample space. The rules for it are as follows:
 
 1. P(A) >= 0
-2. P(A union B) = P(A) + P(B) if A and B are [[DisjointSet.md]].
+2. P(A union B) = P(A) + P(B) if A and B are [Disjoint Set](DisjointSet.md)
 3. P(Omega) = 1
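The three rules can be checked on a small example. This is a minimal sketch assuming a uniform law on a fair six-sided die; the helper name and the sample space are hypothetical choices, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def uniform_law(sample_space):
    """Build P for a finite sample space where every outcome is equally likely."""
    size = len(sample_space)

    def P(event):
        # Only outcomes that are actually in the sample space count.
        return Fraction(len(event & sample_space), size)

    return P


omega = frozenset(range(1, 7))  # fair six-sided die
P = uniform_law(omega)

evens = {2, 4, 6}
low_odds = {1, 3}  # disjoint from evens
```

Here P(evens | low_odds) equals P(evens) + P(low_odds) because the two events share no outcomes.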
diff --git a/docs/ProbabilityMassFunction.md b/docs/ProbabilityMassFunction.md
@@ -4,7 +4,7 @@ L4
 
 
 
-**Definition:** A PMF describes the probability of some mapping of a [[RandomVariable.md]] from inputs to a specific output. 
+**Definition:** A PMF describes the probability of some mapping of a [RandomVariables](RandomVariables.md) from inputs to a specific output. 
 
 **Important:** PMFs are for discrete random variables whereas PDFs are for continuous.
 
@@ -45,7 +45,7 @@ Conditional PMFs are just PMFs but they have a specified even that occurred. In 
 
 ## Joint (L6)
 
-See [[JointProbability.md]] for joint PMF information.
+See [JointProbability](JointProbability.md) for joint PMF information.
 
 ## Marginal (L6)
 
diff --git a/docs/ProgrammerVisibleState.md b/docs/ProgrammerVisibleState.md
@@ -10,6 +10,6 @@ This includes the program counter, registers, and memory.
 
 This is all information visible to the programmer.
 
-See [[ISA.md]] for more related content.
+See [ISA](ISA.md) for more related content.
 
-There is also programmer invisible state which includes cache and pipline registers are example of state in the [[MicroArchitecture.md]]
+There is also programmer-invisible state, such as caches and pipeline registers, which are examples of state in the [MicroArchitecture](MicroArchitecture.md).
diff --git a/docs/Quaternions.md b/docs/Quaternions.md
@@ -8,4 +8,4 @@ CS 331 W11 L2
 
 There are names for the rotations with regard to the local coordinate system. The lean forward and backward is the pitch (rotation about x axis), the rotation around their center is the yaw (rotation about y axis think spinning in circles), and the rotation about the z axis is called the roll (think barrel rolls).  
 
-See [[Transform.md]] for more information about coordinate systems and such.
+See [Transform](Transform.md) for more information about coordinate systems and such.
diff --git a/docs/Queue.md b/docs/Queue.md
@@ -4,7 +4,7 @@ CS202 L14 / CS303 Ch 1
 
 
 
-**Definition:** This is a datatype that works on a first in first out basis. This is often implemented using a [[SinglyLinkedList.md]] with a link to the tail (where more nodes would be added). This is also often implemented such that you add to the end and remove from the start. 
+**Definition:** This is a datatype that works on a first in first out basis. This is often implemented using a [SinglyLinkedList](SinglyLinkedList.md) with a link to the tail (where more nodes would be added). This is also often implemented such that you add to the end and remove from the start. 
 
 enqueue: add to queue
 
@@ -12,4 +12,4 @@ dequeue: remove from queue
 
 peek: view the front element
 
-It is important to note, generally, people implement these to add to the back and remove from the start although either direction is functionally equivalent. See [[Stack.md]] for information about a lifo approach.
+It is important to note, generally, people implement these to add to the back and remove from the start although either direction is functionally equivalent. See [Stack](Stack.md) for information about a lifo approach.
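
As a sketch of the approach described above (a singly linked list with a tail link, adding at the back and removing from the front), the three operations might look like this. The class names `Node` and `LinkedQueue` are assumptions made for the example.

```python
class Node:
    """One element of the singly linked list."""

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedQueue:
    """FIFO queue: enqueue at the tail, dequeue from the head."""

    def __init__(self):
        self.head = None  # front of the queue (next item to leave)
        self.tail = None  # back of the queue (where new nodes are added)
        self.size = 0

    def enqueue(self, value):
        node = Node(value)
        if self.tail is None:
            # Empty queue: the new node is both head and tail.
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.size += 1

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            # The queue became empty, so the tail link must be cleared too.
            self.tail = None
        self.size -= 1
        return node.value

    def peek(self):
        if self.head is None:
            raise IndexError("peek at empty queue")
        return self.head.value
```

Keeping the tail link is what makes `enqueue` constant time; without it, every insertion would need to walk the list from the head.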
diff --git a/docs/RMSE.md b/docs/RMSE.md
@@ -30,4 +30,4 @@ total = math.sqrt(total)
 print(total)
 ```
 
-Another metric for errors is [[MAE.md]].
+Another metric for errors is [MAE](MAE.md).
diff --git a/docs/RandomExperiment.md b/docs/RandomExperiment.md
@@ -6,4 +6,4 @@ Ch 1.1
 
 **Definition:** A random experiment is a specified set of procedures that result in a truly random outcome (not necessarily uniformly) in the sample space.
 
-This is different than a [[RandomVariables.md]] in the sense that a random variable maps the outcomes of a given experiment to another value whereas this outputs the outcome.
+This is different than a [RandomVariables](RandomVariables.md) in the sense that a random variable maps the outcomes of a given experiment to another value whereas this outputs the outcome.
diff --git a/docs/RandomForest.md b/docs/RandomForest.md
@@ -4,8 +4,8 @@ ML D4
 
 
 
-**Definition:** A random forest is an [[Ensembles.md]] of [[DecisionTrees.md]] used to make predictions based on majority voting or some other cost function.
+**Definition:** A random forest is an [Ensembles](Ensembles.md) of [DecisionTrees](DecisionTrees.md) used to make predictions based on majority voting or some other cost function.
 
 This uses a wisdom of the crowd philosophy where most likely the aggregated sum of many answers is better than one expert answer.
 
-Random forests are normally trained with [[Bagging.md]] and sometimes with [[Pasting.md]]. 
+Random forests are normally trained with [Bagging](Bagging.md) and sometimes with [Pasting](Pasting.md). 
diff --git a/docs/RandomSubspaces.md b/docs/RandomSubspaces.md
@@ -4,4 +4,4 @@ ML D5
 
 
 
-**Definition:** The random subspaces method is similar to [[RandomPatches.md]] except it keeps all training instances and only samples features.
+**Definition:** The random subspaces method is similar to [RandomPatches](RandomPatches.md) except it keeps all training instances and only samples features.
diff --git a/docs/RandomVariables.md b/docs/RandomVariables.md
@@ -21,4 +21,4 @@ Example:
 X = {1 if heads}
 	{0 if tails}
 
-Geometric random variables are random variables that result in [[ProbabilityMassFunction.md]] with a geometric shape (see PMF for more).
+Geometric random variables are random variables that result in [ProbabilityMassFunction](ProbabilityMassFunction.md) with a geometric shape (see PMF for more).
diff --git a/docs/RegressionProblem.md b/docs/RegressionProblem.md
@@ -6,13 +6,13 @@ ML L1
 
 **Definition:** A regression problem is a problem where the value trying to be predicted is continuous (think graphing not yes/no).
 
-Yes/no problem is a [[ClassificationProblem.md]].
+A yes/no problem is a [ClassificationProblem](ClassificationProblem.md).
 
-Also see for a more specific example [[LinearRegression.md]]. There are other types of regression as well such as polynomial regression (no note at this time).
+Also see [LinearRegression](LinearRegression.md) for a more specific example. There are other types of regression as well such as polynomial regression (no note at this time).
 
 When discussing regression, we often use the term "target" instead of "label" to describe the desired output. This contrasts with classification problems where we use the term label.
 
-See also [[LogisticRegression.md]] where we assign a probability of group membership.
+See also [LogisticRegression](LogisticRegression.md) where we assign a probability of group membership.
 
 With regression, we describe the performance measure as the utility function or fitness function. This measures how good the model is. The inverse of this is the cost function which measures how bad it is.
 
diff --git a/docs/RelativeFrequency.md b/docs/RelativeFrequency.md
@@ -4,6 +4,6 @@ Ch 1.1
 
 
 
-**Definition:** Relative frequency is the value f/n where f is the [[Frequency.md]] of an event under a [[RandomExperiment.md]].
+**Definition:** Relative frequency is the value f/n where f is the [Frequency](Frequency.md) of an event under a [RandomExperiment](RandomExperiment.md).
 
-Note this is not the same as [[Probability.md]] because probability is the true likelihood whereas relative frequency has been the historical observed likelihood based on the experiment. This value does however tend towards the probability. See [[LawOfLargeNumbers.md]].
+Note this is not the same as [Probability](Probability.md) because probability is the true likelihood whereas relative frequency is the historically observed likelihood based on the experiment. This value does however tend towards the probability. See [LawOfLargeNumbers](LawOfLargeNumbers.md).
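
The way f/n tends towards the probability can be seen in a short simulation. This is only a sketch under assumed conditions: the experiment is a fair six-sided die, the event is rolling a six, and the helper names are invented for the example.

```python
import random


def relative_frequency(event, experiment, n, seed=0):
    """Run the experiment n times and return f / n for the given event."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    f = sum(1 for _ in range(n) if event(experiment(rng)))
    return f / n


def roll_die(rng):
    """Assumed random experiment: one roll of a fair six-sided die."""
    return rng.randint(1, 6)


def is_six(outcome):
    """Assumed event: the roll shows a six (true probability 1/6)."""
    return outcome == 6


for n in (10, 1_000, 100_000):
    print(n, relative_frequency(is_six, roll_die, n))
```

With few trials the value can sit far from 1/6; with many trials it should settle close to it, which is the behaviour the law of large numbers describes.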
diff --git a/docs/Rotate.md b/docs/Rotate.md
@@ -6,6 +6,6 @@ CS331 W12 L2
 
 Rotate is a function of the Transform class that allows rotation relative to the local rotation.
 
-See [[Translate.md]] for a similar function but for position. 
+See [Translate](Translate.md) for a similar function but for position. 
 
-Also see [[LocalScale.md]] for changing the local scale.
+Also see [LocalScale](LocalScale.md) for changing the local scale.
diff --git a/docs/Rotation.md b/docs/Rotation.md
@@ -14,4 +14,4 @@ To create a matrix to represent a rotation do the following:
 2. Calculate each individual basis vector under the rotation we want (use trig)
 3. Aggregate the results into a final matrix where each column is the result of the basis vector transformation 
 
-This is the same way we normally create the [[StandardMatrix.md]] of a L.T. for other transformations.
+This is the same way we normally create the [StandardMatrix](StandardMatrix.md) of a L.T. for other transformations.
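
The steps above can be sketched for the simplest case, a counterclockwise rotation in 2D. The function names are hypothetical, and the matrix is stored as a list of rows purely for illustration.

```python
import math


def rotation_matrix_2d(theta):
    """Standard matrix of a counterclockwise rotation by theta radians."""
    # Where each basis vector lands under the rotation (found with trig).
    e1_image = (math.cos(theta), math.sin(theta))   # image of (1, 0)
    e2_image = (-math.sin(theta), math.cos(theta))  # image of (0, 1)
    # Each column of the matrix is the image of one basis vector.
    return [
        [e1_image[0], e2_image[0]],
        [e1_image[1], e2_image[1]],
    ]


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return tuple(
        sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix
    )


quarter_turn = rotation_matrix_2d(math.pi / 2)
print(apply(quarter_turn, (1.0, 0.0)))  # approximately (0, 1)
```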
diff --git a/docs/RowBuffer.md b/docs/RowBuffer.md
@@ -2,7 +2,7 @@
 
 
 
-**Definition:** The row buffer is the buffer used to cache a row that is from [[DRAM.md]]. This is used because it is 2-3 times more efficient to query a buffered memory address than it is to query for a new row in memory. This is handled by the DRAM memory controller. 
+**Definition:** The row buffer is the buffer used to cache a row that is from [DRAM](DRAM.md). This is used because it is 2-3 times more efficient to query a buffered memory address than it is to query for a new row in memory. This is handled by the DRAM memory controller. 
 
 Precharging is where the memory controller replaces the current buffered row with a new one that was requested. This is done by sending high voltage to the new row and low voltage to the old one. When these conflicts occur, the access is 2-3 times slower than if the row was already cached.
 
diff --git a/docs/RuleLearning.md b/docs/RuleLearning.md
@@ -7,4 +7,4 @@ ML CH1
 
 **Definition:** Rule learning is the process of taking in lots of data and finding associations between data. 
 
-This information can be useful when trying to implement [[DimensionalityReduction.md]].
+This information can be useful when trying to implement [DimensionalityReduction](DimensionalityReduction.md).
diff --git a/docs/RuleOfSarrus.md b/docs/RuleOfSarrus.md
@@ -14,7 +14,7 @@ Det ([d e f]) = aei + bfg + cdh - afh - bdi - ceg
 
 When looking at the matrix we add the multiplied diagonals (starting from top row 3 values) to the right and subtract the diagonals to the left (multiplying each value in the diagonal).
 
-See [[Determinant.md]] for calculating determinants, what they represent, and how to find 2x2 with a formula.
+See [Determinant](Determinant.md) for calculating determinants, what they represent, and how to find 2x2 with a formula.
 
 Ex:
 
diff --git a/docs/Scheduling.md b/docs/Scheduling.md
@@ -2,4 +2,4 @@
 
 
 
-CPU Scheduling is done on the OS level and is generally simply about the clocks given. This can cause issues with [[DRAM.md]] because the DRAM controller prioritizes requests associated with buffered rows of memory meaning that even if two processes have the same priority they will not necessarily get the same access to memory because of optimizations done in the DRAM controller. 
+CPU Scheduling is done on the OS level and is generally simply about the clocks given. This can cause issues with [DRAM](DRAM.md) because the DRAM controller prioritizes requests associated with buffered rows of memory meaning that even if two processes have the same priority they will not necessarily get the same access to memory because of optimizations done in the DRAM controller. 
diff --git a/docs/SelfSupervisedLearning.md b/docs/SelfSupervisedLearning.md
@@ -6,6 +6,6 @@ ML CH1
 
 **Definition:** Self-supervised learning is the process of changing input data and the model predicting the output where the output is known to it. 
 
-This is similar to [[SemiSupervisedLearning.md]] where models are trained to detect certain information (clustering) without knowing what the information means.  
+This is similar to [SemiSupervisedLearning](SemiSupervisedLearning.md) where models are trained to detect certain information (clustering) without knowing what the information means.  
 
 Basically, the model learns to train itself by messing with inputs to get expected outputs.
diff --git a/docs/SinglyLinkedList.md b/docs/SinglyLinkedList.md
@@ -4,11 +4,11 @@ CS 221 W11 Lecture 13.
 
 
 
-**Definition:** Singly linked lists are lists that only contain pointers to the next item in the list. This is in contrast with [[DoublyLinkedList.md]] which have a pointer forward and backward.
+**Definition:** Singly linked lists are lists that only contain pointers to the next item in the list. This is in contrast with [DoublyLinkedList](DoublyLinkedList.md) which have a pointer forward and backward.
 
 There is a pointer that needs to point to the head and then finding every subsequent element is as simple as iterating through the list. The final item in the list contains a null pointer. 
 
-Additionally, there are no cycles ([[Graphs.md]]) in the list hence that makes them a "tree".
+Additionally, there are no cycles ([Graphs](Graphs.md)) in the list hence that makes them a "tree".
 
 Inserting at the start is done as follows:
 
diff --git a/docs/SkeletalAnimation.md b/docs/SkeletalAnimation.md
@@ -12,16 +12,16 @@ Additionally, bones are rigid, but they can rotate about their local y-axis, but
 
 A joint is another way to refer to a root or tail as that is where bones are joined together (join -> join+t)
 
-Sometimes, an [[Armature.md]] will be disjoint, but we should generally try not to do this.
+Sometimes, an [Armature](Armature.md) will be disjoint, but we should generally try not to do this.
 
 ## Steps to Implement
 
-1. Create [[Armature.md]] 
+1. Create [Armature](Armature.md) 
 2. Add bones (root bottom, tip top, body middle, and connected by joints)
 	- Start with one bone then extrude the rest of them
 	- When extruding, the root of the new bone will create a joint with the tip of the prior one
 		- When extruding a bone from the tip of another there is a parent child relationship. As such, there can be multiple children for a single parent.
 	- Name all bones in an apt way (ehh, I guess there is some value to this). 
-3. Create [[Mesh.md]] 
+3. Create [Mesh](Mesh.md) 
 4. Embed Armature into mesh
 
diff --git a/docs/SmallestCounterExample.md b/docs/SmallestCounterExample.md
@@ -1,6 +1,6 @@
 # Smallest Counterexample
 
-Abstract Math 10.3. This is similar to [[Induction.md]] and [[StrongInduction.md]]
+Abstract Math 10.3. This is similar to [Induction](Induction.md) and [StrongInduction](StrongInduction.md).
 
 
 
diff --git a/docs/Stack.md b/docs/Stack.md
@@ -12,6 +12,6 @@ peek: get top element. This can also be implemented by doing pop then pushing th
 
 pop: remove from top
 
-This can be implemented as a [[SinglyLinkedList.md]]
+This can be implemented as a [SinglyLinkedList](SinglyLinkedList.md)
 
-See [[Queue.md]] for information about the fifo implementation.
+See [Queue](Queue.md) for information about the fifo implementation.
diff --git a/docs/StandardDeviation.md b/docs/StandardDeviation.md
@@ -6,7 +6,7 @@ Stats D2
 
 **Definition:** This is the average difference between each value in a dataset and the mean of the dataset. 
 
-See also [[Variance.md]] which is the squared value. As such, to find the standard deviation of some random variable X we can do the following:
+See also [Variance](Variance.md) which is the squared value. As such, to find the standard deviation of some random variable X we can do the following:
 
 std.dev = sqrt(var(X))
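
As a small worked example of this relationship, the snippet below computes the population variance of a made-up dataset and takes its square root. The data values are assumptions chosen for illustration.

```python
import math
import statistics

# Made-up dataset; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # population variance
std_dev = math.sqrt(variance)          # std.dev = sqrt(var(X))

print(variance, std_dev)
```

`statistics.pstdev(data)` performs the same two steps in one call; use `statistics.variance` and `statistics.stdev` instead when the data is a sample rather than the whole population.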
 
diff --git a/docs/Standardization.md b/docs/Standardization.md
@@ -6,9 +6,9 @@ ML CH2
 
 **Definition:** Standardization is the process of scaling values such that each value becomes itself minus the mean, divided by the standard deviation. 
 
-This is optimal in some cases as [[MinMaxScaling.md]] has issues with outliers. If there is one outlier that is much bigger than all other values the max will be very large thus squishing the range of most values to be low numbers which can effect the accuracy of models.
+This is optimal in some cases as [MinMaxScaling](MinMaxScaling.md) has issues with outliers. If there is one outlier that is much bigger than all other values the max will be very large thus squishing the range of most values to be low numbers which can affect the accuracy of models.
 
-See [[FeatureScaling.md]] for more.
+See [FeatureScaling](FeatureScaling.md) for more.
 
 Sample implementation:
 
@@ -28,6 +28,6 @@ print(df)
 
 ## Probabilistic Interpretation
 
-Standardization is the process of mapping some arbitrary [[NormalDistribution.md]] onto the normal distribution centered at 0 with a standard deviation of 1. This can be done simply by subtracting the mean of the normal distribution from each element and then dividing the subsequent values by the average standard deviation.
+Standardization is the process of mapping some arbitrary [NormalDistribution](NormalDistribution.md) onto the normal distribution centered at 0 with a standard deviation of 1. This can be done simply by subtracting the mean of the normal distribution from each element and then dividing the resulting values by the standard deviation.
 
 We do this because there is not a closed form solution to find the percentiles of a normal/gaussian distribution thus we use a lookup table which assumes the distribution is centered about 0 with a std. deviation of 1. This is all that is needed to fully describe a gaussian distribution. 
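
To illustrate the probabilistic interpretation, the sketch below standardizes one value and looks up its percentile on the standard normal, using Python's `statistics.NormalDist` in place of a printed lookup table. The mean of 100 and standard deviation of 15 are assumed numbers for the example.

```python
from statistics import NormalDist


def standardize(x, mean, std):
    """Map x from N(mean, std) onto the standard normal N(0, 1)."""
    return (x - mean) / std


# Assumed distribution for illustration.
mu, sigma = 100.0, 15.0

z = standardize(130.0, mu, sigma)

# Percentile from the standard normal (what a z-table would give) ...
p_standard = NormalDist().cdf(z)
# ... matches the percentile computed on the original distribution.
p_direct = NormalDist(mu, sigma).cdf(130.0)

print(z, p_standard, p_direct)
```

The two percentiles agree, which is why a single table for the standard normal is enough for any normal distribution once its mean and standard deviation are known.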
diff --git a/docs/StatisticsAndProbability.md b/docs/StatisticsAndProbability.md
@@ -11,18 +11,18 @@ Links to Stats Notes
 Probability and Statistical Inference Hogg, Tanis:
 
 Chapter 1.1: 
-	- [[SampleSpace.md]]
-	- [[StatisticalInference.md]]
-	- [[Frequency.md]]
-	- [[RelativeFrequency.md]]
-	- [[ProbabilityMassFunction.md]]
-	- [[SimpsonsParadox.md]]
-	- [[RandomExperiment.md]]
-	- [[RandomVariables.md]]
+	- [SampleSpace](SampleSpace.md)
+	- [StatisticalInference](StatisticalInference.md)
+	- [Frequency](Frequency.md)
+	- [RelativeFrequency](RelativeFrequency.md)
+	- [ProbabilityMassFunction](ProbabilityMassFunction.md)
+	- [SimpsonsParadox](SimpsonsParadox.md)
+	- [RandomExperiment](RandomExperiment.md)
+	- [RandomVariables](RandomVariables.md)
 
 Chapter 1.2:
-	- [[Event.md]] 
-	- [[SetFunction.md]]
+	- [Event](Event.md) 
+	- [SetFunction](SetFunction.md)
 
 Chapter 1.3:
 	- [Permutation](Permutation.md) 
@@ -41,7 +41,7 @@ Chapter 1.5:
 	- [Mutually Independent](MutuallyIndependent.md)
 
 Chapter 1.6:
-	- [Bayes Theroem](BayesTheroem.md)
+	- [Bayes Theorem](BayesTheorem.md)
 	- [Prior Probability](PriorProbability.md) 
 	- [Posterior Probability](PosteriorProbability.md)
 
@@ -55,106 +55,143 @@ Chapter 2.1:
 
 ---
 
-[[Probability.md]]
-[[SetFunction.md]]
-[[MonotonicFunction.md]]
-[[ProbabilityDensityFunctions.md]]
-[[BinomialDistribution.md]]
-[[PoissonDistribution.md]]
-[[ExponentialDistribution.md]]
-[[NormalDistribution.md]]
-[[Variance.md]]
-[[ConditionalProbabilities.md]]
-[[JointProbability.md]]
-[[MarginalProbabilities.md]]
-[[Covariance.md]]
-[[Correlation.md]]
-[[Quantile.md]]
-[[ExploratoryDataAnalysis.md]]
-[[DensityEstimation.md]]
-[[Bandwidth.md]] 
-[[Oversmooothing.md]] 
-[[Undersmoothing.md]] 
-[[Boxplots.md]]
-[[Crosstabulation.md]]
-[[MosaicPlot.md]]
-[[BayesianInference.md]]
-[[Individuals.md]]
-[[Variables.md]]
-[[Pictograph.md]]
-[[StemAndLeafPlot.md]]
-[[Percentile.md]]
-[[CumulativeRelativeFrequency.md]]
-[[IQR.md]]
+- [Probability](Probability.md)  
+- [Set Function](SetFunction.md)  
+- [Monotonic Function](MonotonicFunction.md)  
+- [Probability Density Functions](ProbabilityDensityFunctions.md)  
+- [Binomial Distribution](BinomialDistribution.md)  
+- [Poisson Distribution](PoissonDistribution.md)  
+- [Exponential Distribution](ExponentialDistribution.md)  
+- [Normal Distribution](NormalDistribution.md)  
+- [Variance](Variance.md)  
+- [Conditional Probabilities](ConditionalProbabilities.md)  
+- [Joint Probability](JointProbability.md)  
+- [Marginal Probabilities](MarginalProbabilities.md)  
+- [Covariance](Covariance.md)  
+- [Correlation](Correlation.md)  
+- [Quantile](Quantile.md)  
+- [Exploratory Data Analysis](ExploratoryDataAnalysis.md)  
+- [Density Estimation](DensityEstimation.md)  
+- [Bandwidth](Bandwidth.md)  
+- [Oversmoothing](Oversmoothing.md)  
+- [Undersmoothing](Undersmoothing.md)  
+- [Box Plots](Boxplots.md)  
+- [Crosstabulation](Crosstabulation.md)  
+- [Mosaic Plot](MosaicPlot.md)  
+- [Bayesian Inference](BayesianInference.md)  
+- [Individuals](Individuals.md)  
+- [Variables](Variables.md)  
+- [Pictograph](Pictograph.md)  
+- [Stem And Leaf Plot](StemAndLeafPlot.md)  
+- [Percentile](Percentile.md)  
+- [Cumulative Relative Frequency](CumulativeRelativeFrequency.md)  
+- [IQR](IQR.md)  
 
 PSA&AP MIT:
 
 L1:
-	- [[SampleSpace.md]]
-	- [[Complement.md]]
-	- [[DiscreteUniformLaw.md]]
-	- [[UniversalSet.md]]
-	- [[DisjointSet.md]]
-	- [[ProbabilityLaw.md]]
+
+- [Sample Space](SampleSpace.md)
+- [Complement](Complement.md)
+- [Discrete Uniform Law](DiscreteUniformLaw.md)
+- [Universal Set](UniversalSet.md)
+- [Disjoint Set](DisjointSet.md)
+- [Probability Law](ProbabilityLaw.md)
+
 L2:
-	- [[ConditionalProbabilities.md]]
-	- [[BayesTheroem.md]]
-	- [[TotalProbabilityTheroem.md]]
-	- [[ConditionalProbabilityTheroem.md]]
+
+- [Conditional Probabilities](ConditionalProbabilities.md)
+- [Bayes Theorem](BayesTheorem.md)
+- [Total Probability Theorem](TotalProbabilityTheorem.md)
+- [Conditional Probability Theorem](ConditionalProbabilityTheorem.md)
+
 L3:
-	- [[Independence.md]]
+
+- [Independence](Independence.md)
+
 L4:
-	- [[BinomialCoefficient.md]]
+
+- [Binomial Coefficient](BinomialCoefficient.md)
+
 L5:
-	- [[RandomVariables.md]]
-	- [[ProbabilityMassFunction.md]]
+
+- [Random Variables](RandomVariables.md)
+- [Probability Mass Function](ProbabilityMassFunction.md)
+
 L6:
-	- [[ProbabilityMassFunction.md]]
-	- [[Expectation.md]]
-	- [[Variance.md]]
-	- [[StandardDeviation.md]]
-	- [[JointProbability.md]]
+
+- [Probability Mass Function](ProbabilityMassFunction.md)
+- [Expectation](Expectation.md)
+- [Variance](Variance.md)
+- [Standard Deviation](StandardDeviation.md)
+- [Joint Probability](JointProbability.md)
+
 L7:
-	- Review
+
+- Review
+
 L8:
-	- [[ProbabilityMassFunction.md]]
-	- [[ProbabilityDensityFunctions.md]]
-	- [[Standardization.md]]
-	- [[CumulativeDensityFunction.md]]
-	- [[MixedRandomVariable.md]]
-	- [[NormalDistribution.md]]
-	- [[BernoulliRandomVariable.md]]
+
+- [Probability Mass Function](ProbabilityMassFunction.md)
+- [Probability Density Functions](ProbabilityDensityFunctions.md)
+- [Standardization](Standardization.md)
+- [Cumulative Density Function](CumulativeDensityFunction.md)
+- [Mixed Random Variable](MixedRandomVariable.md)
+- [Normal Distribution](NormalDistribution.md)
+- [Bernoulli Random Variable](BernoulliRandomVariable.md)
+
 L9:
-	- [[JointDensityFunction.md]]
+
+- [Joint Density Function](JointDensityFunction.md)
+
 L10:
-	- [[DerivedDistribution.md]]
+
+- [Derived Distribution](DerivedDistribution.md)
+
 L11:	
-	- [[Covariance.md]]
-	- [[CorrelationCoefficient.md]]
+
+- [Covariance](Covariance.md)
+- [Correlation Coefficient](CorrelationCoefficient.md)
+
 L12:
-	- [[IteratedExpectations.md]]
+
+- [Iterated Expectations](IteratedExpectations.md)
+
 L13:
-	- [[BernoulliProcess.md]] - Discrete memoryless
-	- [[MarkovChains.md]] - Discrete remembers
-	- [[PoissonProcess.md]] - Continuous memoryless
+
+- [Bernoulli Process](BernoulliProcess.md) - Discrete memoryless
+- [Markov Chains](MarkovChains.md) - Discrete remembers
+- [Poisson Process](PoissonProcess.md) - Continuous memoryless
+
 L14:
-	- [[PoissonProcess.md]]
-	- [[BernoulliProcess.md]]
+
+- [Poisson Process](PoissonProcess.md)
+- [Bernoulli Process](BernoulliProcess.md)
+
 L15:
-	- Skipped this as it was poisson part 2
+
+- Skipped this as it was Poisson part 2
+
 L16:
-	- [[MarkovChains.md]]
-	- [[MarkovProcess.md]]
+
+- [Markov Chains](MarkovChains.md)
+- [Markov Process](MarkovProcess.md)
+
 L17:
-	- [[MarkovChains.md]]
-	- [[PeriodicChain.md]]
+
+- [Markov Chains](MarkovChains.md)
+- [Periodic Chain](PeriodicChain.md)
+
 L18:
-	- Skipped markov part 3
+
+- Skipped Markov part 3
+
 L19:
-	- [[LawOfLargeNumbers.md]]
-	- [[RegressionToTheMean.md]]
-	- [[MarkovInequality.md]]
+
+- [Law Of Large Numbers](LawOfLargeNumbers.md)
+- [Regression To The Mean](RegressionToTheMean.md)
+- [Markov Inequality](MarkovInequality.md)
+
 L20:
-	- [[CentralLimitTheroem.md]]
 
-The rest of the lectures discuss inference. I am not planning to read this as it will be covered in my Elements textbook.
+- [Central Limit Theorem](CentralLimitTheorem.md)
diff --git a/docs/StochasticAlgorithm.md b/docs/StochasticAlgorithm.md
@@ -6,4 +6,4 @@ ML CH2
 
 **Definition:** A stochastic algorithm is an optimization algorithm that uses randomness. 
 
-One example of this is [[KMeans.md]] which picks random cluster centroids.
+One example of this is [KMeans](KMeans.md) which picks random cluster centroids.
diff --git a/docs/StrongInduction.md b/docs/StrongInduction.md
@@ -1,8 +1,6 @@
 # Strong Induction
 
-Abstract Math 10.2. Weak induction is the normal form of induction discussed in [[Induction.md]] 
-
-## Notes 
+Abstract Math 10.2. Weak induction is the normal form of induction discussed in [Induction](Induction.md) 
 
 **Definition:** Strong induction is the process of proving one or more prior true statements implies a later one much like weak induction, but with strong induction we can prove in the form of $S_{k-5} \implies S_{k+1}$ so long as k-5 is in the domain and that every value between k-5 and k+1 has been shown to be true. 
 
@@ -15,7 +13,4 @@ Steps:
 
 A good example of this is an equation that does not factor nicely. If I know that $S_1$ is true, but I can't factor $S_2$ in a satisfactory way to prove that for each n+1 the statement is true, then proving a few until finding an instance of something factoring well can solve this issue. 
 
-Can be used to prove [[FundamentalTheoremOfArithmetic.md]].
-
-
-
+Can be used to prove [Fundamental Theorem Of Arithmetic](FundamentalTheoremOfArithmetic.md).
diff --git a/docs/Subspace.md b/docs/Subspace.md
@@ -20,4 +20,4 @@ To verify a subset U of a vector space V is a subspace of V we only need to veri
 
 **Definition:** A subspace is a lower dimensional space.
 
-Often we find that many higher dimensional points all reside in or near a similar lower dimensional subspace which is the basis for [[Projection.md]]
+Often we find that many higher dimensional points all reside in or near a similar lower dimensional subspace which is the basis for [Projection](Projection.md)
diff --git a/docs/SurfaceRepresentation.md b/docs/SurfaceRepresentation.md
@@ -6,4 +6,4 @@ CS 331 W11 L2
 
 **Definition:** Modelling the surface of a continuous object in a discrete computing environment.
 
-To do this we use a [[Mesh.md]]. 
+To do this we use a [Mesh](Mesh.md). 
diff --git a/docs/TargetEncoding.md b/docs/TargetEncoding.md
@@ -6,11 +6,11 @@ ML CH2
 
 **Definition:** Target encoding is the process of mapping some feature to a representative value that is calculated. 
 
-This is different than [[LabelEncoding.md]] as label encoding uses an arbitrary mapping instead of a representative one. 
+This is different than [LabelEncoding](LabelEncoding.md) as label encoding uses an arbitrary mapping instead of a representative one. 
 
 A simple way to do this would be to find the mean target value of a given feature label (group by) and then mapping the feature to this mean. This is simple, but is imperfect especially when there is not a lot of information for a specific label.
 
-Another way to do this is by using a weighted mean that takes into account the means of all other feature options as well. This is often done by finding the current option's mean, multiplying it by the number of occurrences of said option, then adding the overall mean multiplied by some [[Hyperparameter.md]] m. The final thing to do is to divide this value by the number of instances of this option added to m.
+Another way to do this is by using a weighted mean that takes into account the means of all other feature options as well. This is often done by finding the current option's mean, multiplying it by the number of occurrences of said option, then adding the overall mean multiplied by some [Hyperparameter](Hyperparameter.md) m. The final thing to do is to divide this value by the number of instances of this option added to m.
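
The weighted mean described above can be sketched in plain Python. This is an illustration under assumptions, not the note's own implementation: the function name `target_encode` and the toy data are invented, and `m` is the smoothing hyperparameter from the paragraph.

```python
from collections import defaultdict


def target_encode(categories, targets, m):
    """Smoothed target encoding: (n * option_mean + m * overall_mean) / (n + m)."""
    overall_mean = sum(targets) / len(targets)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    # n * option_mean equals the sum of targets for that option,
    # so the running sum can be used directly in the numerator.
    return {
        category: (sums[category] + m * overall_mean) / (counts[category] + m)
        for category in counts
    }


# Toy data: option "b" appears only once, so it is pulled towards the overall mean.
encoding = target_encode(["a", "a", "b"], [1.0, 3.0, 5.0], m=1)
print(encoding)
```

With `m = 0` this reduces to the plain per-option mean from the previous paragraph; larger values of `m` pull rarely seen options closer to the overall mean.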
 
 Equation:
 
diff --git a/docs/Texture.md b/docs/Texture.md
@@ -6,8 +6,8 @@ CS 331 W11 Lecture 2
 
 **Definition:** The texture of an object is its surface and how it looks.
 
-This is implemented in unity via the [[MeshRenderer.md]]
+This is implemented in Unity via the [MeshRenderer](MeshRenderer.md).
 
-Game engines implement [[Baking.md]] to hardcode this texture at the cost of accuracy when changing perspective and lighting. 
+Game engines implement [Baking](Baking.md) to hardcode this texture at the cost of accuracy when changing perspective and lighting. 
 
-See [[TextureMaps.md]] for more information about object texture rendering.
+See [TextureMaps](TextureMaps.md) for more information about object texture rendering.
diff --git a/docs/TextureMaps.md b/docs/TextureMaps.md
@@ -4,6 +4,6 @@ CS 331 W11 / 2
 
 ## Notes:
 
-**Definition:** Texture maps are used to control the look of the [[Texture.md]] associated with an object. Texture maps attempt to simulate real world 3d surfaces without the cost of computing many meshes. 
+**Definition:** Texture maps are used to control the look of the [Texture](Texture.md) associated with an object. Texture maps attempt to simulate real world 3d surfaces without the cost of computing many meshes. 
 
 
diff --git a/docs/TotalProbabilityTheroem.md b/docs/TotalProbabilityTheorem.md
diff --git a/docs/Tractable.md b/docs/Tractable.md
@@ -6,4 +6,4 @@ U 2.3
 
 **Definition:** A tractable problem is a problem that can be solved in polynomial time (reasonable amount of time).
 
-See also [[Intractable.md]].
+See also [Intractable](Intractable.md).
diff --git a/docs/Transform.md b/docs/Transform.md
@@ -10,9 +10,9 @@ In Unity, we have a left handed coordinate system as oppossed to the standard ri
 
 Additionally, each game object has its own local coordinate system that moves with the object. In this way Z becomes forward, X becomes right, and y becomes up with respect to positive values on said axis. 
 
-The datatype used to represent position in 3d space is [[Vector3.md]]. Each x,y,z component is of datatype float. 
+The datatype used to represent position in 3d space is [Vector3](Vector3.md). Each x,y,z component is of datatype float. 
 
-See [[Quaternions.md]] for rotations.
+See [Quaternions](Quaternions.md) for rotations.
 
 Object is transform while the datatype is Transform. 
 
diff --git a/docs/Transformations.md b/docs/Transformations.md
@@ -6,4 +6,4 @@ Khan
 
 **Definition:** Transformations are functions that take an input vector and output another vector.
 
-See [[LinearTransformation.md]] for a specific type.
+See [LinearTransformation](LinearTransformation.md) for a specific type.
diff --git a/docs/Translate.md b/docs/Translate.md
@@ -6,6 +6,6 @@ CS331 W12 L2
 
 This is a method of Unity's Transform class that moves the GameObject by the distance specified with respect to the local coordinate system. 
 
-See [[Rotate.md]] for similar function for rotating based on local rotation. 
+See [Rotate](Rotate.md) for similar function for rotating based on local rotation. 
 
-Also see [[LocalScale.md]] for changing the local scale. 
+Also see [LocalScale](LocalScale.md) for changing the local scale. 
diff --git a/docs/Tree.md b/docs/Tree.md
@@ -4,7 +4,7 @@ Abstract Math and CS202
 
 **Definition:** Trees are connected graphs without cycles. 
 
-There is no implication about split numbers or anything of the sort, but something interesting is that in all cases it must be true that the number of edges is one less than the number of vertices. This can be proved through [[StrongInduction.md]].
+There is no implication about split numbers or anything of the sort, but something interesting is that in all cases it must be true that the number of edges is one less than the number of vertices. This can be proved through [Strong Induction](StrongInduction.md).
 
 
 **Root:** This is a node that has no parents
@@ -17,8 +17,8 @@ There is no implication about split numbers or anything of the sort, but somethi
 
 **Subtree:** A subtree is a section of a tree that is based upon a new root node that cuts off everything above it.
 
-See [[LinkedLists.md]] as linked lists (when non-cyclic) are a form of tree.
+See [Linked Lists](LinkedLists.md) as linked lists (when non-cyclic) are a form of tree.
 
-Also see [[BinaryTree.md]] for a specific tree type.
+Also see [Binary Tree](BinaryTree.md) for a specific tree type.
 
 Note: A graph with 0 or 1 nodes are both trees because there is a connection between all nodes.
diff --git a/docs/Triangulation.md b/docs/Triangulation.md
@@ -6,4 +6,4 @@ CS 331 W11 L2
 
 **Definition:** To break a surface up into triangles.
 
-This is often used to create [[Mesh.md]] for [[SurfaceRepresentation.md]]. Triangulation represents a 3d object using 2d surfaces as the union of triangles. This is how we create "3d" objects. 
+This is often used to create [Mesh](Mesh.md) for [[SurfaceRepresentation.md]]. Triangulation represents a 3d object using 2d surfaces as the union of triangles. This is how we create "3d" objects. 
diff --git a/docs/TwosComplement.md b/docs/TwosComplement.md
@@ -14,4 +14,4 @@ As such we have 1 and -1 as follows:
 
 -1 : 11111111
 
-This solves the problem of having a negative zero and decreases the computational overhead of using [[OnesComplement.md]].
+This solves the problem of having a negative zero and decreases the computational overhead of using [Ones Complement](OnesComplement.md)
diff --git a/docs/Underfitting.md b/docs/Underfitting.md
@@ -6,4 +6,4 @@ ML CH1
 
 **Definition:** Using a model that is too simple to learn the underlying structure of data.
 
-See [[Overfitting.md]] for the inverse of this.
+See [Overfitting](Overfitting.md) for the inverse of this.
diff --git a/docs/Universe.md b/docs/Universe.md
@@ -10,4 +10,4 @@ Often we state the universe as the variable U.
 
 This is also sometimes called the domain, universe of discourse, or the domain of discourse.
 
-See also [[UniversalSet.md]] for the same concept. I created this note because the term bears remembering and I forgot what I called the universal set in the domain of stats and probability. 
+See also [UniversalSet](UniversalSet.md) for the same concept. I created this note because the term bears remembering and I forgot what I called the universal set in the domain of stats and probability. 
diff --git a/docs/UnstableGradients.md b/docs/UnstableGradients.md
@@ -6,7 +6,7 @@ ML 550
 
 **Definition:** Unstable gradients are the idea that different layers of a neural network can learn at widely different rates.
 
-This often manifests as [[ExplodingGradients.md]] or [[VanishingGradients.md]]
+This often manifests as [ExplodingGradients](ExplodingGradients.md) or [[VanishingGradients.md]]
 
 This was a reason that deep neural networks were mostly abandoned in the early 2000s until there were revisions to model architecture. It was found that the initialization scheme of a normal weight distribution about 0 with a std deviation of 1 and the use of sigmoid activation functions caused this issue. Mainly the sigmoid function as they backpropogate gradients that are generally very small.
 
diff --git a/docs/UnsupervisedLearning.md b/docs/UnsupervisedLearning.md
@@ -2,12 +2,10 @@
 
 ML L1
 
-
-
 **Definition:** Given a dataset with no labels, find some structure in the underlying data. 
 
-[[ClusteringAlgorithms.md]] are often created using unsupervised learning.
+[ClusteringAlgorithms](ClusteringAlgorithms.md) are often created using unsupervised learning.
 
 Another example of unsupervised learning is the cocktail party problem where you have multiple microphones in a room that is noisy, how do you separate out individual voices?
 
-See [[UnsupervisedPretraining.md]] for information about unsupervised training followed by supervised training.
+See [UnsupervisedPretraining](UnsupervisedPretraining.md) for information about unsupervised training followed by supervised training.
diff --git a/docs/UnsupervisedPretraining.md b/docs/UnsupervisedPretraining.md
@@ -8,4 +8,4 @@ ML P576
 
 This is often used because unlabeled data is often abundant, but labeled data is expensive.
 
-We can do this with GANs as well as [[Autoencoder.md]]. With autoencoders we train the autoencoder to compress the data and then reuse the lower layers of this autoencoder as the lower layers for a neural network. This is useful because autoencoders are good at finding representations of the data without the need for labeled data.
+We can do this with GANs as well as [Autoencoder](Autoencoder.md) With autoencoders we train the autoencoder to compress the data and then reuse the lower layers of this autoencoder as the lower layers for a neural network. This is useful because autoencoders are good at finding representations of the data without the need for labeled data.
diff --git a/docs/VanishingGradients.md b/docs/VanishingGradients.md
@@ -6,10 +6,10 @@ ML 550
 
 **Definition:** Vanishing gradients is a neural network problem where lower levels (earlier hidden layers) have such small gradients that gradient steps make tiny changes and the model never converges upon an a good solution.
 
-This is a very common problem as most of the time gradients get smaller and smaller. As such, this problem is much more common than [[ExplodingGradients.md]] which primarly happens with RNNs.
+This is a very common problem as most of the time gradients get smaller and smaller. As such, this problem is much more common than [ExplodingGradients](ExplodingGradients.md) which primarly happens with RNNs.
 
 ### Solutions
 
 Use ReLU and better weight initialization (not gaussian distribution with std deviation of 1).
 
-See [[UnstableGradients.md]] for more.
+See [UnstableGradients](UnstableGradients.md) for more.
diff --git a/docs/Variance.md b/docs/Variance.md
@@ -14,7 +14,7 @@ For this it is paramount to understand that the multiplication by the weight goe
 
 Shown above, find the difference between each value and the mean, square it to get a positive, and then sum the values. We then average it by multiplying by 1 over the cardinality of X.
 
-If we take the square root of the variance we then have the [[StandardDeviation.md]]
+If we take the square root of the variance we then have the [StandardDeviation](StandardDeviation.md)
 
 Additionally, the std deviation, given our definition of variance, is equal to sqrt(var(X)) given that the variance of the random variable X is squared.
 
diff --git a/docs/Vector3.md b/docs/Vector3.md
@@ -6,4 +6,4 @@ CS 331 W12 L3
 
 **Definition:** The Vector3 class in unity is used to represent x,y, and z coordinates in a singular object. This object stores each axis value as a float.
 
-See [[Movement.md]] for how to use Vector3s to move. 
+See [Movement](Movement.md) for how to use Vector3s to move. 
diff --git a/docs/VisualizationAlgorithm.md b/docs/VisualizationAlgorithm.md
@@ -4,4 +4,4 @@ ML Ch1
 
 
 
-**Definition:** Visualization algorithms are [[UnsupervisedLearning.md]] algorithms that output 2D or 3D representations of your data. 
+**Definition:** Visualization algorithms are [UnsupervisedLearning](UnsupervisedLearning.md) algorithms that output 2D or 3D representations of your data. 
diff --git a/docs/VonNeumannModel.md b/docs/VonNeumannModel.md
@@ -10,7 +10,7 @@ This is our broad model for computing and computer architecture. Additionally, t
 
 Sequential instruction processing is ensured using a program counter that states what is being processed currently. 
 
-Alternatives listed in [[ForwardThoughts.md]] 
+Alternatives listed in [ForwardThoughts](ForwardThoughts.md) 
 
 ---
 
diff --git a/docs/WellOrdered.md b/docs/WellOrdered.md
@@ -6,7 +6,7 @@ Abstract Math Chapter 10
 
 **Definition:** A well order set has a definite smallest element. 
 
-This is important because it is the basis for [[Induction.md]] as without it, there would be no way to prove that $S_n\implies S_{n+1}$ means that for something is true for all values in the set. 
+This is important because it is the basis for [Induction](Induction.md) as without it, there would be no way to prove that $S_n\implies S_{n+1}$ means that for something is true for all values in the set. 
 
 A few examples of well ordered sets are $\N$, any known subset or provable subset of $\N$, the set {0,2,4,5646}, and infinitely many others.
