Ran formatter to remove double blank lines - notes

commit 831d2237b792e972b6e61862ca7a74bf727593e2
parent 1888c35238bfd791588eaf4faefc7ff79db3a021
Author: Andrew Laack <andrew@laack.co>
Date:   Sun, 24 May 2026 16:23:49 -0500

Ran formatter to remove double blank lines

Diffstat:
M docs/AES.md  | 1 -
M docs/AISafety.md  | 2 --
M docs/AbstractDataType.md  | 2 --
M docs/Abstraction.md  | 2 --
M docs/Accuracy.md  | 2 --
M docs/ActiveAttacks.md  | 1 -
M docs/AdaBoost.md  | 2 --
M docs/AdaGrad.md  | 2 --
M docs/AdjacencyMatrix.md  | 2 --
M docs/Algorithm.md  | 2 --
M docs/Algorithms.md  | 1 -
M docs/AmbientSpace.md  | 2 --
M docs/Amortization.md  | 2 --
M docs/AngleBetweenVectors.md  | 2 --
M docs/Animation.md  | 2 --
M docs/AnimationController.md  | 3 ---
M docs/AnomalyDetection.md  | 2 --
M docs/Antisymmetric.md  | 2 --
M docs/Arccos.md  | 2 --
M docs/Arcsin.md  | 2 --
M docs/ArithmeticComputations.md  | 2 --
M docs/Armature.md  | 3 ---
M docs/Ascii.md  | 2 --
M docs/Asset.md  | 3 ---
M docs/AstronomicalUnit.md  | 2 --
M docs/AsymptoticNotation.md  | 3 ---
M docs/Authentication.md  | 2 --
M docs/Autoencoder.md  | 2 --
M docs/Availability.md  | 2 --
M docs/BCD.md  | 3 ---
M docs/Backpropagation.md  | 2 --
M docs/Bagging.md  | 2 --
M docs/Baking.md  | 2 --
M docs/Bandits.md  | 2 --
M docs/BarrierSynchronization.md  | 3 ---
M docs/BasisOfSubspace.md  | 2 --
M docs/BatchNormalization.md  | 2 --
M docs/BayesianInference.md  | 2 --
M docs/BekensteinBound.md  | 2 --
M docs/BellmanEquation.md  | 2 --
M docs/BernoulliProcess.md  | 2 --
M docs/BernoulliRandomVariable.md  | 2 --
M docs/Bias.md  | 3 ---
M docs/Biconditional.md  | 2 --
M docs/BigONotation.md  | 2 --
M docs/BigThetaNotation.md  | 2 --
M docs/BijectiveProof.md  | 3 ---
M docs/BinaryCode.md  | 2 --
M docs/BinaryOperations.md  | 2 --
M docs/BinaryTree.md  | 2 --
M docs/Binomial.md  | 2 --
M docs/BinomialCoefficient.md  | 2 --
M docs/BinomialDistribution.md  | 2 --
M docs/Bipartite.md  | 2 --
M docs/BitSteering.md  | 2 --
M docs/BlenderShortcuts.md  | 3 ---
M docs/Boxplots.md  | 2 --
M docs/BreadthFirstSearch.md  | 2 --
M docs/BucketAddressing.md  | 2 --
M docs/CART.md  | 2 --
M docs/CNN.md  | 2 --
M docs/CaesarCipher.md  | 2 --
M docs/Calculus.md  | 2 --
M docs/CanaryValue.md  | 2 --
M docs/CartesianProduct.md  | 2 --
M docs/Cases.md  | 2 --
M docs/CategoricalCrossEntropy.md  | 2 --
M docs/Ceiling.md  | 2 --
M docs/CentralLimitTheorem.md  | 2 --
M docs/ChainRule.md  | 2 --
M docs/Chaining.md  | 2 --
M docs/ChangeOfBasis.md  | 2 --
M docs/CharacteristicEquation.md  | 2 --
M docs/CharacteristicRoots.md  | 2 --
M docs/Clip.md  | 2 --
M docs/Closure.md  | 2 --
M docs/Codeword.md  | 2 --
M docs/Codomain.md  | 2 --
M docs/Collection.md  | 2 --
M docs/Collision.md  | 2 --
M docs/ColumnSpace.md  | 2 --
M docs/Combination.md  | 2 --
M docs/CombinatorialProof.md  | 2 --
M docs/Combinatorics.md  | 2 --
M docs/Commutative.md  | 2 --
M docs/Complement.md  | 2 --
M docs/ComplexVectorSpace.md  | 2 --
M docs/CompositeNumber.md  | 2 --
M docs/ComputationalGraph.md  | 2 --
M docs/ConditionalProbability.md  | 2 --
M docs/Confidentiality.md  | 2 --
M docs/ConfusionMatrix.md  | 2 --
M docs/Congruence.md  | 2 --
M docs/CongruenceClass.md  | 2 --
M docs/Connected.md  | 2 --
M docs/ConnectedComponent.md  | 2 --
M docs/Connectives.md  | 2 --
M docs/Contingency.md  | 2 --
M docs/ContinuousProbability.md  | 2 --
M docs/Contradiction.md  | 2 --
M docs/Contrapositive.md  | 2 --
M docs/Converse.md  | 2 --
M docs/Coordinate.md  | 2 --
M docs/Correlation.md  | 2 --
M docs/CorrelationCoefficient.md  | 2 --
M docs/CosineSimilarity.md  | 1 -
M docs/CountSort.md  | 2 --
M docs/CounterExample.md  | 2 --
M docs/CountingPrinciple.md  | 2 --
M docs/Covariance.md  | 2 --
M docs/CramersRule.md  | 2 --
M docs/CreditAssignmentProblem.md  | 2 --
M docs/CrossProduct.md  | 2 --
M docs/CrossValidation.md  | 2 --
M docs/Crosstabulation.md  | 3 ---
M docs/CumulativeDensityFunction.md  | 2 --
M docs/Cycle.md  | 2 --
M docs/DBSCAN.md  | 2 --
M docs/DRAM.md  | 3 ---
M docs/DRAMBanks.md  | 2 --
M docs/DRAMChips.md  | 2 --
M docs/DRAMRefresh.md  | 2 --
M docs/DRAMRowHammer.md  | 2 --
M docs/DataAugmentation.md  | 2 --
M docs/DataStructureAugmentation.md  | 2 --
M docs/DecisionThreshold.md  | 2 --
M docs/DecisionTrees.md  | 6 ------
M docs/DemorgansLaw.md  | 2 --
M docs/DensityEstimation.md  | 2 --
M docs/DerivedDistribution.md  | 2 --
M docs/DesignPoint.md  | 2 --
M docs/Determinant.md  | 3 ---
M docs/DiagonalMatrices.md  | 2 --
M docs/Digraph.md  | 2 --
M docs/DimensionalityReduction.md  | 2 --
M docs/Dimensions.md  | 2 --
M docs/DirectProof.md  | 2 --
M docs/DirectSum.md  | 2 --
M docs/DiscountFactor.md  | 2 --
M docs/DiscreteProbability.md  | 2 --
M docs/DiscreteRandomVariable.md  | 2 --
M docs/DiscreteUniformLaw.md  | 2 --
M docs/DisjointSet.md  | 2 --
M docs/DistanceCalculation.md  | 2 --
M docs/DistanceToPlane.md  | 2 --
M docs/Distinguishable.md  | 2 --
M docs/DistinguishablePermutation.md  | 2 --
M docs/Distributive.md  | 2 --
M docs/DistributiveLaw.md  | 2 --
M docs/Div.md  | 4 ----
M docs/DivideAndConquer.md  | 2 --
M docs/DivisionAlgorithm.md  | 3 ---
M docs/DivisionRule.md  | 2 --
M docs/DivisionRules.md  | 2 --
M docs/DotProduct.md  | 3 ---
M docs/DoublyLinkedList.md  | 2 --
M docs/Dropout.md  | 2 --
M docs/Duality.md  | 2 --
M docs/DynamicProgramming.md  | 2 --
M docs/EarlyStopping.md  | 2 --
M docs/EigenVector.md  | 2 --
M docs/ElasticNetRegression.md  | 2 --
M docs/ElementaryTransformations.md  | 2 --
M docs/EligibilityTraces.md  | 2 --
M docs/EmptyGraph.md  | 2 --
M docs/Ensembles.md  | 2 --
M docs/Entropy.md  | 2 --
M docs/Episode.md  | 2 --
M docs/Episodic.md  | 2 --
M docs/EuclideanAlgorithm.md  | 2 --
M docs/EulersTheorem.md  | 2 --
M docs/Evaluation.md  | 2 --
M docs/Event.md  | 2 --
M docs/EvolutionaryMethods.md  | 2 --
M docs/ExhaustiveProof.md  | 2 --
M docs/Expectation.md  | 2 --
M docs/ExplodingGradients.md  | 2 --
M docs/Exploit.md  | 2 --
M docs/ExploratoryDataAnalysis.md  | 2 --
M docs/Explore.md  | 2 --
M docs/ExtraTrees.md  | 2 --
M docs/FactorsOfVariation.md  | 2 --
M docs/Feature.md  | 2 --
M docs/FeatureScaling.md  | 2 --
M docs/FermatsTheorem.md  | 2 --
M docs/FibonacciNumbers.md  | 3 ---
M docs/FiniteDimensional.md  | 2 --
M docs/Floor.md  | 2 --
M docs/Folding.md  | 2 --
M docs/FreeVariables.md  | 2 --
M docs/Frequency.md  | 2 --
M docs/FrequencyHeuristic.md  | 2 --
M docs/FrobeniusNorm.md  | 2 --
M docs/FunctionCompositionOperator.md  | 1 -
M docs/FunctionNotation.md  | 2 --
M docs/FundamentalOperations.md  | 2 --
M docs/FundamentalTheoremOfArithmetic.md  | 1 -
M docs/FundamentalTheroemofCalculus.md  | 2 --
M docs/GCD.md  | 2 --
M docs/GameLoop.md  | 2 --
M docs/GameObject.md  | 2 --
M docs/GaussianElimination.md  | 2 --
M docs/GaussianIntegers.md  | 2 --
M docs/GaussianMixtureModels.md  | 2 --
M docs/GeneralSolution.md  | 2 --
M docs/GeneralizationError.md  | 2 --
M docs/GeneralizedPigeonholePrinciple.md  | 2 --
M docs/GradientBoosting.md  | 2 --
M docs/GradientClipping.md  | 2 --
M docs/GradientDescent.md  | 2 --
M docs/GradientDescentCode.md  | 6 ------
M docs/GramSchmidtProcess.md  | 2 --
M docs/HadamardProduct.md  | 2 --
M docs/HalfWord.md  | 2 --
M docs/HarmonicMean.md  | 4 ----
M docs/HasseDiagram.md  | 2 --
M docs/HistogramBasedGradientBoosting.md  | 2 --
M docs/HistoricalDesigns.md  | 2 --
M docs/Hyperparameter.md  | 2 --
M docs/Hypervolume.md  | 2 --
M docs/IPD.md  | 2 --
M docs/IQR.md  | 2 --
M docs/ISA.md  | 2 --
M docs/IdentityMatrix.md  | 2 --
M docs/Image.md  | 2 --
M docs/ImitationLearning.md  | 2 --
M docs/Imputation.md  | 2 --
M docs/IncrementalMean.md  | 3 ---
M docs/Independence.md  | 2 --
M docs/IndependentEvents.md  | 2 --
M docs/Indistinguishable.md  | 2 --
M docs/Individuals.md  | 2 --
M docs/Induction.md  | 2 --
M docs/Inertia.md  | 2 --
M docs/Inference.md  | 2 --
M docs/InformationContent.md  | 2 --
M docs/InformationSecurity.md  | 2 --
M docs/Injective.md  | 2 --
M docs/Input.md  | 2 --
M docs/InsertionSort.md  | 3 ---
M docs/InstanceBasedLearning.md  | 2 --
M docs/Instruction.md  | 3 ---
M docs/IntegerOverflow.md  | 2 --
M docs/Integrity.md  | 2 --
M docs/IntelligenceExplosion.md  | 2 --
M docs/Intractable.md  | 2 --
M docs/Invariance.md  | 2 --
M docs/Inverse.md  | 2 --
M docs/InverseFunction.md  | 2 --
M docs/InverseMatrix.md  | 2 --
M docs/InverseTransformation.md  | 2 --
M docs/Invertible.md  | 3 ---
M docs/IteratedExpectations.md  | 2 --
M docs/Jerk.md  | 2 --
M docs/JointDensityFunction.md  | 2 --
M docs/JointProbability.md  | 2 --
M docs/KNearestNeighbor.md  | 2 --
M docs/Kernel.md  | 2 --
M docs/Key.md  | 2 --
M docs/KeyframeAnimation.md  | 2 --
M docs/Keyless.md  | 2 --
M docs/KnowledgeBaseApproach.md  | 2 --
M docs/L1Norm.md  | 2 --
M docs/L2Norm.md  | 2 --
M docs/LCM.md  | 2 --
M docs/LLE.md  | 2 --
M docs/LUDecomposition.md  | 2 --
M docs/LamportSignature.md  | 1 -
M docs/Language.md  | 2 --
M docs/LassoRegression.md  | 2 --
M docs/LawOfCosines.md  | 2 --
M docs/LawOfDetachment.md  | 2 --
M docs/LeakyReLU.md  | 2 --
M docs/LearningRate.md  | 3 ---
M docs/LexicographicOrdering.md  | 2 --
M docs/Lighting.md  | 2 --
M docs/LinearCombination.md  | 2 --
M docs/LinearCongruence.md  | 2 --
M docs/LinearEquations.md  | 2 --
M docs/LinearHomogeneousRecurrenceRelation.md  | 2 --
M docs/LinearIndependence.md  | 4 ----
M docs/LinearMaps.md  | 2 --
M docs/LinearProbing.md  | 2 --
M docs/LinearRegression.md  | 2 --
M docs/LinearSubspace.md  | 2 --
M docs/LinearTransformation.md  | 2 --
M docs/Linearithmic.md  | 2 --
M docs/LoadFactor.md  | 2 --
M docs/LocalScale.md  | 2 --
M docs/LogarithmicDifferentiation.md  | 2 --
M docs/Loop.md  | 2 --
M docs/LoopInvariant.md  | 2 --
M docs/LossFunction.md  | 2 --
M docs/Lvalue.md  | 2 --
M docs/MAE.md  | 2 --
M docs/MCTS.md  | 2 --
M docs/MLP.md  | 2 --
M docs/MUX.md  | 2 --
M docs/ManifoldLearning.md  | 2 --
M docs/MarginalProbabilities.md  | 2 --
M docs/MarkovAssumption.md  | 2 --
M docs/MarkovChains.md  | 1 -
M docs/MarkovDecisionProcesses.md  | 2 --
M docs/MarkovInequality.md  | 2 --
M docs/MarkovRewardProcess.md  | 2 --
M docs/MathConceptsCS331.md  | 2 --
M docs/Matrix.md  | 3 ---
M docs/MaxNorm.md  | 2 --
M docs/MaxNormRegularization.md  | 2 --
M docs/MaxPooling.md  | 2 --
M docs/Memory.md  | 2 --
M docs/MemoryManagement.md  | 4 ----
M docs/MergeSort.md  | 2 --
M docs/MersennePrime.md  | 2 --
M docs/Mesh.md  | 2 --
M docs/MeshFilter.md  | 2 --
M docs/MeshRenderer.md  | 2 --
M docs/MicroArchitecture.md  | 2 --
M docs/Microcontroller.md  | 2 --
M docs/Microprocessor.md  | 2 --
M docs/MillerRabinAlgorithm.md  | 2 --
M docs/MinMaxScaling.md  | 2 --
M docs/MinusOneTrick.md  | 2 --
M docs/MixedGraph.md  | 2 --
M docs/Mod.md  | 2 --
M docs/Model.md  | 2 --
M docs/ModelBasedLearning.md  | 2 --
M docs/ModelFree.md  | 2 --
M docs/Momentum.md  | 2 --
M docs/MonoBehaviour.md  | 2 --
M docs/MonotonicFunction.md  | 2 --
M docs/MonteCarloLearning.md  | 2 --
M docs/MonteCarloMethod.md  | 2 --
M docs/MooresLaw.md  | 2 --
M docs/MosaicPlot.md  | 2 --
M docs/Movement.md  | 3 ---
M docs/MultiValuedFunction.md  | 2 --
M docs/Multigraph.md  | 2 --
M docs/MultinomialCoefficient.md  | 2 --
M docs/MultioutputClassification.md  | 2 --
M docs/Multiset.md  | 2 --
M docs/MutuallyIndependent.md  | 2 --
M docs/NAG.md  | 2 --
M docs/NLP.md  | 2 --
M docs/NPComplete.md  | 2 --
M docs/NPProblem.md  | 2 --
M docs/NaiveBayes.md  | 2 --
M docs/NaturalLog.md  | 3 ---
M docs/Negation.md  | 2 --
M docs/NestedQuantifier.md  | 2 --
M docs/NetworkSecurity.md  | 2 --
M docs/NeuralNetworks.md  | 2 --
M docs/NonDeterministicFiniteAutomata.md  | 1 -
M docs/NonRepudation.md  | 2 --
M docs/Norm.md  | 2 --
M docs/NormalDistribution.md  | 2 --
M docs/NormalVector.md  | 2 --
M docs/NoveltyDetection.md  | 2 --
M docs/NullSpace.md  | 2 --
M docs/NumberTheory.md  | 2 --
M docs/OSI.md  | 2 --
M docs/OffPolicyLearning.md  | 2 --
M docs/OneHotEncoding.md  | 2 --
M docs/OneVersusAll.md  | 2 --
M docs/OneVersusOne.md  | 2 --
M docs/OnesComplement.md  | 2 --
M docs/OnlineLearning.md  | 2 --
M docs/Opcode.md  | 2 --
M docs/OpenAddressing.md  | 2 --
M docs/Operands.md  | 2 --
M docs/OperatorNotation.md  | 2 --
M docs/OptimalBayesianAgent.md  | 2 --
M docs/OptimalSubstructure.md  | 2 --
M docs/Optimizer.md  | 2 --
M docs/OracleComputer.md  | 2 --
M docs/OrderedSample.md  | 2 --
M docs/OrdinaryLeastSquares.md  | 2 --
M docs/OrthogonalComplement.md  | 2 --
M docs/Orthonormal.md  | 2 --
M docs/OutOfBag.md  | 2 --
M docs/OutOfOrderExecution.md  | 2 --
M docs/Overfitting.md  | 2 --
M docs/OverlappingSubproblems.md  | 2 --
M docs/Oversmoothing.md  | 2 --
M docs/PCA.md  | 2 --
M docs/PairwiseIndependence.md  | 2 --
M docs/PairwiseRelativelyPrime.md  | 2 --
M docs/PartialDerivative.md  | 2 --
M docs/PartiallyApplied.md  | 1 -
M docs/PartiallyObservableMarkovDecisionProcess.md  | 2 --
M docs/ParticularSolution.md  | 2 --
M docs/Partition.md  | 2 --
M docs/PascalsIdentity.md  | 2 --
M docs/PassiveAttacks.md  | 2 --
M docs/Pasting.md  | 2 --
M docs/Path.md  | 2 --
M docs/Percentile.md  | 2 --
M docs/Perceptrons.md  | 2 --
M docs/PerfectNumbers.md  | 2 --
M docs/PeriodicChain.md  | 2 --
M docs/PerlinNoise.md  | 2 --
M docs/Permutation.md  | 2 --
M docs/PermutationMatrix.md  | 2 --
M docs/Pictograph.md  | 2 --
M docs/PigeonholePrinciple.md  | 2 --
M docs/Pipelining.md  | 2 --
M docs/PlaneToPlaneDistance.md  | 2 --
M docs/PoissonDistribution.md  | 2 --
M docs/PolarCoordinates.md  | 3 ---
M docs/Policy.md  | 2 --
M docs/PoolingLayers.md  | 2 --
M docs/Postcondition.md  | 2 --
M docs/PowerSet.md  | 2 --
M docs/Precision.md  | 2 --
M docs/Preconditions.md  | 2 --
M docs/Predicate.md  | 2 --
M docs/Prediction.md  | 2 --
M docs/Preimage.md  | 2 --
M docs/PretrainedModels.md  | 2 --
M docs/PrimeFactorization.md  | 2 --
M docs/PrimeNumber.md  | 2 --
M docs/PrincipleOfInclusionExclusion.md  | 2 --
M docs/Probability.md  | 2 --
M docs/ProbabilityDensityFunctions.md  | 2 --
M docs/ProbabilityMassFunction.md  | 2 --
M docs/ProductRule.md  | 2 --
M docs/Prognosticator.md  | 2 --
M docs/ProgrammerVisibleState.md  | 2 --
M docs/PropertyBasedTesting.md  | 1 -
M docs/Proposition.md  | 2 --
M docs/PropositionalFunction.md  | 2 --
M docs/ProveSetEquality.md  | 2 --
M docs/PseudoGraphs.md  | 2 --
M docs/QuadraticProbing.md  | 3 ---
M docs/Quantifiers.md  | 2 --
M docs/Quantile.md  | 2 --
M docs/Quaternions.md  | 2 --
M docs/Queue.md  | 2 --
M docs/RCombination.md  | 2 --
M docs/RMSE.md  | 2 --
M docs/ROC.md  | 2 --
M docs/RPermutation.md  | 2 --
M docs/RadialBasisFunction.md  | 2 --
M docs/RamseyNumbers.md  | 2 --
M docs/RandomExperiment.md  | 2 --
M docs/RandomForest.md  | 2 --
M docs/RandomPatches.md  | 2 --
M docs/RandomProjection.md  | 2 --
M docs/RandomSubspaces.md  | 2 --
M docs/RandomVariables.md  | 2 --
M docs/Range.md  | 2 --
M docs/Rank.md  | 2 --
M docs/RealVectorSpace.md  | 2 --
M docs/RecencyHeuristic.md  | 2 --
M docs/RecurrenceRelation.md  | 2 --
M docs/ReducedRowEchelonForm.md  | 3 ---
M docs/Reflexive.md  | 2 --
M docs/ReflexiveClosure.md  | 2 --
M docs/RegressionProblem.md  | 2 --
M docs/RegressionToTheMean.md  | 2 --
M docs/Relation.md  | 2 --
M docs/RelationOnASet.md  | 2 --
M docs/RelativeFrequency.md  | 2 --
M docs/RelativelyPrime.md  | 2 --
M docs/Representative.md  | 2 --
M docs/Return.md  | 2 --
M docs/RewardSignal.md  | 2 --
M docs/RidgeRegression.md  | 2 --
M docs/RightHandRule.md  | 2 --
M docs/Rotate.md  | 2 --
M docs/Rotation.md  | 2 --
M docs/RowBuffer.md  | 2 --
M docs/RowEchelonForm.md  | 2 --
M docs/RuleLearning.md  | 3 ---
M docs/RuleOfSarrus.md  | 3 ---
M docs/Rvalue.md  | 2 --
M docs/SMOTE.md  | 2 --
M docs/SVM.md  | 3 ---
M docs/SampleSpace.md  | 2 --
M docs/Satisfiable.md  | 2 --
M docs/Script.md  | 2 --
M docs/Segmentation.md  | 2 --
M docs/SelfSupervisedLearning.md  | 2 --
M docs/SemiSupervisedLearning.md  | 2 --
M docs/SentinelValue.md  | 2 --
M docs/Sequence.md  | 2 --
M docs/Set.md  | 2 --
M docs/SetFunction.md  | 2 --
M docs/SharedPointers.md  | 3 ---
M docs/Shear.md  | 2 --
M docs/SignedExtension.md  | 3 ---
M docs/SimilarityFeature.md  | 2 --
M docs/SimpsonsParadox.md  | 2 --
M docs/SingleKey.md  | 2 --
M docs/SinglyLinkedList.md  | 3 ---
M docs/SkeletalAnimation.md  | 2 --
M docs/SmallestCounterExample.md  | 3 ---
M docs/Span.md  | 2 --
M docs/Stack.md  | 2 --
M docs/Stacking.md  | 2 --
M docs/StandardDeviation.md  | 2 --
M docs/StandardMatrix.md  | 2 --
M docs/Standardization.md  | 2 --
M docs/StateAnalysis.md  | 2 --
M docs/StatisticalInference.md  | 2 --
M docs/StemAndLeafPlot.md  | 2 --
M docs/StirlingsFormula.md  | 2 --
M docs/StochasticAlgorithm.md  | 2 --
M docs/StratifiedSampling.md  | 2 --
M docs/String.md  | 2 --
M docs/StrongAI.md  | 2 --
M docs/StrongInduction.md  | 2 --
M docs/Subgraph.md  | 2 --
M docs/Subsequence.md  | 2 --
M docs/Subset.md  | 2 --
M docs/Subspace.md  | 2 --
M docs/SubtractionRule.md  | 2 --
M docs/SumOfGeometricSeries.md  | 2 --
M docs/SumOfVectorSpaces.md  | 2 --
M docs/SumRule.md  | 2 --
M docs/SuperScalar.md  | 2 --
M docs/SupervisedLearning.md  | 2 --
M docs/SupportVectorMachine.md  | 2 --
M docs/SurfaceRepresentation.md  | 2 --
M docs/Surjective.md  | 2 --
M docs/SymmetricClosure.md  | 2 --
M docs/SymmetricMatrix.md  | 2 --
M docs/SystemsOfEquations.md  | 2 --
M docs/TF-IDF.md  | 1 -
M docs/TargetEncoding.md  | 1 -
M docs/Task.md  | 2 --
M docs/Tautology.md  | 2 --
M docs/Tensor.md  | 2 --
M docs/Texture.md  | 2 --
M docs/TextureMaps.md  | 1 -
M docs/TheRightTimeToLearn.md  | 1 -
M docs/TimeComplexity.md  | 2 --
M docs/Tractable.md  | 2 --
M docs/TransferLearning.md  | 2 --
M docs/Transformations.md  | 2 --
M docs/Transitive.md  | 2 --
M docs/TransitiveClosure.md  | 2 --
M docs/Translate.md  | 2 --
M docs/Transpose.md  | 1 -
M docs/Tree.md  | 1 -
M docs/TreeDiagram.md  | 2 --
M docs/Trichotomy.md  | 2 --
M docs/TripleProductExpansion.md  | 2 --
M docs/TruePositiveRate.md  | 2 --
M docs/Trust.md  | 2 --
M docs/TruthSet.md  | 2 --
M docs/TwoKey.md  | 2 --
M docs/TwosComplement.md  | 2 --
M docs/UVMaps.md  | 2 --
M docs/UnaryOperations.md  | 2 --
M docs/Underfitting.md  | 2 --
M docs/Undersmoothing.md  | 2 --
M docs/Unicode.md  | 2 --
M docs/UniquePointers.md  | 2 --
M docs/UnitVector.md  | 2 --
M docs/Unity.md  | 2 --
M docs/UniversalSet.md  | 2 --
M docs/Universe.md  | 2 --
M docs/Unsolvable.md  | 2 --
M docs/UnstableGradients.md  | 2 --
M docs/UnsupervisedPretraining.md  | 2 --
M docs/VacuousProof.md  | 2 --
M docs/ValueFunction.md  | 2 --
M docs/VandermondesIdentity.md  | 2 --
M docs/VanishingGradients.md  | 2 --
M docs/Variables.md  | 2 --
M docs/VariadicOperations.md  | 2 --
M docs/Vector.md  | 3 ---
M docs/Vector3.md  | 2 --
M docs/VectorMatrixMultipication.md  | 4 ----
M docs/VectorSpace.md  | 2 --
M docs/Vertex.md  | 2 --
M docs/VigenereCipher.md  | 2 --
M docs/VisualizationAlgorithm.md  | 2 --
M docs/VonNeumannModel.md  | 2 --
M docs/VotingClassifiers.md  | 2 --
M docs/WeakAI.md  | 2 --
M docs/Weight.md  | 2 --
M docs/WeightedGraph.md  | 2 --
M docs/WellOrdered.md  | 2 --
M docs/WideAndDeepNN.md  | 2 --
M docs/Word.md  | 2 --
M docs/ZeroExtension.md  | 2 --
M docs/ZeroOneMatrix.md  | 2 --
M docs/rsync.md  | 2 --
M docs/usubstitution.md  | 2 --

591 files changed, 0 insertions(+), 1217 deletions(-)
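
The commit does not name the formatter. Judging from the hunks, it collapses every run of two or more consecutive blank lines into a single blank line. For example, `docs/Armature.md` loses three of the four blank lines after its heading, and `docs/AES.md` keeps one of its two trailing blank lines. Below is a minimal hypothetical Python sketch that reproduces that effect. It is not the tool that was actually run. The names `collapse_blank_lines` and `format_docs`, and the choice to treat whitespace-only lines as blank, are assumptions.

```python
from pathlib import Path


def collapse_blank_lines(text: str) -> str:
    """Collapse each run of consecutive blank lines into one blank line.

    Assumption: a line that is empty or contains only whitespace counts as blank.
    Line endings are kept because splitlines(keepends=True) retains them.
    """
    out = []
    prev_blank = False
    for line in text.splitlines(keepends=True):
        is_blank = line.strip() == ""
        if is_blank and prev_blank:
            continue  # drop the 2nd, 3rd, ... blank line of a run
        out.append(line)
        prev_blank = is_blank
    return "".join(out)


def format_docs(root: str = "docs") -> int:
    """Apply collapse_blank_lines to every *.md file directly under root.

    Hypothetical driver; returns the number of files that were rewritten.
    """
    changed = 0
    for path in sorted(Path(root).glob("*.md")):
        original = path.read_text(encoding="utf-8")
        formatted = collapse_blank_lines(original)
        if formatted != original:
            path.write_text(formatted, encoding="utf-8")
            changed += 1
    return changed
```

Calling `format_docs("docs")` from the repository root would rewrite each affected Markdown file in place and return the number of files it modified. This diff does not show whether the real formatter skipped fenced code blocks. The sketch does not skip them; it treats blank lines inside code fences like any others.
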
diff --git a/docs/AES.md b/docs/AES.md
@@ -10,4 +10,3 @@
 
 Decryption is done by reversing the order of encryption and thus I will only take notes on the encryption portion of AES.
 
-
diff --git a/docs/AISafety.md b/docs/AISafety.md
@@ -13,8 +13,6 @@ How to define AGI?
 
 How to test for AGI?
 
-
-
 #### Superintelligence - Nick Bostrom
 
 Ch. 1
diff --git a/docs/AbstractDataType.md b/docs/AbstractDataType.md
@@ -2,8 +2,6 @@
 
 CS 202 L14
 
-
-
 **Definition:** An ADT is a datatype that specifies it's interfaces but not implementation. This is similar to the relationship between an [ISA](ISA.md) and [MicroArchitecture](MicroArchitecture.md)
 
 These are a focus of CS 303 and include things such as [Stack](Stack.md) and [Queue](Queue.md).
diff --git a/docs/Abstraction.md b/docs/Abstraction.md
@@ -2,8 +2,6 @@
 
 Abstraction cpu architecture L1
 
-
-
 Abstraction hides away the implementation details to higher levels. You only see the interfaces provided to you. 
 
 There are instances where exposing lower level functions to higher can be useful. This can be seen when lower level instructions are shown to the compiler to allow better optimization. 
diff --git a/docs/Accuracy.md b/docs/Accuracy.md
@@ -2,8 +2,6 @@
 
 ML D2
 
-
-
 **Definition:** Accuracy in machine learning describes the overall correctness of a model. 
 
 This metric is the percentage of guesses that are accurate based on predictions and labels.
diff --git a/docs/ActiveAttacks.md b/docs/ActiveAttacks.md
@@ -6,7 +6,6 @@
 
 **Definition:** Active attacks are attacks that are attacks that manipulate data streams.
 
-
 ## Four Categories
 
 1. Masquerade - Pretending to be someone else
diff --git a/docs/AdaBoost.md b/docs/AdaBoost.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Adaboost is a boosting algorithm that boosts training instances that the prior model underfit (missed). 
 
 In adaboosting each predictor gets a model weight based on how accurate it is generally then each instance weight is also updated based on the accuracy of the models prediction. When models are wrong more often their weight is lowered but when instances are wrong their weight is increased to incentivize future models to fix the issue.
diff --git a/docs/AdaGrad.md b/docs/AdaGrad.md
@@ -2,8 +2,6 @@
 
 ML P584
 
-
-
 **Definition:** Adaptively adjusts learning rate based on historical gradients.
 
 I don't understand this very well.
diff --git a/docs/AdjacencyMatrix.md b/docs/AdjacencyMatrix.md
@@ -2,8 +2,6 @@
 
 Ch 4
 
-
-
 **Definition:** An adjacency matrix is a matrix where each column represents a node as do the rows. In each position there is either a true or false denoting whether or not there is an edge between the two nodes.
 
 These matricies are symmetric about the main diagonal and the diagonal is all false as a node may not be connected to itself.
diff --git a/docs/Algorithm.md b/docs/Algorithm.md
@@ -2,8 +2,6 @@
 
 Computer Architecture L2
 
-
-
 **Definition:** A step by step procedure to solve a problem where each step is definite (quantifiable), computable, and finite (ends eventually).
 
 This was described in computer architecture as despite algorithms being implemented in software, there are also algorithms involved in computer architecture. 
diff --git a/docs/Algorithms.md b/docs/Algorithms.md
@@ -149,7 +149,6 @@ L6:
 - [Trichotomy](Trichotomy.md)
 - [Monotonic Function](MonotonicFunction.md)
 
-
 #### Other algorithms adjacent stuff
 
 - [Bekenstein Bound](BekensteinBound.md)
diff --git a/docs/AmbientSpace.md b/docs/AmbientSpace.md
@@ -2,8 +2,6 @@
 
 Khan U2
 
-
-
 **Definition:** The ambient space is the space surrounding some object.
 
 When describing a cube the ambient space would be R^3. When discussing a hyperplane with four dimensions the ambient space would be R^5.
diff --git a/docs/Amortization.md b/docs/Amortization.md
@@ -2,6 +2,4 @@
 
 L2
 
-
-
 **Definition:** Amortization is the process of averaging out more complex actions across many events even if the smaller events are not actually doing anything related to the complex action.
diff --git a/docs/AngleBetweenVectors.md b/docs/AngleBetweenVectors.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The angle between two vectors is the angle between the two vectors when their tails are positioned at the zero vector. 
 
 ## Calculation
diff --git a/docs/Animation.md b/docs/Animation.md
@@ -2,8 +2,6 @@
 
 CG W13 L3
 
-
-
 **Definition:** Animation is the process of making still images appear as continuous movement.
 
 Unity uses [Clip](Clip.md) for simple animations. These are pre-defined and repetitive in nature. Think of repeated falling animations and not rigidbody falling animations which would not be predefined. 
diff --git a/docs/AnimationController.md b/docs/AnimationController.md
@@ -2,11 +2,8 @@
 
 CG W13 L3
 
-
-
 **Definition:** An animation controller is a finite state machine that can be represented as a graph where the verticies are states and the edges are transitions between states. Note that this is a directed graph.
 
-
 See [Animation](Animation.md) for individual animation class.
 
 This is an observer architecture where it observes things and calls secondary actions namely animation classes.
diff --git a/docs/AnomalyDetection.md b/docs/AnomalyDetection.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Anomaly detection is the task of detecting anomalous samples. 
 
 A common example of this is unusual credit card transactions used to prevent fraud.
diff --git a/docs/Antisymmetric.md b/docs/Antisymmetric.md
@@ -2,6 +2,4 @@
 
 Ch 9.1
 
-
-
 **Definition:** An antisymmetric relation is one such that if xRy then yRx is false where x != y.
diff --git a/docs/Arccos.md b/docs/Arccos.md
@@ -2,6 +2,4 @@
 
 SS
 
-
-
 **Definition:** Arccos is the inverse of cosine. 
diff --git a/docs/Arcsin.md b/docs/Arcsin.md
@@ -2,6 +2,4 @@
 
 SS
 
-
-
 **Definition:** Arcsin is the inverse of sine. 
diff --git a/docs/ArithmeticComputations.md b/docs/ArithmeticComputations.md
@@ -2,6 +2,4 @@
 
 Ch 5
 
-
-
 **Definition:** Arithmetic computations, with respect to hashing, are computations that use arithmetic operators to go from some key (or portion of a key) to a hash value (or portion).
diff --git a/docs/Armature.md b/docs/Armature.md
@@ -1,8 +1,5 @@
 # Armature
 
-
-
-
 **Definition:** An armature is a set of bones with parent child relationships. This set can be disjoint where not all bones can be traversed do by moving from parents to children or vice versa.
 
 An armature also has a default pose (generally t pose), which is the state of all it's bones when imported (transform). 
diff --git a/docs/Ascii.md b/docs/Ascii.md
@@ -2,6 +2,4 @@
 
 W2
 
-
-
 **Definition:** Ascii is another character encoding scheme that uses only 1 byte per character.
diff --git a/docs/Asset.md b/docs/Asset.md
@@ -2,8 +2,5 @@
 
 CS 331 W12 L3
 
-
-
 **Definition:** Assets are all resources in untiy. 
 
-
diff --git a/docs/AstronomicalUnit.md b/docs/AstronomicalUnit.md
@@ -2,8 +2,6 @@
 
 CM L1
 
-
-
 **Definition:** An astronomical unit is a measure of distance defined as the mean distance between the earth and the sun.
 
 Distance in Meters:
diff --git a/docs/AsymptoticNotation.md b/docs/AsymptoticNotation.md
@@ -2,8 +2,6 @@
 
 L1 MIT
 
-
-
 **Definition:** Asymptotic notation describes the running time of an algorithm.
 
 #### Types of complexity notation
@@ -21,7 +19,6 @@ There are three different notations for this big O, beg Theta, and big Omega.
 
 Note: When describing loose bounds (2n = o(n^2)) we use lowercase letters such as little o. This implies that we are describing an upper bound that is guaranteed to be larger than the growth rate that is not tight to the upper bound of the algorithm like how big O would be.
 
-
 #### Common complexities
 
 O(1) - Constant
diff --git a/docs/Authentication.md b/docs/Authentication.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.4
 
-
-
 **Definition:** Authentication is a service to ensure communication is authentic. 
 
 The difference between authentication and authorization is authentication ensures you are who you say you are and authorization ensures you are able to do what you are trying to do.
diff --git a/docs/Autoencoder.md b/docs/Autoencoder.md
@@ -2,8 +2,6 @@
 
 ML General
 
-
-
 **Definition:** An autoencoder is an unsupervised neural network that takes inputs, compresses them into a smaller representation while trying to maintain as much information as possible, and then reconstructs the compressed representation into a new full representation.
 
 The idea of an autoencoder is for the model to learn the best way to extract features out of a large input (many features) so it can then be passed to another model that will require less features and subsequently be faster to train and use. 
diff --git a/docs/Availability.md b/docs/Availability.md
@@ -4,6 +4,4 @@
 
 **Chapter:** 1.1
 
-
-
 **Definition:** Availability ensures systems work promptly and service is not denied to authorized users.
diff --git a/docs/BCD.md b/docs/BCD.md
@@ -1,8 +1,6 @@
 
 CA L3
 
-
-
 **Definition:** Binary coded decimal (BCD) is the process of encoding a decimal where each digit is a fixed number of bits.
 
 Ex. 
@@ -13,4 +11,3 @@ After: 0001 0000 : 0011 0111 : 0100 1001
 
 As you can see above, each digit is encode in a nibble.
 
-
diff --git a/docs/Backpropagation.md b/docs/Backpropagation.md
@@ -2,8 +2,6 @@
 
 ML D6
 
-
-
 **Definition:** Backpropagation is the combination of reverse-mode autodiff and gradient descent to iteratively improve models based on expected outputs by given inputs by following the gradient for each [Weight](Weight.md) and [Bias](Bias.md).
 
 When using backpropogation we use many mini-batches. Generally we go through the entire dataset to train multiple times and these passes are called epochs. When using mini-batches we first find the values from the input layer for each input, then we go to the second layer, and so on until reaching the output layer. This is the forward pass stage. An important note is that all intermediate values must be preserved to ensure we can do the backward pass.
diff --git a/docs/Bagging.md b/docs/Bagging.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Bagging is the process of training the same model multiple times with a different subset of the data. Bagging is different than pasting as bagging does not take samples that are selected as part of the random sample for training out of the options to add to the random sample. This means one model (predictor) can be trained with multiple instances of the same sample.
 
 One reason bagging and pasting are good is that they both allow for parallel processing because multiple models do predictions concurrently. The same is also true for model training.
diff --git a/docs/Baking.md b/docs/Baking.md
@@ -2,8 +2,6 @@
 
 CS 331 W11 Lecture 2
 
-
-
 **Definition:**  The process of precomputing. Another term for this is statically computed (not dynamically computed ie realtime).
 
 There are two different types of precomputing we can implement:
diff --git a/docs/Bandits.md b/docs/Bandits.md
@@ -2,8 +2,6 @@
 
 L1
 
-
-
 **Definition:** Bandits are a class of problems in RL where an agent repeatedly chooses from a set of actions which give a reward drawn from an unknown probability distribution.
 
 Basically, there are a set of actions, you do one, you have a reward... that's all
diff --git a/docs/BarrierSynchronization.md b/docs/BarrierSynchronization.md
@@ -1,11 +1,8 @@
 
 Computer Architecture L2
 
-
-
 **Definition:** This is a way to block all execution until all inputs are ready. This can be thought of as thread syncing and is closely related to [Data Flow](DataFlow.md) execution.
 
-
 in1  in2  in3
 
 BLOCKERBLOCKER
diff --git a/docs/BasisOfSubspace.md b/docs/BasisOfSubspace.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** The basis of a subspace is list of vectors V := (v_1, v_2, ..., v_m) such that V spans the subspace and is linearly independent.
diff --git a/docs/BatchNormalization.md b/docs/BatchNormalization.md
@@ -2,8 +2,6 @@
 
 ML P569
 
-
-
 **Definition:** Batch normalization is the process of adding layers to a neural network that perform normalization upon inputs and output the normalized values.
 
 This helps with unstable gradient issues and removes the need to normalize inputs for the network. On the flip side, these computations are bad for TPUs and are generally slow. They also don't work with RNNs.
diff --git a/docs/BayesianInference.md b/docs/BayesianInference.md
@@ -2,8 +2,6 @@
 
 Stats D5
 
-
-
 **Definition:** Bayesian inference is the principal that p(something) can often be described based on prior inferences that may make p(something) more or less likely thus factoring them into the probability.
 
 This is basically using state to update probability values.
diff --git a/docs/BekensteinBound.md b/docs/BekensteinBound.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** The Bekenstein bound gives the maximum amount of information (entropy) that can be contained in a sphere of a given radius and energy. A black hole of that size and energy exactly reaches the bound.
 
 This has implications for computation, as there is a theoretical cap on how much information any computing device of a given size and energy can hold; trying to exceed it by packing in more energy would collapse the device into a black hole.
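For reference, a standard statement of the bound (not taken from the notes) for a sphere of radius R containing total energy E, where S is entropy, k_B is Boltzmann's constant, ħ is the reduced Planck constant, and c is the speed of light:

$$S \le \frac{2\pi k_B R E}{\hbar c}$$

Measured in bits instead of entropy units, this becomes $I \le \frac{2\pi R E}{\hbar c \ln 2}$.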
diff --git a/docs/BellmanEquation.md b/docs/BellmanEquation.md
@@ -2,8 +2,6 @@
 
 L2
 
-
-
 **Definition:** The Bellman equation states that the value of making the optimal choice right now is the reward of the current choice plus the value of the optimal choices that follow from the resulting state.
 
 This is intuitive and simple to understand, but it is the basis for our ability to do dynamic programming because without it there is no optimal substructure.
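A minimal sketch of how the Bellman equation drives dynamic programming: value iteration on a tiny deterministic problem. The two states, their actions and rewards, and the discount factor `gamma` (which weights future value below the current reward) are hypothetical assumptions, not from the lecture.

```python
# Hypothetical two-state problem: TRANSITIONS[state][action] = (next_state, reward).
TRANSITIONS = {
    "A": {"stay": ("A", 0.0), "go": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "go": ("A", 0.0)},
}


def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Apply the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        # Value of a state = best (current reward + discounted value of next state).
        new_values = {
            s: max(reward + gamma * values[nxt] for nxt, reward in actions.values())
            for s, actions in transitions.items()
        }
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values
        values = new_values


# Converges to approximately A = 19 and B = 20 (stay in B forever: 2 / (1 - 0.9)).
print(value_iteration(TRANSITIONS))
```

Each state's value reuses the already-computed values of the states it can reach, which is the optimal substructure mentioned above.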
diff --git a/docs/BernoulliProcess.md b/docs/BernoulliProcess.md
@@ -2,8 +2,6 @@
 
 Prob L13
 
-
-
 **Definition:** A Bernoulli process is a sequence of independent binary trials (random variables), each with the same probability of success.
 
 As such, the sample space is the set of all possible sequences of outcomes for a given number of trials.
diff --git a/docs/BernoulliRandomVariable.md b/docs/BernoulliRandomVariable.md
@@ -2,8 +2,6 @@
 
 Prob L8
 
-
-
 **Definition:** A Bernoulli random variable is a random variable that has a Bernoulli distribution, meaning its outcome is binary.
 
 In a Bernoulli distribution, the probability of the event x is defined as p and the probability of not x is defined as 1-p.
diff --git a/docs/Bias.md b/docs/Bias.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 ### Stats
 
 **Definition:** Bias is a generalization error caused by incorrect assumptions such as assuming data is linear when it is not.
@@ -12,7 +10,6 @@ High bias models are likely to underfit training data.
 
 See also [Variance](Variance.md)
 
-
 ### ANNs
 
 **Definition:** Biases in ANNs are constants used as additional inputs for each perceptron (neuron). They can be thought of as the y-intercepts of linear equations.
diff --git a/docs/Biconditional.md b/docs/Biconditional.md
@@ -2,8 +2,6 @@
 
 1.1.2
 
-
-
 **Definition:** The biconditional is the [connective](Connectives.md) that states the antecedent and consequent have the same truth values.
 
 $p \iff q$ can be read as "p iff q", "p if and only if q", or in some other equivalent way.
diff --git a/docs/BigONotation.md b/docs/BigONotation.md
@@ -2,8 +2,6 @@
 
 Ch 2
 
-
-
 **Definition:** Big O Notation is a system agnostic way to describe an asymptotic upper bound on an algorithm's runtime (most often its worst case). Formally, f(x) = O(g(x)) if there exist constants c and N such that f(x) <= c g(x) for all x >= N.
 
 Basically, there must be some constant multiple and some starting point such that, from that point on, f(x) never exceeds c times g(x). Note that the equals sign is a bit contentious, as O(g(x)) describes a family of functions (those bounded by some multiple c of g), so $f(x) \in O(g(x))$ is more precise.
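A small sketch that checks a proposed pair of constants (c, N) numerically. The helper `holds_on_range` and the example functions are hypothetical; a finite check like this can refute a witness, but it cannot prove f(x) = O(g(x)), which needs an argument for every x >= N.

```python
def holds_on_range(f, g, c, N, upto=10_000):
    """Check f(x) <= c * g(x) for every integer x in [N, upto]."""
    return all(f(x) <= c * g(x) for x in range(N, upto + 1))


def f(x):
    return 3 * x**2 + 10 * x


def g(x):
    return x**2


# c = 4, N = 10 works because 10x <= x^2 once x >= 10.
print(holds_on_range(f, g, 4, 10))  # True
# c = 3 can never work: 3x^2 + 10x > 3x^2 for every x >= 1.
print(holds_on_range(f, g, 3, 1))   # False
```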
diff --git a/docs/BigThetaNotation.md b/docs/BigThetaNotation.md
@@ -2,6 +2,4 @@
 
 CS 303 Ch 2
 
-
-
 **Definition:** We use big theta notation to state that a function (such as an algorithm's runtime) grows at exactly the same asymptotic rate as another. Formally, f(x) = Θ(g(x)) if there exist constants c_1, c_2 > 0 and N such that c_1 g(x) <= f(x) <= c_2 g(x) for all x >= N. In other words, f is bounded both above and below by constant multiples of g.
diff --git a/docs/BijectiveProof.md b/docs/BijectiveProof.md
@@ -2,11 +2,8 @@
 
 Ch 6.3
 
-
-
 **Definition:** A bijective proof is a proof where we show there is a bijection (a one-to-one and onto function) between the compared sets, so they have the same cardinality.
 
-
 Example:
 
 Prove $\binom{n}{k} = \binom{n}{n-k}$
diff --git a/docs/BinaryCode.md b/docs/BinaryCode.md
@@ -2,8 +2,6 @@
 
 Ch 6
 
-
-
 **Definition:** A binary code for S is a function c : S -> {0,1}\*, where {0,1}\* is the set of all finite binary strings.
 
 Basically, this is the function to encode elements of S to binary.
diff --git a/docs/BinaryOperations.md b/docs/BinaryOperations.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** Binary operations are operations that take two inputs.
 
 Some examples include assignment (left and right sides), addition, and subtraction.
diff --git a/docs/BinaryTree.md b/docs/BinaryTree.md
@@ -2,8 +2,6 @@
 
 CS202 L14
 
-
-
 **Definition:** A binary tree is a tree where each node has at most two children. In a binary search tree, for any node n, all elements in the left subtree are less than n and everything in the right subtree is greater than n.
 
 For a generic binary tree, there is no requirement that the left and right subtrees are in any way balanced.
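A minimal binary search tree sketch (helper names are illustrative, duplicates are ignored) showing how the ordering property guides insertion and lookup. Nothing here rebalances the tree, so inserting already-sorted keys produces a lopsided, list-like tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # Equal keys are ignored in this sketch.
    return root


def contains(root, key):
    """Walk left or right depending on the comparison until found or out of nodes."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def in_order(root):
    """In-order traversal visits BST keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for k in [5, 3, 8, 1, 4]:
    root = insert(root, k)
print(in_order(root))  # [1, 3, 4, 5, 8]
```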
diff --git a/docs/Binomial.md b/docs/Binomial.md
@@ -2,8 +2,6 @@
 
 Ch 1.3
 
-
-
 **Definition:** A binomial is a sum of two terms, in the form of (x + y).
 
 Examples:
diff --git a/docs/BinomialCoefficient.md b/docs/BinomialCoefficient.md
@@ -2,8 +2,6 @@
 
 L4
 
-
-
 **Definition:** A binomial coefficient, written $\binom{n}{k}$, is represented by two numbers and evaluates to a single value: the number of unique subsets of size k (the bottom value) that can be created from a set of size n (the top value).
 
 It is called the binomial coefficient because it gives the coefficients in the expansion of binomials (e.g. (x+y)^5). The coefficient of $x^{n-k}y^k$ in $(x+y)^n$ is $\binom{n}{k}$, since that counts the ways to choose which k of the n factors contribute a y. This idea is also described as the binomial theorem.
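A quick check of the coefficients of (x+y)^5 using Python's `math.comb` (available since Python 3.8); the wrapper function name is illustrative.

```python
from math import comb


def binomial_coefficients(n):
    """Coefficients of (x + y)^n, ordered by increasing power of y."""
    return [comb(n, k) for k in range(n + 1)]


print(binomial_coefficients(5))  # [1, 5, 10, 10, 5, 1]
```

The coefficients always sum to 2^n, since setting x = y = 1 gives (1 + 1)^n.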
diff --git a/docs/BinomialDistribution.md b/docs/BinomialDistribution.md
@@ -2,8 +2,6 @@
 
 Stats D1
 
-
-
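 **Definition:** A binomial distribution is the distribution of the number of successes in n independent true/false (Bernoulli) trials that each succeed with the same probability p; each point k gives the probability of exactly k successes.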
 
 This can be thought of as a medical experiment: give a treatment to n patients, each of whom is cured with probability p. The x-axis would be the number of patients cured and the y-axis would be the probability of that many cures.
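A sketch of the binomial probability mass function, $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. The function name and the example numbers are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(binomial_pmf(2, 4, 0.5))  # 0.375

# Hypothetical experiment: 10 patients, each cured with probability 0.3.
distribution = [binomial_pmf(k, 10, 0.3) for k in range(11)]
```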
diff --git a/docs/Bipartite.md b/docs/Bipartite.md
@@ -2,8 +2,6 @@
 
 Ch 4
 
-
-
 **Definition:** A bipartite graph is a graph whose vertices can be divided into two sets such that every edge connects a vertex in one set to a vertex in the other set, never two vertices in the same set.
 
 Think about a graph with red and blue where blue can only connect to red and vice versa.
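One way to test whether a graph is bipartite, sketched under the assumption that the graph is undirected and stored as an adjacency dict: color it with BFS, giving each neighbor the opposite color (0 and 1 play the role of red and blue). A conflict means two same-colored vertices share an edge.

```python
from collections import deque


def is_bipartite(graph):
    """Return True if the undirected graph (adjacency dict) can be 2-colored."""
    color = {}
    for start in graph:  # loop handles graphs with several components
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # neighbor gets the other color
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # edge inside one color class
    return True


square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(is_bipartite(square), is_bipartite(triangle))  # True False
```

Any graph containing an odd-length cycle, like the triangle, fails this test.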
diff --git a/docs/BitSteering.md b/docs/BitSteering.md
@@ -2,8 +2,6 @@
 
 CA L3
 
-
-
 **Definition:** This is a bit in an instruction that determines how later bits are interpreted. 
 
 A good example of this is an [Opcode](Opcode.md).
diff --git a/docs/BlenderShortcuts.md b/docs/BlenderShortcuts.md
@@ -2,8 +2,6 @@
 
 Shortcuts from lectures
 
-
-
 "Z" - Switch between solid and wireframe (useful to select everything from a mesh from all sides)
 
 "1" - Vertex Mode
@@ -36,4 +34,3 @@ If you select "clipping" then it won't create the interior face. To do this drag
 
 This does not work for armatures. Instead, you need to select the bones to mirror, right click, select AutoName, right click again, and select Symmetrize. This will mirror over the x-axis. If you don't AutoName first it won't work, and if the bones aren't set up to mirror over the x-axis it will also not work. As such, to resolve the rotation issue, press "r" and rotate to ensure the bones are mirrored over the x-axis.
 
-
diff --git a/docs/Boxplots.md b/docs/Boxplots.md
@@ -2,8 +2,6 @@
 
 Stats D4
 
-
-
 **Definition:** A boxplot is a plot that summarizes a distribution using its quartiles.
 
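 These plots show the IQR (interquartile range, from Q1 to Q3) as a filled box with a line at the median (Q2), and then whiskers out toward the minimum and maximum values (commonly limited to 1.5 IQR beyond the box). Points beyond the whiskers are drawn as dots for outliers. This is also known as a box and whisker plot.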
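A sketch of the numbers behind a box plot. It assumes the common 1.5 IQR whisker rule; tools differ both in that rule and in how they compute quartiles, and this uses the default method of Python's `statistics.quantiles`.

```python
from statistics import quantiles


def box_summary(data):
    """Quartiles, whisker ends, and outliers for a box plot (1.5 * IQR rule)."""
    q1, q2, q3 = quantiles(data, n=4)  # three cut points splitting data into quarters
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        # Whiskers reach the most extreme points that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

Here 40 lies far above Q3 + 1.5 IQR, so it would be drawn as an outlier dot and the upper whisker stops at 9.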
diff --git a/docs/BreadthFirstSearch.md b/docs/BreadthFirstSearch.md
@@ -2,8 +2,6 @@
 
 CS 202 L14
 
-
-
 **Definition:** A search algorithm that works its way outward from the root node, level by level. This is different from [DepthFirstSearch](DepthFirstSearch.md), which goes all the way down one branch before backtracking; BFS instead visits every node at one distance from the root before moving further away.
 
 This uses a [Queue](Queue.md) to search.
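A minimal BFS sketch over an adjacency dict (the graph format is an assumption). The FIFO queue is what makes it expand outward level by level; swapping it for a stack would give a depth-first style traversal instead.

```python
from collections import deque


def bfs_order(graph, root):
    """Return nodes in the order BFS visits them, starting from root."""
    visited = {root}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()  # oldest discovered node first (FIFO)
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
print(bfs_order(tree, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
```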
diff --git a/docs/BucketAddressing.md b/docs/BucketAddressing.md
@@ -2,6 +2,4 @@
 
 Ch 5
 
-
-
 **Definition:** Bucket addressing is the process of giving each hash table position a fixed-size collection (a bucket), so that objects that collide can be stored in the same bucket.
diff --git a/docs/CART.md b/docs/CART.md
@@ -2,8 +2,6 @@
 
 ML D4
 
-
-
 **Definition:** The CART algorithm is used to train decision trees and works by splitting a training set into two subsets using a single feature k and a threshold t_k, where (k, t_k) is the pair that produces the purest subsets weighted by their size. This is then repeated greedily on each subset until reaching either the max depth or a point where it cannot find a split that will reduce impurity.
 
 Note that this algorithm is greedy, so there may be a better tree that takes a locally suboptimal split at some step, but searching for it would increase the computing cost drastically.
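A sketch of the greedy split search at a single node, assuming Gini impurity as the impurity measure and splits of the form x[k] <= t_k. A full CART implementation would call this recursively on each subset until a stopping rule such as max depth is hit; the function names are illustrative.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Return (weighted_impurity, k, t_k) for the purest single-feature split."""
    n = len(y)
    best = None
    for k in range(len(X[0])):
        for t in sorted({row[k] for row in X}):
            left = [y[i] for i in range(n) if X[i][k] <= t]
            right = [y[i] for i in range(n) if X[i][k] > t]
            if not left or not right:
                continue  # not a real split
            # Impurity of each side, weighted by its share of the samples.
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or cost < best[0]:
                best = (cost, k, t)
    return best


X = [[1, 5], [2, 1], [3, 4], [4, 2]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0.0, 0, 2): feature 0 at threshold 2 separates the classes
```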
diff --git a/docs/CNN.md b/docs/CNN.md
@@ -2,8 +2,6 @@
 
 ML SS
 
-
-
 **Definition:** A convolutional neural network is a neural network that has convolutional layers that perform filtering functions upon the input data.
 
 A convolution is the process of moving a filter across some data and calculating each output value by multiplying the surrounding input values by the corresponding values in the filter and then summing the products.
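A pure-Python sketch of a single-channel convolution with no padding and stride 1 (both assumptions). Like most deep learning libraries, it slides the filter without flipping it, which is technically cross-correlation.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution: output shrinks by (kernel size - 1) in each dimension."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the window under the filter element-wise and sum.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]  # diagonal difference filter
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```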
diff --git a/docs/CaesarCipher.md b/docs/CaesarCipher.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** A Caesar cipher is a monoalphabetic substitution cipher whereby we encode characters as numbers, shift the numbers by a constant amount (wrapping around mod 26 for the English alphabet), and then decode them back into characters.
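A minimal sketch for English letters only (an assumption); other characters pass through unchanged. Decoding is the same operation with the negative shift.

```python
def caesar(text, shift):
    """Shift each ASCII letter by `shift` positions mod 26, preserving case."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            # Encode as 0-25, shift, wrap around, decode back to a letter.
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)


secret = caesar("Hello, World!", 3)
print(secret)               # Khoor, Zruog!
print(caesar(secret, -3))   # Hello, World!
```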
diff --git a/docs/Calculus.md b/docs/Calculus.md
@@ -1,6 +1,5 @@
 # Calculus (Links)
 
-
 ## Main Links
 
 Calc 2 (Leonard):
@@ -33,7 +32,6 @@ Section 2.8:
 
 ## Known Integrals
 
-
 Trig Integrals:
 
 ---
diff --git a/docs/CanaryValue.md b/docs/CanaryValue.md
@@ -2,8 +2,6 @@
 
 CS202 SelfStudy
 
-
-
 **Definition:** A canary value is used to detect buffer overflows by placing known dummy data next to a buffer and validating it at some later time; if the data has changed, an overflow has occurred.
 
 When doing this, we write dummy data into the memory directly after the buffer and then, at some later time, validate the data stored there; a buffer overflow would overwrite it, so any change signals an overflow.
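Python cannot actually overflow a buffer, so this is only a simulation: a `bytearray` stands in for memory, with hypothetical canary bytes placed right after the buffer. In real programs, compilers insert and check stack canaries automatically (for example GCC's `-fstack-protector`).

```python
CANARY = b"\xde\xad\xbe\xef"  # hypothetical canary bytes


def make_frame(buf_size):
    """Simulated memory: a zeroed buffer immediately followed by the canary."""
    return bytearray(buf_size) + CANARY


def unchecked_copy(frame, data):
    """Copy data to the start of the frame with no bounds check (like C's strcpy).

    Writing only stops at the end of the simulated memory, so long input
    spills past the buffer into the canary.
    """
    for i, byte in enumerate(data[: len(frame)]):
        frame[i] = byte


def canary_intact(frame, buf_size):
    """Validate the canary later, e.g. just before a function returns."""
    return frame[buf_size:] == CANARY


frame = make_frame(8)
unchecked_copy(frame, b"A" * 10)  # 10 bytes into an 8-byte buffer
print(canary_intact(frame, 8))    # False: the overflow was detected
```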
diff --git a/docs/CartesianProduct.md b/docs/CartesianProduct.md
@@ -2,8 +2,6 @@
 
 Throughout textbook
 
-
-
 **Definition:** The Cartesian product of two sets A and B is the set of all ordered pairs (a, b) where a is contained in A and b is contained in B.
 
 This set has a size of |A| * |B|.
diff --git a/docs/Cases.md b/docs/Cases.md
@@ -2,6 +2,4 @@
 
 U 1.8.1
 
-
-
 **Definition:** Proof by cases is a form of proof whereby we split the claim into cases that together cover every possibility and show the statement is true in each case.
diff --git a/docs/CategoricalCrossEntropy.md b/docs/CategoricalCrossEntropy.md
@@ -2,8 +2,6 @@
 
 ML D6
 
-
-
 **Definition:** Categorical cross entropy is a loss calculation used for classification algorithms.
 
 Categorical cross entropy is calculated as $-\sum_i y_i \log(p_i)$: sum y_i log(p_i) over all classes and multiply by -1, where y_i is the expected classification (1 if true, 0 if false) and p_i is the probability output of the model.
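A sketch of the loss for a single sample, assuming a one-hot `y_true`. The small `eps` clamp is an added assumption to avoid taking log(0).

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """-sum_i y_i * log(p_i) for one sample (y_true one-hot, y_pred probabilities)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


# True class is index 1, predicted with probability 0.7: loss = -log(0.7), about 0.357.
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```

Only the probability assigned to the true class contributes, so the loss shrinks toward 0 as that probability approaches 1.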
diff --git a/docs/Ceiling.md b/docs/Ceiling.md
@@ -2,8 +2,6 @@
 
 U2.3.4
 
-
-
 **Definition:** The ceiling function rounds its input up to the nearest integer, i.e. the smallest integer greater than or equal to the input.
 
 Remember to still round to the higher number for negatives, e.g. the ceiling of -2.5 is -2.
diff --git a/docs/CentralLimitTheorem.md b/docs/CentralLimitTheorem.md
@@ -2,6 +2,4 @@
 
 L20
 
-
-
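 **Definition:** The CLT states that as the number of trials increases, the distribution of the sum (or mean) of independent, identically distributed random variables with finite variance tends toward a normal distribution, regardless of the original distribution.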
diff --git a/docs/ChainRule.md b/docs/ChainRule.md
@@ -2,6 +2,4 @@
 
 Leonard
 
-
-
 **Definition:** The chain rule is a differentiation rule used when we have a function within another function. The rule states $\frac{d}{dx} (g(f(x))) = g'(f(x)) \cdot f'(x)$.
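A worked example with $f(x) = x^2$ as the inner function and $g(u) = \sin u$ as the outer function, so $g'(u) = \cos u$ and $f'(x) = 2x$:

$$\frac{d}{dx} \sin(x^2) = \cos(x^2) \cdot 2x$$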
diff --git a/docs/Chaining.md b/docs/Chaining.md
@@ -2,6 +2,4 @@
 
 Ch 5
 
-
-
 **Definition:** Chaining is the process of using a linked list at each array position to resolve collisions that result from different keys hashing to the same index.
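A sketch of a hash table that resolves collisions by chaining. Python lists stand in for the linked lists in each bucket, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table where each bucket holds a chain of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key, possibly colliding: extend the chain

    def get(self, key):
        for k, v in self._bucket(key):  # walk the chain
            if k == key:
                return v
        raise KeyError(key)


# One bucket forces every key to collide, so everything lives in a single chain.
table = ChainedHashTable(num_buckets=1)
table.put("a", 1)
table.put("b", 2)
table.put("a", 3)
print(table.get("a"), table.get("b"))  # 3 2
```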
diff --git a/docs/ChangeOfBasis.md b/docs/ChangeOfBasis.md
@@ -2,8 +2,6 @@
 
 Khan U3
 
-
-
 **Definition:** Change of basis in linear algebra is the process of describing vectors in terms of a different set of basis vectors (any linearly independent vectors that span the space) instead of the standard basis.
 
 Example:
diff --git a/docs/CharacteristicEquation.md b/docs/CharacteristicEquation.md
@@ -2,8 +2,6 @@
 
 Ch 8.2
 
-
-
 **Definition:** A characteristic equation is the polynomial equation obtained from a linear homogeneous recurrence relation by substituting a_n = r^n into it.
 
 Original:
diff --git a/docs/CharacteristicRoots.md b/docs/CharacteristicRoots.md
@@ -2,6 +2,4 @@
 
 Ch 8.2
 
-
-
 **Definition:** Characteristic roots, in discrete math, are the values that satisfy a [Characteristic Equation](CharacteristicEquation.md).
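A sketch for second-order recurrences a_n = c_1 a_{n-1} + c_2 a_{n-2}, assuming the characteristic equation r^2 = c_1 r + c_2 has two distinct real roots. The Fibonacci recurrence (c_1 = c_2 = 1) is used as a check: its roots are (1 ± √5)/2. Function names are illustrative.

```python
import math


def characteristic_roots(c1, c2):
    """Roots of r^2 - c1*r - c2 = 0 (assumed real and distinct)."""
    root_disc = math.sqrt(c1 * c1 + 4 * c2)
    return (c1 + root_disc) / 2, (c1 - root_disc) / 2


def closed_form(c1, c2, a0, a1, n):
    """a_n = alpha * r1^n + beta * r2^n, with alpha and beta fit to a0 and a1."""
    r1, r2 = characteristic_roots(c1, c2)
    # Solve alpha + beta = a0 and alpha*r1 + beta*r2 = a1.
    beta = (a1 - a0 * r1) / (r2 - r1)
    alpha = a0 - beta
    return alpha * r1**n + beta * r2**n


print(characteristic_roots(1, 1))          # roughly (1.618, -0.618)
print(round(closed_form(1, 1, 0, 1, 10)))  # 55, the 10th Fibonacci number
```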
diff --git a/docs/Clip.md b/docs/Clip.md
@@ -2,8 +2,6 @@
 
 CG W13 L3
 
-
-
 **Definition:** A prerecorded set of frames representing an object in motion.
 
 This can be thought of in a similar way to tiling: the start and end frames should be the same so the clip can be repeated over and over again.
diff --git a/docs/Closure.md b/docs/Closure.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** Closure means that performing some particular operation (pick one, not necessarily all operations) on members of a set will always result in another element of the same set.
 
 In the context of subspaces, we have closure under scalar multiplication and vector addition because these operations on elements of the [LinearSubspace](LinearSubspace.md) set result in another element of the set (by definition).
diff --git a/docs/Codeword.md b/docs/Codeword.md
@@ -2,8 +2,6 @@
 
 Ch 6
 
-
-
 **Definition:** A codeword is an element c(x) where c is a binary code and x is a message.
 
 Remember, a binary code is defined as c : S -> {0,1}\*.
diff --git a/docs/Codomain.md b/docs/Codomain.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The codomain of a function is the set that contains every possible output that inputs from the domain may be mapped to. This set can also contain values that are not actually mapped to by the function.
 
 See [Range](Range.md) for only the subset of the codomain that is mapped to.
diff --git a/docs/Collection.md b/docs/Collection.md
@@ -2,6 +2,4 @@
 
 Ch 0
 
-
-
 **Definition:** Collection datatypes are datatypes that can, theoretically, store an arbitrarily large number of elements.
diff --git a/docs/Collision.md b/docs/Collision.md
@@ -2,6 +2,4 @@
 
 Ch 5
 
-
-
 **Definition:** A collision, with respect to hash tables, is when we try to place an element into a position in the array that is already taken. 
diff --git a/docs/ColumnSpace.md b/docs/ColumnSpace.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The column space of a matrix is the space that contains all linear combinations of its columns.
 
 In the case of a 3x2 (3 rows, 2 columns) matrix this is, generally, a plane. I say generally because if the two column vectors lie on the same line, the column space is simply a line and not a plane.
diff --git a/docs/Combination.md b/docs/Combination.md
@@ -2,8 +2,6 @@
 
 TB 6.3
 
-
-
 **Definition:** A combination is a unique selection of elements from a given set. 
 
 The difference between a combination and a permutation is that rearrangements of a combination are still considered the same combination, whereas rearrangements of a permutation are considered different permutations.
diff --git a/docs/CombinatorialProof.md b/docs/CombinatorialProof.md
@@ -2,8 +2,6 @@
 
 Ch 6.3
 
-
-
 **Definition:** A combinatorial proof is a proof that shows we are counting the same set and thus they are equivalent.
 
 Example:
diff --git a/docs/Combinatorics.md b/docs/Combinatorics.md
@@ -2,8 +2,6 @@
 
 Ch 6.1
 
-
-
 **Definition:** Combinatorics is the study of counting.
 
 Combinatorics is commonly used for enumeration in probability theory and sometimes computer science.
diff --git a/docs/Commutative.md b/docs/Commutative.md
@@ -2,8 +2,6 @@
 
 1.3.2
 
-
-
 **Definition:** The commutative property states that the order in which the operands are placed does not affect the outcome of the operation.
 
 a x b = b x a
diff --git a/docs/Complement.md b/docs/Complement.md
@@ -2,8 +2,6 @@
 
 L1
 
-
-
 **Definition:** The complement of a set is the set of all elements not in the original set, but in the consideration space (often sample space).
 
 There are technically two types of complements: the absolute and relative complements. Generally we are talking about the relative complement, which is the set defined as the difference between the superset and the subset. The absolute complement uses the U set ([UniversalSet](UniversalSet.md)) as the superset.
diff --git a/docs/ComplexVectorSpace.md b/docs/ComplexVectorSpace.md
@@ -2,6 +2,4 @@
 
 Ch 1
 
-
-
 **Definition:** A complex vector space is a vector space on the complex numbers (C).
diff --git a/docs/CompositeNumber.md b/docs/CompositeNumber.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** A composite number is a positive integer greater than 1 that is not prime, and thus is the product of two or more prime numbers (not necessarily distinct).
diff --git a/docs/ComputationalGraph.md b/docs/ComputationalGraph.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** A computational graph in machine learning is a graph that shows every computation required to go from input to output.
 
 This is in contrast with probabilistic graphs which describe higher level actions that are performed.
diff --git a/docs/ConditionalProbability.md b/docs/ConditionalProbability.md
@@ -2,8 +2,6 @@
 
 Ch 1.4
 
-
-
 **Definition:** Conditional probability is the probability of a given event assuming another event has already occurred.
 
 P(A|B) = Probability of the event A given the event B occurred.
diff --git a/docs/Confidentiality.md b/docs/Confidentiality.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.1
 
-
-
 **Definition:** Confidentiality ensures confidential information is not available to unauthorized individuals and that individuals have control over what information about them may be collected, stored, and disclosed to whom.
 
 As described above, the two main parts to this are **Data Confidentiality** and **Privacy**.
diff --git a/docs/ConfusionMatrix.md b/docs/ConfusionMatrix.md
@@ -2,6 +2,4 @@
 
 ML CH3
 
-
-
 **Definition:** A confusion matrix is a matrix that counts a model's predictions broken down by both the actual and predicted classes; the off-diagonal entries are the samples the model confused.
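A minimal sketch of building a confusion matrix from lists of labels, using the convention (an assumption; libraries vary) that rows are actual classes and columns are predicted classes.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] = number of samples of class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
print(confusion_matrix(actual, predicted, ["cat", "dog"]))  # [[1, 1], [1, 2]]
```

The diagonal (1 and 2) holds the correct predictions; each off-diagonal 1 is a sample that was confused for the other class.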
diff --git a/docs/Congruence.md b/docs/Congruence.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** Congruence describes the relationship between two integers a and b such that $a \equiv b \pmod{c}$, meaning c divides a - b.
diff --git a/docs/CongruenceClass.md b/docs/CongruenceClass.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** A congruence class is the set of all integers a such that $a \equiv b \pmod{c}$ for a fixed b and modulus c.
diff --git a/docs/Connected.md b/docs/Connected.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** Connected, in graph theory, means that there is a way to get from any node to any other node in the graph.
diff --git a/docs/ConnectedComponent.md b/docs/ConnectedComponent.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** A connected component is a maximal subgraph in which every vertex is connected to every other vertex by some path.
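A sketch of finding connected components with an iterative depth-first search, assuming an undirected graph stored as an adjacency dict. A graph is [Connected](Connected.md) exactly when this finds a single component.

```python
def connected_components(graph):
    """Group the vertices of an undirected graph (adjacency dict) into components."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Everything reachable from start belongs to the same component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            component.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return components


g = {1: [2], 2: [1], 3: [4], 4: [3], 5: []}
print(connected_components(g))  # [[1, 2], [3, 4], [5]]
```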
diff --git a/docs/Connectives.md b/docs/Connectives.md
@@ -2,8 +2,6 @@
 
 1.1.1
 
-
-
 **Definition:** Connectives are necessary for the creation of compound propositions and they are the following:
 
 - Negation (not | $\neg$)
diff --git a/docs/Contingency.md b/docs/Contingency.md
@@ -2,8 +2,6 @@
 
 1.3.1
 
-
-
 **Definition:** A contingency is a proposition that is neither always true nor always false. 
 
 An example of a contingency is simply $p$.
diff --git a/docs/ContinuousProbability.md b/docs/ContinuousProbability.md
@@ -2,8 +2,6 @@
 
 Stats Ch1
 
-
-
 **Definition:** A continuous probability is one where there is an uncountable number of outcomes.
 
 This is often defined by intervals either finite or infinite.
diff --git a/docs/Contradiction.md b/docs/Contradiction.md
@@ -2,8 +2,6 @@
 
 Throughout textbook
 
-
-
 **Definition:** Contradiction is used to prove if-then statements. This is done by assuming the if part is true and the then part is false, which would make the statement false. From here, you show this leads to a contradiction; thus, if the if part is true, then the then part must also be true.
 
 A contradiction is a proposition that is always false such as $p \wedge \neg p$.
diff --git a/docs/Contrapositive.md b/docs/Contrapositive.md
@@ -2,8 +2,6 @@
 
 Throughout TB - U1.7.2 Discrete TB
 
-
-
 **Definition:** To prove an if-then statement by contrapositive, we assume the then part is false. From here, we prove the if part must also be false. It follows that if the first is true then the second is also true, because the second is never false when the first is true.
 
 This is of the form $\neg q \to \neg p$, where we switch the statements and negate both. To negate both without switching them, we take the [Inverse](Inverse.md).
diff --git a/docs/Converse.md b/docs/Converse.md
@@ -2,8 +2,6 @@
 
 1.1.2
 
-
-
 **Definition:** The converse of a statement is to switch both sides of an implication statement.
 
 $p \to q$ converse is $q \to p$.
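A brute-force truth table check (a sketch, not a proof technique from the notes) confirming that an implication is equivalent to its contrapositive but not to its converse. The helper name `implies` is illustrative.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


rows = list(product([True, False], repeat=2))  # all four (p, q) assignments

# The contrapositive (not q -> not p) agrees with p -> q on every row...
contrapositive_equivalent = all(implies(p, q) == implies(not q, not p) for p, q in rows)
# ...but the converse (q -> p) does not (it differs when p is false and q is true).
converse_equivalent = all(implies(p, q) == implies(q, p) for p, q in rows)

print(contrapositive_equivalent, converse_equivalent)  # True False
```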
diff --git a/docs/Coordinate.md b/docs/Coordinate.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** A coordinate is a singular component of a vector or list.
 
 Consider v = (1, 4, 5). The third component of v is 5, the second component of v is 4, and the first component of v is 1.
diff --git a/docs/Correlation.md b/docs/Correlation.md
@@ -2,8 +2,6 @@
 
 Stats D2
 
-
-
 **Definition:** Correlation is the strength and direction of the relationship between two variables. This value is bounded between -1 and 1, where 0 is no correlation, 1 is a pure positive linear relationship, and -1 is a pure negative linear relationship.
 
 See [CorrelationCoefficient](CorrelationCoefficient.md) for an applied example.
diff --git a/docs/CorrelationCoefficient.md b/docs/CorrelationCoefficient.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** The correlation coefficient is a floating point number that represents the strength of a linear relationship between two variables x and y. 
 
 The highest value is 1 and the lowest is -1. A value of 1 or -1 means there is a perfect positive or perfect negative linear relationship, respectively, between the two variables.
diff --git a/docs/CosineSimilarity.md b/docs/CosineSimilarity.md
@@ -42,7 +42,6 @@ def cosine_similarity(A,B):
     b_l = magnitude(B)
     return dp_AB / (a_l * b_l)
 
-
 if __name__ == "__main__":
     A = [0, 4873, 823]
     B = [0, 487, 48988]
diff --git a/docs/CountSort.md b/docs/CountSort.md
@@ -2,6 +2,4 @@
 
 L5
 
-
-
 **Definition:** Count sort is a non-comparative sorting algorithm where we count the total number of instances of each value and then assemble a sorted output by writing out each value as many times as its count.
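A minimal counting sort sketch, assuming the inputs are small non-negative integers (the count array is sized by the maximum value):

```python
def counting_sort(values):
    """Sort non-negative integers by counting occurrences; no element comparisons."""
    if not values:
        return []
    counts = [0] * (max(values) + 1)
    for v in values:
        counts[v] += 1
    # Rebuild the output: each value repeated as many times as it was counted.
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result


print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

This runs in O(n + k) time for n values with maximum k, which is why it suits small ranges of integers.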
diff --git a/docs/CounterExample.md b/docs/CounterExample.md
@@ -2,6 +2,4 @@
 
 Abstract Math Proof Technique
 
-
-
 **Definition:** A proof by counterexample disproves a universal claim ("for all x, P(x)") by exhibiting a single specific x for which P(x) is false. Unlike a [DirectProof](DirectProof.md), it shows a statement is false rather than true.
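A sketch of hunting for a counterexample by brute force, using the classic claim that n^2 + n + 41 is prime for every natural number n. Finding none in a finite range would prove nothing; finding one disproves the claim. The helper names are illustrative.

```python
def is_prime(n):
    """Trial division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def first_counterexample(predicate, candidates):
    """Return the first candidate where the claim fails, or None if none is found."""
    for x in candidates:
        if not predicate(x):
            return x
    return None


counterexample = first_counterexample(lambda n: is_prime(n * n + n + 41), range(1000))
print(counterexample)  # 40, since 40^2 + 40 + 41 = 1681 = 41^2
```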
diff --git a/docs/CountingPrinciple.md b/docs/CountingPrinciple.md
@@ -2,6 +2,4 @@
 
 Ch 0
 
-
-
 **Definition:** The counting principle is an enumeration technique where you determine the branching factor at each step and multiply all branching factors to find the total number of possible paths. 
diff --git a/docs/Covariance.md b/docs/Covariance.md
@@ -2,8 +2,6 @@
 
 Stats D2
 
-
-
 **Definition:** Covariance measures how two variables vary together linearly. A positive value indicates that higher values of one variable are associated with higher values of the other; a negative value indicates that higher values of one are associated with lower values of the other.
 
 Unlike correlation, covariance has no bounds on its range; its magnitude depends on the scale of the variables.
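A sketch of sample covariance and Pearson correlation (dividing by n - 1 is an assumption; population formulas divide by n). Scaling one variable changes the covariance but not the correlation, which is why only correlation is bounded.

```python
import math


def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means (n - 1 divisor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(covariance(x, y), correlation(x, y))  # about 3.333 and 1.0
y_scaled = [v * 100 for v in y]
print(covariance(x, y_scaled), correlation(x, y_scaled))  # about 333.3 and 1.0
```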
diff --git a/docs/CramersRule.md b/docs/CramersRule.md
@@ -2,8 +2,6 @@
 
 3B1B
 
-
-
 **Definition:** Cramer's rule is an alternative to [GaussianElimination](GaussianElimination.md) for solving systems of equations.
 
 While it is slower and generally worse in practice than Gaussian elimination, it is an interesting alternative.
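A sketch of Cramer's rule for a 3x3 system: x_i = det(A_i) / det(A), where A_i is A with column i replaced by b. `Fraction` keeps the arithmetic exact, and the function names are illustrative. Needing a separate determinant for every unknown is why it scales worse than Gaussian elimination.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cramer3(A, b):
    """Solve A x = b for a 3x3 system with det(A) != 0."""
    d = det3(A)
    if d == 0:
        raise ValueError("det(A) = 0, so Cramer's rule does not apply")
    solution = []
    for i in range(3):
        # A_i: A with column i replaced by the right-hand side b.
        A_i = [row[:i] + [b_r] + row[i + 1:] for row, b_r in zip(A, b)]
        solution.append(Fraction(det3(A_i), d))
    return solution


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
print(cramer3(A, b))  # x = 2, y = 3, z = -1 (as Fractions)
```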
diff --git a/docs/CreditAssignmentProblem.md b/docs/CreditAssignmentProblem.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** The credit assignment problem is an RL problem where we need to determine how to rate choices in the near term given their long term consequences.
diff --git a/docs/CrossProduct.md b/docs/CrossProduct.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The cross product of two vectors is a vector orthogonal to both of them.
 
 The cross product is only defined in R^3. 
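A sketch of the component formula for the cross product in R^3, with a dot product to check orthogonality:

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


a, b = [1, 2, 3], [4, 5, 6]
c = cross(a, b)
print(c)                   # [-3, 6, -3]
print(dot(c, a), dot(c, b))  # 0 0, so c is orthogonal to both
```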
diff --git a/docs/CrossValidation.md b/docs/CrossValidation.md
@@ -2,8 +2,6 @@
 
 ML CH3
 
-
-
 **Definition:** Cross validation is the process of splitting your data into subsets, training the model on some of the subsets, and validating it on the held-out subset.
 
 A common form of this is k-fold cross-validation. This splits the data into k folds (subsets) and, for each fold, trains the model on the other k-1 folds and validates its accuracy on the held-out fold. This is repeated k times so that every fold is used as the validation set once.
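A sketch that generates k-fold train/validation index splits. Contiguous folds without shuffling are an assumption; in practice the data is often shuffled first. The function name is illustrative.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for each of k folds over n samples.

    Fold sizes differ by at most one; the first n % k folds get the extra sample.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size


for train, validation in k_fold_indices(10, 3):
    print(len(train), validation)
```

Each sample appears in exactly one validation fold and in the training set of the other k-1 folds.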
diff --git a/docs/Crosstabulation.md b/docs/Crosstabulation.md
@@ -2,8 +2,6 @@
 
 Stats D4
 
-
-
 **Definition:** Crosstabulation in stats is a way to display three dimensional information. Across the top and side are the categories of two variables, and each cell of the table holds the count for that cross-sectional group.
 
 ex:
@@ -13,5 +11,4 @@ Admittance  	Male	Female
 		Admitted 1198  557
 		Rejected 1493  1278
 
-
 This data can be shown using a [MosaicPlot](MosaicPlot.md) for graphical viewing with sized boxes.
diff --git a/docs/CumulativeDensityFunction.md b/docs/CumulativeDensityFunction.md
@@ -2,8 +2,6 @@
 
 Prob L8
 
-
-
 **Definition:** A cumulative density function (more commonly called a cumulative distribution function) is a function of a random variable where the value at any x is the probability of getting an output less than or equal to x.
 
 This is defined mathematically as $F(x) = P(X \leq x)$.
diff --git a/docs/Cycle.md b/docs/Cycle.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** A cycle is a path of length at least 3 that starts and ends at the same node and, apart from that repeated node, visits no node twice.
diff --git a/docs/DBSCAN.md b/docs/DBSCAN.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** DBSCAN is a clustering algorithm that groups clusters by continuous regions of high density.
 
 Steps to perform:
diff --git a/docs/DRAM.md b/docs/DRAM.md
@@ -2,9 +2,6 @@
 
 DRAM is what we think of as RAM. See [Memory](Memory.md) for other links.
 
-
-
-
 [DRAMBanks](DRAMBanks.md) are a 2d matrix of [DRAMCell](DRAMCell.md)s that are accessed by rows. When the processor wants a row, the row is activated and copied into the [RowBuffer](RowBuffer.md), and then the data is sent out. Subsequent accesses to a different column of the same row are very fast because the row is already in the buffer. This can be thought of as caching rows.
 
 One optimization is to prioritize memory requests to rows that are already buffered, which reduces row switching. This causes issues with multiple applications because it will prioritize applications that access localized memory more often. You can even write programs that take advantage of this to deny memory access to other applications. On the flip side, if you simply use an oldest-request-first scheduling algorithm, random access requests will take more time, and thus if one application makes more of them it will get more time than the other applications.
diff --git a/docs/DRAMBanks.md b/docs/DRAMBanks.md
@@ -1,5 +1,3 @@
 # DRAM Banks
 
-
-
 **Definition:** A 2d bank of [DRAMCell](DRAMCell.md)s that is accessed one row at a time; rows may be around 8kb in size.
diff --git a/docs/DRAMChips.md b/docs/DRAMChips.md
@@ -1,5 +1,3 @@
 # DRAM Chips
 
-
-
 DRAM Chips are the chips that contain the [DRAMBanks](DRAMBanks.md) along with associated circuitry. There are many chips (I think normally 8) that make up a RAM module. 
diff --git a/docs/DRAMRefresh.md b/docs/DRAMRefresh.md
@@ -1,7 +1,5 @@
 # DRAM Refresh
 
-
-
 This is the process of refreshing the energy stored in a [DRAMCell](DRAMCell.md)'s capacitor so that losses in energy over time do not cause loss of data (bitrot). 
 
 Currently, as of 2015, refreshes are required every 64ms. This costs electricity and can cause blocking issues, and as DRAM capacity scales, refreshes take more time and more power. As an example, with 64gb DRAM, refreshes can take up to 46% of time, while with 4gb it is about 8%.
diff --git a/docs/DRAMRowHammer.md b/docs/DRAMRowHammer.md
@@ -1,6 +1,4 @@
 
 Computer Architecture L1
 
-
-
 See [DisturbanceErrors](DisturbanceErrors.md) for more information as it describes this vulnerability. 
diff --git a/docs/DataAugmentation.md b/docs/DataAugmentation.md
@@ -2,8 +2,6 @@
 
 ML P773
 
-
-
 **Definition:** Data augmentation is the process of changing training data in such a way as to make the training data set larger and more robust.
 
 In CNNs this often involves rotation, lighting, flipping, and other augmentations.
diff --git a/docs/DataStructureAugmentation.md b/docs/DataStructureAugmentation.md
@@ -2,8 +2,6 @@
 
 L2
 
-
-
 **Definition:** Data structure augmentation is adding something to a data structure to improve it in some way. 
 
 An example of this is to augment a singly linked list with a tail pointer so accessing the tail can be done in O(1) instead of O(n) time. By doing this we can also add to the end of the list in constant time (as long as the tail pointer is kept updated).
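A minimal sketch of that augmentation (class names are illustrative): `append` and `peek_tail` never walk the list, but `append` must update `tail` every time.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """Singly linked list augmented with a tail pointer."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        """O(1) append: no need to walk the list to find the end."""
        node = Node(value)
        if self.tail is None:  # empty list: new node is both head and tail
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node  # keep the augmented pointer up to date

    def peek_tail(self):
        """O(1) access to the last element (None if empty)."""
        return None if self.tail is None else self.tail.value

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


ll = LinkedList()
for v in [1, 2, 3]:
    ll.append(v)
print(ll.peek_tail(), ll.to_list())  # 3 [1, 2, 3]
```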
diff --git a/docs/DecisionThreshold.md b/docs/DecisionThreshold.md
@@ -2,8 +2,6 @@
 
 ML CH3
 
-
-
 **Definition:** In binary classification, a decision threshold is the point on a score line where samples with greater scores are classified one way and samples with lesser scores the other way.
 
 Raising the threshold generally increases precision, because borderline samples are no longer classified as positive (fewer false positives), but it also decreases recall, because more actual positives fall below the threshold (more false negatives).
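A sketch showing the tradeoff on made-up scores and labels (1 = positive). Predictions at or above the threshold count as positive; precision or recall is returned as `None` when it is undefined (no predicted or no actual positives). The function name is illustrative.

```python
def precision_recall(scores, labels, threshold):
    """Treat score >= threshold as a positive prediction; return (precision, recall)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Made-up model scores and true labels.
scores = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
for t in (0.3, 0.7):
    print(t, precision_recall(scores, labels, t))
```

Raising the threshold from 0.3 to 0.7 moves precision from 0.6 to 1.0, while recall drops from 1.0 to about 0.67.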
diff --git a/docs/DecisionTrees.md b/docs/DecisionTrees.md
@@ -2,16 +2,12 @@
 
 ML D4
 
-
-
 **Definition:** Decision trees are a machine learning algorithm that performs true/false comparisons to go left or right until reaching a leaf node. This leaf node then describes the output.
 
 ### Associated Links
 
 Classification and Regression Trees by Leo Breiman
 
-
-
 ### Visualizing
 
 You can use graphviz to visualize this graph. First, you train the model using sklearn.tree; then you import export_graphviz from the same module. Using export_graphviz, you can pass in the model, output file, feature names, class names, and some other options to create a dot file.
@@ -51,14 +47,12 @@ Decision trees can output probabilities based on the values that are used to gen
 
 The max_depth hyperparameter is the main way to regularize decision trees and reduce overfitting risk. There are also max_features (the number of features considered when splitting each node), max_leaf_nodes, min_samples_split, and min_samples_leaf, which restrict the tree in similar ways.
 
-
 ### Uhh Ohh
 
 Decision trees really like orthogonal (axis-aligned) splits, but not so much angled ones. If you have a dataset that is easily separable at an angle but not vertically or horizontally, you will have a bad time with decision trees.
 
 One mitigation for this is to use PCA, which rotates the data to reduce correlation between features.
 
-
 ### Hmmm....
 
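 Scikit-Learn uses stochastic sampling (the features are randomly permuted at each split) when training decision trees, meaning results aren't necessarily consistent from one training run to the next unless random_state is set. This is why random forests can be cool.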
diff --git a/docs/DemorgansLaw.md b/docs/DemorgansLaw.md
@@ -2,8 +2,6 @@
 
 1.3.2
 
-
-
 **Definition:** These are two fundamental laws of Boolean algebra that can be simply derived.
 
 $\neg (p \wedge q) \equiv \neg p \vee \neg q$
diff --git a/docs/DensityEstimation.md b/docs/DensityEstimation.md
@@ -2,8 +2,6 @@
 
 Stats D3
 
-
-
 **Definition:** Density estimation is the process of modeling the probability of given values for a dataset.
 
 This can be thought of as similar to a histogram without the bins. A common form of this is a KDE (kernel density estimate). These can be better because there is no binning, which can make data appear inaccurate depending on the cut points and bin widths.
diff --git a/docs/DerivedDistribution.md b/docs/DerivedDistribution.md
@@ -2,8 +2,6 @@
 
 L10
 
-
-
 **Definition:** Derived distributions are distributions where we take a function of a random variable. 
 
 This is generally defined as Y = g(X) where X is a random variable, Y is a random variable, and g is a function.
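For a discrete X, the PMF of Y = g(X) is found by adding up $p_X(x)$ over every x with g(x) = y. A small sketch of that rule (`derived_pmf` is a hypothetical helper), using exact fractions:

```python
from collections import defaultdict
from fractions import Fraction

def derived_pmf(pmf_x, g):
    """Hypothetical helper: given {x: P(X=x)}, return the PMF of Y = g(X)."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf_x.items():
        # Every x that maps to the same y contributes its probability to P(Y=y).
        pmf_y[g(x)] += p
    return dict(pmf_y)

# X uniform on {-2, -1, 0, 1, 2}; Y = X**2
pmf_x = {x: Fraction(1, 5) for x in range(-2, 3)}
print(derived_pmf(pmf_x, lambda x: x * x))  # Y = 0 w.p. 1/5, Y = 1 and Y = 4 w.p. 2/5 each
```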
diff --git a/docs/DesignPoint.md b/docs/DesignPoint.md
@@ -2,8 +2,6 @@
 
 CA L3
 
-
-
 **Definition:** The design point is the set of goals a computer's design targets, including the constraints of the system.
 
 Here are some of the design constraints:
diff --git a/docs/Determinant.md b/docs/Determinant.md
@@ -2,8 +2,6 @@
 
 CS331 - Linear Algebra - Khan U2
 
-
-
 **Definition:** The determinant is the factor by which a linear transformation scales area (or volume in 3D space). This interpretation is easiest to picture in 2D and 3D, as the notion of volume in higher dimensions ([Hypervolume](Hypervolume.md)) is a bit abstract.
 
 This value can be negative if the space has been flipped (its orientation is reversed). In 3D space, this means a right-handed coordinate system becomes left-handed after the transformation.
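The sign behavior can be checked numerically. This sketch computes the determinant by cofactor expansion along the first row (fine for small matrices, but O(n!) in general); the example matrices are made up for illustration:

```python
def det(m):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j, then alternate signs + - + ...
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

print(det([[2, 0], [0, 2]]))  # 4: every area is scaled by 4
print(det([[0, 1], [1, 0]]))  # -1: swapping x and y flips orientation
```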
@@ -103,7 +101,6 @@ Ex.
 A = [0 8 7]
 	[0 0 7]
 
-
 det(A) = 2x8x7 = 112
 
 Written out taking the determinant using the first column:
diff --git a/docs/DiagonalMatrices.md b/docs/DiagonalMatrices.md
@@ -2,8 +2,6 @@
 
 Khan U2
 
-
-
 **Definition:** Diagonal matrices are matrices that have zeroes in all positions except on the main diagonal (the entries (1,1), (2,2), and so on).
 
 Diagonal matrices represent linear transformations that multiply each axis component by some value without combining different components together.
diff --git a/docs/Digraph.md b/docs/Digraph.md
@@ -2,8 +2,6 @@
 
 Ch 4
 
-
-
 **Definition:** A digraph is a directed graph meaning each edge has only one direction in which traversal is possible.
 
 When discussing digraphs, the start of an edge is called the initial vertex and the end is called the terminal vertex.
diff --git a/docs/DimensionalityReduction.md b/docs/DimensionalityReduction.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** This is where you have the goal of reducing the required data without losing too much information. This is like lossy compression. 
 
 This can be done by merging multiple correlated features into one. This is referred to as feature extraction where you extract a new feature from existing features to replace them. 
diff --git a/docs/Dimensions.md b/docs/Dimensions.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** The dimension of a vector space is defined as the length of any basis of the vector space.
 
 Recall that a basis is by definition linearly independent and spans the space, and every basis of a finite-dimensional vector space has the same length, so any basis gives us the dimension of the vector space.
diff --git a/docs/DirectProof.md b/docs/DirectProof.md
@@ -2,6 +2,4 @@
 
 Abstract Math + Discrete Math U1.7.1
 
-
-
 **Definition:** A direct proof of a statement p → q starts by assuming that p is true. We then show, using definitions and previously proven results, that q must also be true in every case. These proofs often start with, "Let's assume x is true," and then continue on to prove what it is that x implies.
diff --git a/docs/DirectSum.md b/docs/DirectSum.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** A direct sum is a sum of two subspaces whose intersection contains only the zero vector.
 
 This can also be stated as every element of the sum being writable as a **unique** combination (sum) of vectors from the subspaces. This notion leads to a good way to test this.
diff --git a/docs/DiscountFactor.md b/docs/DiscountFactor.md
@@ -2,8 +2,6 @@
 
 L2
 
-
-
 **Definition:** The discount factor in RL is the value gamma we use to describe how much or little we care about long term rewards with respect to the value function.
 
 The discount factor is raised to the power of the number of steps until the reward, so if gamma = .5 then we care only .5x as much about the next step's reward as the current one, .25x as much about the one after that, and so on.
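A small sketch of the discounted return for a finite list of rewards, assuming the first reward in the list is undiscounted (some texts start the sum at the next step's reward instead):

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k, where r_0 is the reward for the current step."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# With gamma = 0.5 the weights are 1, 0.5, 0.25, ...
print(discounted_return([1, 1, 1], 0.5))  # 1.75
```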
diff --git a/docs/DiscreteProbability.md b/docs/DiscreteProbability.md
@@ -2,6 +2,4 @@
 
 Stats ch1
 
-
-
 **Definition:** A discrete probability is one where there are a finite set of outcomes or a countably infinite set of outcomes.
diff --git a/docs/DiscreteRandomVariable.md b/docs/DiscreteRandomVariable.md
@@ -2,6 +2,4 @@
 
 Ch 2.1
 
-
-
 **Definition:** A discrete random variable is a random variable with an outcome space of finite or countably infinite size. 
diff --git a/docs/DiscreteUniformLaw.md b/docs/DiscreteUniformLaw.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** The discrete uniform law states that if all outcomes in a [SampleSpace](SampleSpace.md) are equally probable then P(A) where A is a set is the same as |A| / |Omega| where Omega is the entire sample space.
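A sketch of the law using exact fractions; `uniform_probability` is a hypothetical helper and the fair die is just an example sample space:

```python
from fractions import Fraction

def uniform_probability(event, sample_space):
    """Hypothetical helper: P(A) = |A| / |Omega| for equally likely outcomes."""
    omega = set(sample_space)
    event = set(event) & omega  # keep only outcomes that can actually occur
    return Fraction(len(event), len(omega))

die = range(1, 7)
print(uniform_probability({2, 4, 6}, die))  # 1/2
```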
diff --git a/docs/DisjointSet.md b/docs/DisjointSet.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** Disjoint sets are sets that have no elements in common (their intersection is empty).
diff --git a/docs/DistanceCalculation.md b/docs/DistanceCalculation.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** Distance calculation in any dimension is defined as sqrt((x_1 - y_1)^2 + (x_2 - y_2)^2 ...)
 
 In the above definition x_1 is the first component of the first point (or vector), and y_1 is the first component of the second point (or vector). We then repeat this by subtracting them, squaring them and then summing all of them. Finally, we take the square root. 
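The formula translates directly into code. A minimal sketch (Python 3.8+ also ships `math.dist`, which computes the same thing):

```python
import math

def distance(x, y):
    """Euclidean distance between two points with the same number of components."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    # Subtract componentwise, square, sum, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(distance((0, 0), (3, 4)))        # 5.0
print(distance((1, 2, 3), (1, 2, 3)))  # 0.0
```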
diff --git a/docs/DistanceToPlane.md b/docs/DistanceToPlane.md
@@ -2,8 +2,6 @@
 
 Distance from arbitrary point to plane
 
-
-
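 If we take any point on the plane and connect it to the other point, that connecting vector is the hypotenuse of a right triangle whose other two sides lie along the plane and along the plane's normal. The length of the side along the normal is the distance from the plane to the point.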
 
 As such, this is as simple as taking the dot product of the normal vector and the vector that connects the representative point on the plane to the other point. We then divide the absolute value of this by the length of the normal vector, which gives the length of that side and thus the distance between the plane and the point.
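A minimal sketch of the calculation; the plane is given by any point on it plus a normal vector, and `point_plane_distance` is a hypothetical helper name:

```python
import math

def point_plane_distance(normal, plane_point, p):
    """Hypothetical helper: distance from p to the plane through plane_point."""
    # Vector from the known point on the plane to p.
    v = [pi - qi for pi, qi in zip(p, plane_point)]
    dot = sum(ni * vi for ni, vi in zip(normal, v))
    # abs() because p may lie on either side of the plane.
    return abs(dot) / math.sqrt(sum(ni * ni for ni in normal))

# Plane z = 0 (the normal (0, 0, 2) need not be unit length), point 3 below it.
print(point_plane_distance((0, 0, 2), (1, 1, 0), (5, -2, -3)))  # 3.0
```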
diff --git a/docs/Distinguishable.md b/docs/Distinguishable.md
@@ -2,6 +2,4 @@
 
 Ch 6.5
 
-
-
 **Definition:** Distinguishable means items are different in some way such that switching them results in a new permutation.
diff --git a/docs/DistinguishablePermutation.md b/docs/DistinguishablePermutation.md
@@ -2,6 +2,4 @@
 
 Ch 1.3
 
-
-
 **Definition:** A distinguishable permutation is a permutation that can be distinguished from all other permutations.
diff --git a/docs/Distributive.md b/docs/Distributive.md
@@ -2,6 +2,4 @@
 
 Ch 2.2
 
-
-
 **Definition:** Distributivity is a property relating two operators, such as multiplication distributing over addition: a(b+c) = ab + ac.
diff --git a/docs/DistributiveLaw.md b/docs/DistributiveLaw.md
@@ -2,8 +2,6 @@
 
 1.3.2
 
-
-
 **Definition:** The distributive law of disjunction states $p \vee (q \wedge r) \equiv (p\vee q) \wedge (p \vee r)$.
 
 This can be read as: "p, or both q and r" is the same as "p or q, and also p or r".
diff --git a/docs/Div.md b/docs/Div.md
@@ -2,12 +2,8 @@
 
 U 2.4
 
-
-
 **Definition:** Div is a mathematical function that, for a positive divisor d, finds the largest integer q such that q times d is less than or equal to the dividend a.
 
-
 ex:
 
-
 15 div 2 = 7
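In Python, `//` is floor division, so for a positive divisor it matches this definition, including for negative dividends. The helper `div` below is only illustrative. Note that C and C++ integer division truncates toward zero instead, so `-15 / 2` is `-7` there:

```python
def div(a, d):
    """Hypothetical helper: the largest integer q with q * d <= a, for d > 0."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    return a // d  # Python's floor division implements exactly this

print(div(15, 2))     # 7
print(divmod(15, 2))  # (7, 1) because 15 = 7 * 2 + 1
print(div(-15, 2))    # -8 because -8 * 2 = -16 <= -15 < -14
```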
diff --git a/docs/DivideAndConquer.md b/docs/DivideAndConquer.md
@@ -2,8 +2,6 @@
 
 CLRS 2.3.1
 
-
-
 **Definition:** Divide and conquer algorithms are algorithms that break a problem down into smaller subproblems, solve each subproblem, and then combine those solutions into a solution for the original problem.
 
 These algorithms are often, but not always, recursive.
diff --git a/docs/DivisionAlgorithm.md b/docs/DivisionAlgorithm.md
@@ -4,13 +4,10 @@
 
 **Chapter:** 2.1
 
-
-
 **Definition:** The division algorithm is the theorem that for an integer a and a positive integer n there are unique integers q and r with a = qn + r and 0 <= r < n; moreover, q = floor(a/n).
 
 This basically states that the remainder r is always at least 0 and less than n, and the quotient q is the floor of the numerator divided by the denominator.
 
-
 ### Implementation in code to solve the Division Algorithm
 
 ```cpp
diff --git a/docs/DivisionRule.md b/docs/DivisionRule.md
@@ -2,8 +2,6 @@
 
 Ch 6.1
 
-
-
 **Definition:** The division rule is a counting rule: if a task can be done in n ways, and every distinct outcome corresponds to exactly d of those n ways, then there are n/d distinct outcomes.
 
 A good way to think of this is as a function. Consider a function f: A -> B that is onto B, where for every b in B there are exactly d values a in A such that f(a) = b. Knowing this, |B| = |A|/d, so there are |A|/d total possible outcomes.
diff --git a/docs/DivisionRules.md b/docs/DivisionRules.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2.1
 
-
-
 IMPORTANT RULES THAT MIGHT NOT BE OBVIOUS RIGHT AWAY:
 
 1. a | 1 then a = +- 1
diff --git a/docs/DotProduct.md b/docs/DotProduct.md
@@ -2,8 +2,6 @@
 
 CS331 + Khan 
 
-
-
 **Definition:** The dot product of two vectors is the sum of the products of their corresponding components.
 
 This can be visualized as the length of one vector, v, projected onto another vector, y, multiplied by the length of the vector y. Additionally, if the angle between two vectors is more than 90 degrees (they point in generally opposite directions), their dot product is negative. This is why the same-side-of-plane algorithm works (see cs331 code): if two points are on the same side of a plane, the vectors from a point on the plane to each of them have dot products with the plane's normal vector of the same sign.
@@ -14,7 +12,6 @@ This value is zero if the vectors are orthogonal.
 
 The dot product in a geometric sense is u dot v = ||u|| ||v|| cos(theta) where theta is the angle between the two vectors (between 0 and 180 degrees). As such, when the angle between them is greater than 90 degrees we find the dot product is negative.
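A short sketch of both views of the dot product: the component-wise sum and the angle recovered from $u \cdot v = \|u\| \|v\| \cos\theta$. The helper names are my own:

```python
import math

def dot(u, v):
    """Sum of the products of corresponding components."""
    return sum(ui * vi for ui, vi in zip(u, v))

def angle_degrees(u, v):
    """Angle between u and v, from u . v = |u||v| cos(theta)."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

print(dot((1, 2, 3), (4, 5, 6)))      # 32
print(angle_degrees((1, 0), (0, 1)))  # about 90: orthogonal, dot product 0
print(dot((1, 0), (-1, 1)))           # -1: the angle is 135 degrees, so negative
```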
 
-
 ### Intuition Of DP
 
 By definition, the dot product can be stated as follows where || denotes the lengths of vectors and theta is the angle between the two vectors:
diff --git a/docs/DoublyLinkedList.md b/docs/DoublyLinkedList.md
@@ -2,7 +2,5 @@
 
 CS 221 W 11 Lecture 13. 
 
-
-
 **Definition:** This is a linked list that has a pointer to the tail and head that are accessible, and every element in the list has a pointer to the previous and next nodes. 
 
diff --git a/docs/Dropout.md b/docs/Dropout.md
@@ -2,8 +2,6 @@
 
 ML P604
 
-
-
 **Definition:** Dropout is a regularization technique for deep neural networks where, at every training step, each neuron (excluding the output neurons) has a fixed probability of being 'dropped out', meaning its output is 0 for that step.
 
 Typical dropout rates are somewhere between 10% and 50%: closer to 20%-30% for RNNs and 40%-50% for CNNs.
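A minimal sketch of dropout on a plain list of activations. It assumes the common "inverted dropout" variant, which scales the surviving outputs by 1 / (1 - rate) during training so nothing needs rescaling at inference time; frameworks implement this internally, and `dropout` here is a hypothetical helper:

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Hypothetical helper: inverted dropout on a list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # no dropout at inference time
    keep = 1.0 - rate
    # Each unit is zeroed with probability `rate`; survivors are scaled by
    # 1 / keep so the expected value of every output stays the same.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)
print(dropout([1.0] * 10, rate=0.5, rng=rng))  # each entry is 0.0 or 2.0
```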
diff --git a/docs/Duality.md b/docs/Duality.md
@@ -2,8 +2,6 @@
 
 3B1B
 
-
-
 **Definition:** Duality is a natural but surprising correspondence between two types of things.
 
 This is like finding out that the dot product can be used to find the projection of a vector onto another vector.
diff --git a/docs/DynamicProgramming.md b/docs/DynamicProgramming.md
@@ -2,8 +2,6 @@
 
 L3
 
-
-
 **Definition:** Dynamic programming is the idea that we can break down a problem into subproblems, solve those subproblems, and then use the results to find the problem's overall solution.
 
 There are two necessary conditions for a problem to be solvable via DP:
diff --git a/docs/EarlyStopping.md b/docs/EarlyStopping.md
@@ -2,8 +2,6 @@
 
 ML D3
 
-
-
 **Definition:** Early stopping is the process of stopping training early (for models trained with GD or something akin to it) as a form of regularization.
 
 Early stopping decreases overfitting by stopping once the prediction error on a validation set stops improving (or starts to rise), instead of continuing to fit the training set. This also reduces training time.
diff --git a/docs/EigenVector.md b/docs/EigenVector.md
@@ -2,8 +2,6 @@
 
 Self Study
 
-
-
 **Definition:** An eigenvector is a non-zero vector that, when a linear transformation is applied to it, is only scaled by some scalar (it remains on the same line through the origin).
 
 Associated with each eigenvector is an eigenvalue, the scalar by which vectors along that eigenvector are stretched or shrunk.
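A quick way to test the definition numerically: check that A v is a scalar multiple of a non-zero v. The matrix and helpers below are illustrative:

```python
def mat_vec(a, v):
    """Multiply matrix a (a list of rows) by vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def is_eigenvector(a, v, lam, tol=1e-9):
    """True if v is non-zero and A v == lam * v (up to tol)."""
    if all(abs(x) < tol for x in v):
        return False  # the zero vector is excluded by definition
    return all(abs(av - lam * vi) < tol for av, vi in zip(mat_vec(a, v), v))

A = [[2, 1], [1, 2]]
print(is_eigenvector(A, [1, 1], 3))   # True: A [1, 1] = [3, 3]
print(is_eigenvector(A, [1, -1], 1))  # True: A [1, -1] = [1, -1]
print(is_eigenvector(A, [1, 0], 2))   # False: A [1, 0] = [2, 1]
```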
diff --git a/docs/ElasticNetRegression.md b/docs/ElasticNetRegression.md
@@ -2,8 +2,6 @@
 
 ML D3
 
-
-
 **Definition:** Elastic net regression is another form of linear regression that adds a regularization term to the loss function which is a middle ground between ridge and lasso regression.
 
 As it relates to linear regression, it is generally good to add some regularization. When we suspect only a few features are useful (so some coefficients should be 0), we should rely upon elastic net (or lasso) regression. Otherwise ridge regression is a good option when we don't think there are useless features.
diff --git a/docs/ElementaryTransformations.md b/docs/ElementaryTransformations.md
@@ -2,8 +2,6 @@
 
 Ch 2.2
 
-
-
 **Definition:** Elementary transformations are row operations done to matrices that do not change the solution set of the system of equations.
 
 These elementary transformations are what we use to solve systems of equations via Gaussian elimination.
diff --git a/docs/EligibilityTraces.md b/docs/EligibilityTraces.md
@@ -2,8 +2,6 @@
 
 L4
 
-
-
 **Definition:** Eligibility traces combine both the frequency and recency heuristics to solve the credit assignment problem.
 
 Basically, every time we visit a state we increase the eligibility trace for that state, and over time the trace decays. Higher values mean the state is more associated with the outcome and lower values mean less. This lets us care about frequency, because each visit adds to the trace, and about recency, because of the decay.
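A sketch of accumulating eligibility traces, assuming a single decay factor per step (in TD(λ) this would be γλ) and a dict of traces keyed by state name. The bell/light sequence mirrors the frequency heuristic example; the helper name is hypothetical:

```python
def update_traces(traces, visited_state, decay):
    """Hypothetical helper: decay every trace, then add 1 for the visited state."""
    for s in traces:
        traces[s] *= decay  # recency: older visits fade out
    # frequency: every visit adds to the trace
    traces[visited_state] = traces.get(visited_state, 0.0) + 1.0
    return traces

traces = {}
for s in ["bell", "bell", "bell", "light"]:
    update_traces(traces, s, decay=0.5)
print(traces)  # {'bell': 0.875, 'light': 1.0}
```

The light ends up with the larger trace even though the bell was seen three times, because recency outweighs frequency at this decay rate.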
diff --git a/docs/EmptyGraph.md b/docs/EmptyGraph.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** The empty graph is a graph that does not have any nodes and subsequently does not have any edges. 
diff --git a/docs/Ensembles.md b/docs/Ensembles.md
@@ -2,6 +2,4 @@
 
 CH2
 
-
-
 **Definition:** Ensembles are models composed of multiple models. These models can be the same like with random forests or different models put together.
diff --git a/docs/Entropy.md b/docs/Entropy.md
@@ -2,8 +2,6 @@
 
 Ch 6
 
-
-
 **Definition:** Entropy is the average number of bits communicated by one message if message hoarding is allowed.
 
 Entropy of a finite set of messages is denoted as H(X).
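The standard Shannon entropy formula, $H(X) = -\sum_x p(x) \log_2 p(x)$, as a short sketch (outcomes with probability 0 contribute nothing and are skipped):

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(entropy([0.25] * 4))  # 2.0 bits: four equally likely messages
```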
diff --git a/docs/Episode.md b/docs/Episode.md
@@ -2,6 +2,4 @@
 
 L4
 
-
-
 **Definition:** An episode in RL is a single run of a policy from start to finish.
diff --git a/docs/Episodic.md b/docs/Episodic.md
@@ -2,6 +2,4 @@
 
 L4
 
-
-
 **Definition:** Episodic, with respect to RL, means that interaction is broken into episodes, as opposed to non-episodic, which means it continues on forever.
diff --git a/docs/EuclideanAlgorithm.md b/docs/EuclideanAlgorithm.md
@@ -2,8 +2,6 @@
 
 Ch 2.4
 
-
-
 **Definition:** The Euclidean algorithm is an algorithm used to determine the greatest common factor of two positive integers.
 
 This is done as an alternative to the prime factorization method, which is too slow for large numbers because factoring them is expensive.
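A sketch of the Euclidean algorithm, which repeatedly replaces the pair (a, b) with (b, a mod b) until the remainder is 0 (Python's standard library also provides `math.gcd`):

```python
def gcd(a, b):
    """Greatest common divisor: gcd(a, b) = gcd(b, a mod b), and gcd(a, 0) = a."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(252, 198))  # 18
```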
diff --git a/docs/EulersTheorem.md b/docs/EulersTheorem.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2.5
 
-
-
 **Definition:** Euler's theorem states that for every $a$ and $n$ that are relatively prime, $a^{\phi(n)} \equiv 1 \pmod{n}$.
 
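 An alternative form of this is $a^{\phi(n) + 1} \equiv a \pmod{n}$. This form does not require that $a$ and $n$ be relatively prime when $n$ is squarefree (for example $n = pq$ for distinct primes $p$ and $q$), but it can fail otherwise: for $n = 4$ and $a = 2$, $2^{3} = 8 \equiv 0 \pmod{4}$.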
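The theorem, and the failure of the "+1" form when n is not squarefree, can be checked numerically. This sketch computes φ(n) by brute force, which is only suitable for small n:

```python
from math import gcd

def phi(n):
    """Euler's totient: count of 1 <= k <= n with gcd(k, n) == 1 (brute force)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def check_euler(n):
    """Check a^phi(n) = 1 (mod n) for every a in [1, n) coprime to n."""
    t = phi(n)
    return all(pow(a, t, n) == 1 for a in range(1, n) if gcd(a, n) == 1)

print(phi(10), check_euler(10))  # 4 True
# The "+1" form can fail when n is not squarefree: n = 4, a = 2.
print(pow(2, phi(4) + 1, 4))     # 0, not 2
```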
diff --git a/docs/Evaluation.md b/docs/Evaluation.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** Evaluation in RL is the process of seeing how good a policy is.
diff --git a/docs/Event.md b/docs/Event.md
@@ -2,8 +2,6 @@
 
 CH 1.2
 
-
-
 **Definition:** An event is a subset of the sample space.
 
 The frequency of the event A is denoted $\mathbf{N}(A)$.
diff --git a/docs/EvolutionaryMethods.md b/docs/EvolutionaryMethods.md
@@ -2,6 +2,4 @@
 
 RL Ch 1
 
-
-
 **Definition:** Evolutionary methods are a class of RL strategies where learning is not done by interacting with the environment but rather by updating policies using a strategy akin to evolution where the best models continue on.
diff --git a/docs/ExhaustiveProof.md b/docs/ExhaustiveProof.md
@@ -2,8 +2,6 @@
 
 U 1.8.2
 
-
-
 **Definition:** An exhaustive proof is similar to proof by cases except we check every specific case individually, so the number of cases needs to be relatively small.
 
 An exhaustive proof that "a < a+1 for 1 < a < 5 where $a \in \mathbb{Z}$" would show the statement is true for 2, 3, and 4.
diff --git a/docs/Expectation.md b/docs/Expectation.md
@@ -2,8 +2,6 @@
 
 L6
 
-
-
 **Definition:** The expected value of a random variable with a given PMF is the probability-weighted average of its possible values.
 
 This is calculated by summing each possible value multiplied by its probability. This will be the 'middle' (center of mass) of the distribution.
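A sketch of the weighted average for a fair six-sided die, using exact fractions (the PMF-as-dict representation is an assumption for illustration):

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum over x of x * P(X = x), for a PMF given as {x: P(X = x)}."""
    return sum(x * p for x, p in pmf.items())

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(fair_die))  # 7/2
```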
diff --git a/docs/ExplodingGradients.md b/docs/ExplodingGradients.md
@@ -2,8 +2,6 @@
 
 ML 550
 
-
-
 **Definition:** Exploding gradients is a problem with training neural networks where gradients grow larger and larger as they are propagated back to the lower layers, so the gradient steps become huge and training diverges from a proper solution.
 
 This is the opposite of [VanishingGradients](VanishingGradients.md)
diff --git a/docs/Exploit.md b/docs/Exploit.md
@@ -2,8 +2,6 @@
 
 RL Ch 1
 
-
-
 **Definition:** To exploit in RL means to take the known best move in the current state.
 
 This is the opposite of explore, which is to take a move other than the known best (for example, a random one) and see how that plays out in the future, in case it may be better than the current best known option.
diff --git a/docs/ExploratoryDataAnalysis.md b/docs/ExploratoryDataAnalysis.md
@@ -2,6 +2,4 @@
 
 Stats D3
 
-
-
 **Definition:** Exploratory data analysis is the process of exploring a dataset to find patterns and to create models/statistics/visualizations.
diff --git a/docs/Explore.md b/docs/Explore.md
@@ -2,6 +2,4 @@
 
 RL Ch 1
 
-
-
 **Definition:** To explore in RL means to select an option that is either unknown or suboptimal and then continue to evaluate that path, with the hope it may lead to a better outcome than the known best option.
diff --git a/docs/ExtraTrees.md b/docs/ExtraTrees.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Extra trees (extremely randomized trees) are decision trees that incorporate extra randomness by using random splitting thresholds instead of searching for the thresholds that optimize Gini impurity or information gain.
 
 Basically, each node draws a random threshold for each candidate feature, within the range of values reaching that node, and splits on the best of these random candidates. This adds lots of randomness and greatly reduces training time because the optimal threshold at each node does not need to be searched for.
diff --git a/docs/FactorsOfVariation.md b/docs/FactorsOfVariation.md
@@ -4,6 +4,4 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** Factors of variation are features of the input that can be used to delineate between different labels or regression values.
diff --git a/docs/Feature.md b/docs/Feature.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** A feature is an ML term used to describe either an individual attribute of one sample or that attribute across all samples.
 
 Example of one sample: The fuel economy feature of the Toyota Corolla is very high.
diff --git a/docs/FeatureScaling.md b/docs/FeatureScaling.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** Feature scaling is the process of changing input features to be scaled in a similar way. 
 
 Feature scaling is important because many machine learning algorithms don't perform well when the numerical input features have vastly different scales.
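Two common scaling methods, sketched on plain lists: standardization (zero mean, unit variance) and min-max scaling to [0, 1]. In practice Scikit-Learn's `StandardScaler` and `MinMaxScaler` do this; the helpers below are hypothetical and assume the feature is not constant (otherwise they divide by zero):

```python
import statistics

def standardize(values):
    """Hypothetical helper: zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Hypothetical helper: rescale linearly so min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20_000, 35_000, 50_000, 120_000]
ages = [22, 35, 47, 60]
# After scaling, both features live on comparable ranges.
print(min_max_scale(incomes))
print(min_max_scale(ages))
```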
diff --git a/docs/FermatsTheorem.md b/docs/FermatsTheorem.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2.5
 
-
-
 **Definition:** Fermat's theorem states if p is prime and a is a positive integer not divisible by p then $a^{p-1} \equiv 1 \text{(mod } p \text{)}$.
 
 An alternative form of this is that $a^{p} \equiv a \text{(mod } p \text{)}$. With this statement there is no requirement that $a$ be relatively prime to $p$.
diff --git a/docs/FibonacciNumbers.md b/docs/FibonacciNumbers.md
@@ -2,8 +2,5 @@
 
 Abstract Math 10.5. 
 
-
-
 **Definition:** The sequence of numbers defined by $F_n = F_{n-1} + F_{n-2}$, starting from $F_1 = F_2 = 1$.
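An iterative sketch that follows the recurrence directly, avoiding the exponential cost of naive recursion:

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, with F_1 = F_2 = 1."""
    seq = []
    a, b = 1, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b  # F_{k+1} = F_k + F_{k-1}
    return seq

print(fibonacci(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```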
 
-
diff --git a/docs/FiniteDimensional.md b/docs/FiniteDimensional.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** A vector space is finite dimensional if it contains a list of vectors that span the space.
 
 Finite dimensional is the opposite of infinite dimensional, which describes a vector space that does not contain a list of vectors that spans the entire space. This can occur when we have a vector space that has an infinite number of coordinates, but since lists must be finite, we can't define a list of vectors that spans the entire space.
diff --git a/docs/Floor.md b/docs/Floor.md
@@ -2,8 +2,6 @@
 
 U2.3.4
 
-
-
 **Definition:** The floor function rounds the input down to the greatest integer less than or equal to it.
 
 Remember that for negative numbers this means rounding away from zero, e.g. floor(-2.3) = -3.
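Python's `math.floor` shows the behavior for negative numbers, compared with `int()`, which truncates toward zero:

```python
import math

print(math.floor(2.7))   # 2
print(math.floor(-2.3))  # -3, not -2: floor moves toward negative infinity
print(int(-2.3))         # -2: int() truncates toward zero instead
```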
diff --git a/docs/Folding.md b/docs/Folding.md
@@ -2,6 +2,4 @@
 
 Ch 5
 
-
-
 **Definition:** Folding is a process used in a hashing function where we split the key into discrete parts and then operate upon each of them separately.
diff --git a/docs/FreeVariables.md b/docs/FreeVariables.md
@@ -2,8 +2,6 @@
 
 Ch 2.2
 
-
-
 **Definition:** Free variables are the variables whose columns in RREF do not contain a pivot (leading 1).
 
 If the system is consistent, the existence of free variables means there are infinitely many solutions to the system of equations.
diff --git a/docs/Frequency.md b/docs/Frequency.md
@@ -2,8 +2,6 @@
 
 Ch 1.1
 
-
-
 **Definition:** Frequency describes the number of occurrences of a given outcome over the trials of a random experiment.
 
 Frequency is often confused with [RelativeFrequency](RelativeFrequency.md) and [Probability](Probability.md), but they are different terms, as the others describe the relative likelihood of an event.
diff --git a/docs/FrequencyHeuristic.md b/docs/FrequencyHeuristic.md
@@ -2,8 +2,6 @@
 
 L4
 
-
-
 **Definition:** The frequency heuristic is the idea that we assign credit based on how frequently things happen.
 
 In RL if we are to see 4 bells, 1 light, and get a negative reward, then by the frequency heuristic we could state the 4 bells caused the negative reward. 
diff --git a/docs/FrobeniusNorm.md b/docs/FrobeniusNorm.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** The Frobenius norm is a norm defined on matrices. This norm is defined as follows:
 
 ||A||_F = sqrt(sum(A^2 for all i,j))
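A one-line sketch of the formula for a matrix stored as a list of rows (the example matrix is arbitrary):

```python
import math

def frobenius_norm(a):
    """||A||_F: square root of the sum of the squares of every entry a_ij."""
    return math.sqrt(sum(x * x for row in a for x in row))

print(frobenius_norm([[1, 2], [2, 4]]))  # 5.0
```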
diff --git a/docs/FunctionCompositionOperator.md b/docs/FunctionCompositionOperator.md
@@ -8,7 +8,6 @@
 
 ## Example
 
-
 ```haskell
 
 addOne num = num + 1
diff --git a/docs/FunctionNotation.md b/docs/FunctionNotation.md
@@ -2,6 +2,4 @@
 
 Ch 0
 
-
-
 **Definition:** Function notation is using formal math logic such as f(x) : X -> Y to define tasks.
diff --git a/docs/FundamentalOperations.md b/docs/FundamentalOperations.md
@@ -2,8 +2,6 @@
 
 L1
 
-
-
 **Definition:** Fundamental operations are operations that take constant time.
 
 #### List of fundamental operations
diff --git a/docs/FundamentalTheoremOfArithmetic.md b/docs/FundamentalTheoremOfArithmetic.md
@@ -8,4 +8,3 @@ This means any number can be given in the form a * b * ... * z where all numbers
 
 The proof of this is quite interesting. Basically it argues by strong induction: if a number n > 1 is prime, it is its own factorization; otherwise it can be written as n = ab with 1 < a, b < n, and since a and b can each be written as a product of primes, so can n.
 
-
diff --git a/docs/FundamentalTheroemofCalculus.md b/docs/FundamentalTheroemofCalculus.md
@@ -2,8 +2,6 @@
 
 Khan U1
 
-
-
 **Definition:** The (second) fundamental theorem of calculus states that if f is continuous, then the derivative with respect to x of the integral of f from a constant a to x is f(x): $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$.
 
 This implies that every function that is continuous on an interval has an antiderivative there.
diff --git a/docs/GCD.md b/docs/GCD.md
@@ -2,8 +2,6 @@
 
 U 2.4
 
-
-
 **Definition:** The GCD of two integers a and b (not both 0) is the largest integer d such that d | a and d | b.
 
 To find the GCD of two numbers, find their prime factorizations and then take each prime they share raised to the smaller of its two exponents. Multiply these together to find the GCD.
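A sketch of the prime-factorization method, using trial division (adequate for small numbers only; the Euclidean algorithm is far faster). The helper names are hypothetical:

```python
from collections import Counter

def prime_factors(n):
    """Hypothetical helper: factorization of n >= 1 as a Counter {prime: exponent}."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1  # whatever is left over is itself prime
    return factors

def gcd_by_factorization(a, b):
    """Multiply each shared prime raised to the smaller of its two exponents."""
    fa, fb = prime_factors(a), prime_factors(b)
    result = 1
    for p in fa.keys() & fb.keys():
        result *= p ** min(fa[p], fb[p])
    return result

print(gcd_by_factorization(120, 500))  # 20, since 120 = 2^3*3*5 and 500 = 2^2*5^3
```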
diff --git a/docs/GameLoop.md b/docs/GameLoop.md
@@ -2,8 +2,6 @@
 
 CS 331 W12 L3
 
-
-
 **Definition:** Each frame the loop function of each script is called. 
 
 This is the same idea as animation which is giving motion to still images. 
diff --git a/docs/GameObject.md b/docs/GameObject.md
@@ -2,8 +2,6 @@
 
 CS 331 W12 L3
 
-
-
 **Definition:** This is the data type of objects in the game. This is a broad class that has some built-in functionality.
 
 A common way to move an object forward using its [Vector3](Vector3.md) is as follows:
diff --git a/docs/GaussianElimination.md b/docs/GaussianElimination.md
@@ -2,8 +2,6 @@
 
 Khan U1
 
-
-
 **Definition:** Gaussian elimination is the process of simplifying a system of equations to [ReducedRowEchelonForm](ReducedRowEchelonForm.md) to solve the system.
 
 Basically, we perform row operations on an augmented matrix to get RREF. We then find the values of the x, y, and z components and that is our solution.
diff --git a/docs/GaussianIntegers.md b/docs/GaussianIntegers.md
@@ -2,6 +2,4 @@
 
 AM W13 L1
 
-
-
 **Definition:** This is the set of all numbers of the form a + bi such that a and b are integers and i^2 is -1. 
diff --git a/docs/GaussianMixtureModels.md b/docs/GaussianMixtureModels.md
@@ -2,6 +2,4 @@
 
 ML D5
 
-
-
 **Definition:** Gaussian mixture models (GMMs) are probabilistic models that assume instances were generated using several Gaussian distributions where each distribution forms its own cluster.
diff --git a/docs/GeneralSolution.md b/docs/GeneralSolution.md
@@ -2,6 +2,4 @@
 
 Ch 2.2
 
-
-
 **Definition:** A general solution to a system of linear equations is one that describes all possible solutions as combinations of each other.
diff --git a/docs/GeneralizationError.md b/docs/GeneralizationError.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Generalization error or out-of-sample error, is the error rate of a model on data that is not in the training set. 
 
 When testing a model it is important to split the samples into a training set and a test set (a certain fraction of the total number of samples). You then train the model on the training set and measure its error rate on the test set. This test error rate is an estimate of the generalization error.
diff --git a/docs/GeneralizedPigeonholePrinciple.md b/docs/GeneralizedPigeonholePrinciple.md
@@ -2,6 +2,4 @@
 
 Ch 6.2
 
-
-
 **Definition:** The generalized pigeonhole principle states that if N objects are placed into k groups, then at least one group contains at least $\lceil N/k \rceil$ objects. This is the size of the fullest group even when the objects are distributed as evenly as possible.
diff --git a/docs/GradientBoosting.md b/docs/GradientBoosting.md
@@ -1,8 +1,6 @@
 
 ML D5
 
-
-
 **Definition:** Gradient boosting sequentially adds predictors to an ensemble, fitting each new model not by tweaking instance weights like AdaBoost but to the residual errors made by the previous predictor.
 
 Residual errors are simply the differences between the actual (target) values and the predicted values. As such, gradient boosting does not use weighting in the same way as AdaBoost, which distinguishes the two. Each new model basically tries to predict the errors of the prior model, and the ensemble's prediction is the sum of all of the models' predictions.
diff --git a/docs/GradientClipping.md b/docs/GradientClipping.md
@@ -2,8 +2,6 @@
 
 ML P569
 
-
-
 **Definition:** Gradient clipping is the process of clipping gradients during backpropagation so they never exceed some threshold.
 
 This is another technique used to resolve issues relating to [ExplodingGradients](ExplodingGradients.md), particularly for RNNs, where batch normalization is tricky to use.
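Two common clipping variants, sketched on a flat list of gradient components. Clipping each component by value can change the gradient's direction, while clipping by the L2 norm preserves it (Keras optimizers expose these as `clipvalue` and `clipnorm`). The helpers here are illustrative:

```python
import math

def clip_by_value(grads, threshold):
    """Clamp every component to [-threshold, threshold] (may change direction)."""
    return [max(-threshold, min(threshold, g)) for g in grads]

def clip_by_norm(grads, max_norm):
    """Rescale the whole vector if its L2 norm exceeds max_norm (keeps direction)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]

g = [0.9, 100.0]
print(clip_by_value(g, 1.0))  # [0.9, 1.0]: now points almost diagonally
print(clip_by_norm(g, 1.0))   # same direction as g, length 1
```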
diff --git a/docs/GradientDescent.md b/docs/GradientDescent.md
@@ -2,8 +2,6 @@
 
 ML L2
 
-
-
 **Definition:** Gradient Descent is an algorithm used to find a 'near' optimal solution to the given problem. This is used with [LinearRegression](LinearRegression.md) to optimize the function by selecting a set of parameters $\theta$ and then repeatedly stepping in the direction in which the cost function decreases fastest (the negative gradient). In general this will find a local optimum. With linear regression, however, the MSE cost function is convex, so there is no local optimum other than the global one.
 
 General idea is to start with some $\theta$ (parameters) and keep changing it to reduce J($\theta$). (Find J in [LinearRegression](LinearRegression.md))
diff --git a/docs/GradientDescentCode.md b/docs/GradientDescentCode.md
@@ -9,7 +9,6 @@ import random
 
 RECURSION_LIMIT = 1500
 
-
 sys.setrecursionlimit(RECURSION_LIMIT)
 print("ax^3 + bx^2 + cx + d")
 a = float(input("Enter a: "))
@@ -18,15 +17,12 @@ c = float(input("Enter c: "))
 d = float(input("Enter d: "))
 learningRate = float(input("Learning Rate: "))
 
-
 def calculateValue(x):
     return x**3 * a + x**2 * b + x * c + d
 
-
 def printResult(x, y):
     print("x: " + str(x) + "\ty: " + str(y))
 
-
 def limit(x):
     rightX = x + .00000001
     leftX = x - .00000001
@@ -36,7 +32,6 @@ def limit(x):
 
     return ((rightY - leftY) / .00000002)
 
-
 def descend(x, depth):
     # Need - 15 because recursion includes other function calls...
     if depth >= RECURSION_LIMIT - 15:
@@ -50,7 +45,6 @@ def descend(x, depth):
     else:
         return descend(x + learningRate * lim, depth)
 
-
 currSearch = random.random() * 10
 xVal = descend(currSearch, 0)
 
diff --git a/docs/GramSchmidtProcess.md b/docs/GramSchmidtProcess.md
@@ -2,8 +2,6 @@
 
 Khan U3
 
-
-
 **Definition:** The Gram-Schmidt process is a process for finding an orthonormal basis of a subspace. 
 
 Basically, if we have a basis, we can find an orthonormal basis of the subspace by normalizing the first vector, then projecting the next vector onto it and subtracting that projection from the vector, normalizing the result, and repeating, each time subtracting the projections onto all existing orthonormal vectors.
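A sketch of classical Gram-Schmidt on plain lists; it assumes the input vectors are linearly independent (a dependent vector would produce a zero vector and a division by zero):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Turn linearly independent vectors into an orthonormal basis of their span."""
    ortho = []
    for v in basis:
        # Subtract the projection of v onto every orthonormal vector found so far.
        w = list(v)
        for e in ortho:
            proj = dot(v, e)
            w = [wi - proj * ei for wi, ei in zip(w, e)]
        # Normalize what is left so it has length 1.
        norm = math.sqrt(dot(w, w))
        ortho.append([wi / norm for wi in w])
    return ortho

print(gram_schmidt([[3, 1], [2, 2]]))
```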
diff --git a/docs/HadamardProduct.md b/docs/HadamardProduct.md
@@ -2,8 +2,6 @@
 
 Ch 2.2
 
-
-
 **Definition:** The Hadamard product of two matrices (which must be the same size) is the element-wise product: each entry of the result is the product of the entries at the same index in both matrices.
 
 This product shows up in CNNs because applying a kernel multiplies it element-wise (a Hadamard product) with the portion of the input it currently covers, and the results are then summed.
diff --git a/docs/HalfWord.md b/docs/HalfWord.md
@@ -2,6 +2,4 @@
 
 W1
 
-
-
 **Definition:** This is half the size of a CPU's word.
diff --git a/docs/HarmonicMean.md b/docs/HarmonicMean.md
@@ -2,18 +2,14 @@
 
 ML D2
 
-
-
 **Definition:** The harmonic mean of precision and recall, known as the F1 score, is a metric used to describe the performance of a classifier. This value is representative of the precision and recall of a model.
 
 Basically, this is a combination of [Precision](Precision.md) and recall
 
 The harmonic mean favors models with similarly good values for both recall and precision which can be good in certain cases. There are however many cases where precision, recall, or accuracy may be more important.
 
-
 Formula:
 
-
 F_1 = 2 * (p * r) / (p+r)
 
 Where p = [Precision](Precision.md) and r = recall
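A sketch of the formula (`f1_score` here takes precision and recall directly; it is not Scikit-Learn's `sklearn.metrics.f1_score`, which takes labels):

```python
def f1_score(precision, recall):
    """Hypothetical helper: harmonic mean of precision and recall, 2pr / (p + r)."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both are 0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 0.5))  # 0.5: equal values give that same value
print(f1_score(0.9, 0.1))  # about 0.18, far below the arithmetic mean of 0.5
```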
diff --git a/docs/HasseDiagram.md b/docs/HasseDiagram.md
@@ -2,8 +2,6 @@
 
 Ch 9.6
 
-
-
 **Definition:** A Hasse diagram is a way to show a (finite) poset graphically.
 
 To create a Hasse diagram, first we create a digraph of the relation. We then remove all loops, remove every edge implied by transitivity, and finally draw every edge pointing upward and remove the arrows, so that an element below another along a path is comparable to (less than) it.
diff --git a/docs/HistogramBasedGradientBoosting.md b/docs/HistogramBasedGradientBoosting.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Histogram based gradient boosting is an implementation of gradient boosting that uses binning of input features.
 
 This is much faster than normal gradient boosting on large datasets. To do this, the input features are binned and replaced by integer bin indices (Scikit-Learn uses at most 255 bins per feature).
diff --git a/docs/HistoricalDesigns.md b/docs/HistoricalDesigns.md
@@ -2,6 +2,4 @@
 
 Discussion of designs used historically and things we can take away. 
 
-
-
 There is a trade off taken historically to use many cores instead of a single powerful core. It is much easier to architect simple cores that chain together than to architect one powerful core. This has a trade off in that it requires developers higher in the stack to ensure their code takes advantage of all of the cores using parallelization. 
diff --git a/docs/Hyperparameter.md b/docs/Hyperparameter.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** A hyperparameter in ML is a parameter that is defined prior to training that is not influenced by samples.
 
 Examples of hyperparmeters are [LearningRate](LearningRate.md) and m in the case of calculating weighted means. More about this can be seen here [[TargetEncoding.md]]
diff --git a/docs/Hypervolume.md b/docs/Hypervolume.md
@@ -2,6 +2,4 @@
 
 Khan U2
 
-
-
 **Definition:** Hypervolume much like [Hyperplane](Hyperplane.md) is volume in dimensions higher than 3.
diff --git a/docs/IPD.md b/docs/IPD.md
@@ -1,8 +1,6 @@
 
 CS 331 W16
 
-
-
 **Definition:** This is the distance between the pupils. 
 
 This value is important to calculate to ensure both images are rendered properly, think about parallax and how there could be issues.
diff --git a/docs/IQR.md b/docs/IQR.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The IQR is the difference between the 75th percentile and 25th percentile as a value.
 
 This is also called the midspread, fourth spread, or H-spread.
diff --git a/docs/ISA.md b/docs/ISA.md
@@ -30,7 +30,6 @@ This ties into semantic gap which describes the difference between the ISA and w
 
 Virtual memory support is also part of the ISA.
 
-
 ---
 
 There is also another division in ISAs being load/store vs memory/memory architecture. 
@@ -47,4 +46,3 @@ Orthogonal ISAs allow for all opcodes to be used regardless of addressing mode.
 
 ---
 
-
diff --git a/docs/IdentityMatrix.md b/docs/IdentityMatrix.md
@@ -2,8 +2,6 @@
 
 Khan Unit 2
 
-
-
 **Definition:** The identity matrix is the matrix in R^n such that any matrix in R^n multiplied by it is equal to itself. 
 
 This matrix can be stated as follows where each row has one '1':
diff --git a/docs/Image.md b/docs/Image.md
@@ -2,8 +2,6 @@
 
 Khan U2
 
-
-
 **Definition:** The image of a function is the total set of all outputs of a given function (transformation for vectors).
 
 This is the same as [Range](Range.md)
diff --git a/docs/ImitationLearning.md b/docs/ImitationLearning.md
@@ -2,8 +2,6 @@
 
 L1
 
-
-
 **Definition:** Imitation learning is not RL. It is the process of training a model on expert data making it a form of supervised learning.
 
 Tangentially related is inverse reinforcement learning where a moduel learns the reward function that the expert is trying to follow.
diff --git a/docs/Imputation.md b/docs/Imputation.md
@@ -2,8 +2,6 @@
 
 CH2
 
-
-
 **Definition:** Imputation is the process of filling in null values with some appropriate value.
 
 This is often done with ml to set null values to 0, mean, median, or some other appropriate value.
diff --git a/docs/IncrementalMean.md b/docs/IncrementalMean.md
@@ -2,8 +2,6 @@
 
 L4
 
-
-
 **Definition:** Incremental mean is a mean calculation where we update the mean according to the next sample without having to calculate the mean by summing all priors.
 
 This is often used with Monte Carlo Learning where we calculate the empirical mean (perceived mean) not by summing all returns and dividing by iterations, but instead by updating it each time it is visited only based on the change the current finding will make.
@@ -12,7 +10,6 @@ With incremental mean all we need to know is the prior mean, the current sample,
 
 Here is a simple python implementation:
 
-
 ```python
 
 import numpy as np
diff --git a/docs/Independence.md b/docs/Independence.md
@@ -2,8 +2,6 @@
 
 L3
 
-
-
 **Definition:** Independence in probability is the case where some even B occuring does not affect the conditional probability of A occuring. 
 
 Two Formal Definitions:
diff --git a/docs/IndependentEvents.md b/docs/IndependentEvents.md
@@ -2,8 +2,6 @@
 
 Ch 1.4
 
-
-
 **Definition:** Independent events are events such that the conditional probability is equivalent to the unconditioned probability of the given event.
 
 P(A|B) = P(A) and P(B | A) = P(B).
diff --git a/docs/Indistinguishable.md b/docs/Indistinguishable.md
@@ -2,6 +2,4 @@
 
 Ch 6.5
 
-
-
 **Definition:** Indistinguishable means two elements, when switches, do not result in a new permutation.
diff --git a/docs/Individuals.md b/docs/Individuals.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** The individuals of a dataset are the objects being studied.  
diff --git a/docs/Induction.md b/docs/Induction.md
@@ -2,8 +2,6 @@
 
 Proof by induction from W11 abstract algebra. Induction is used to prove a statement relating to infinite sets of elements. This is not to be confused with inductive reasoning which is assumptions based on past data. 
 
-
-
 **Definition:** This type of proof is done by proving that the first is true and how that subsequently means the rest are true (think dominoes).
 
 Steps to prove:
diff --git a/docs/Inertia.md b/docs/Inertia.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Inertia in machine learning is the sum of the squared distances from instances to their closest centroid. 
 
 This is often used as a gauge for the accuracy of a [KMeans](KMeans.md) model.
diff --git a/docs/Inference.md b/docs/Inference.md
@@ -2,8 +2,6 @@
 
 Ch2
 
-
-
 **Definition:** Inference is the statistical process of finding relationships between data.
 
 This is not to be confused with [Prediction](Prediction.md) which is the process of guessing an output.
diff --git a/docs/InformationContent.md b/docs/InformationContent.md
@@ -2,6 +2,4 @@
 
 Ch 6
 
-
-
 **Definition:** The information content of a finite set of messages S is log_b(n) where n is the cardinality of S and b is the counting system (2 for binary).
diff --git a/docs/InformationSecurity.md b/docs/InformationSecurity.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.1
 
-
-
 **Definition:** Information security is a subset of cybersecurity which is focused on confidentiality, integrity, and availability of information.
 
 Despite being described as a subset of cybersecurity by the book, they also concede that it includes physical security, which clearly is not.
diff --git a/docs/Injective.md b/docs/Injective.md
@@ -2,8 +2,6 @@
 
 L2
 
-
-
 **Definition:** For a function to be injective each value in the domain must map to a unique value in the codomain.
 
 Sometimes called one-to-one because there is one y for each x.
diff --git a/docs/Input.md b/docs/Input.md
@@ -2,8 +2,6 @@
 
 CS 331 W12 L1
 
-
-
 **Definition:** Input is the class used the get input from the user. 
 
 ### Common Methods
diff --git a/docs/InsertionSort.md b/docs/InsertionSort.md
@@ -2,11 +2,8 @@
 
 CRLS 2.1
 
-
-
 **Definition:** Insertion sort is a sorting algorithm with a worst case complexity of n^2 that selects the next element in the array, moves it to the left side in the correctly sorted position, and then iterates through the list for all items.
 
-
 This can be thought of as sorting cards by hand. You start with all cards in the right hand. You then remove the first card and place it in the left hand. You then do the same thing for the next card in the right hand placing it in the sorted location. You continue this until are cards are in the left hand. 
 
 The only issue with this analogy is that insertion sort uses only one array instead of two where you track the sorted part of the array and as elements are added all elements to the right are pushed over until reaching the spot where it was formerly. 
diff --git a/docs/InstanceBasedLearning.md b/docs/InstanceBasedLearning.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Instance based learning is a system by which we identify information and when it occurs again, we are able to detect it. 
 
 Think of a spam filter. Something is marked as spam, if the exact same message is seen again, it will be marked as spam because last time it was. 
diff --git a/docs/Instruction.md b/docs/Instruction.md
@@ -2,8 +2,6 @@
 
 CA L3
 
-
-
 **Definition:** An instruction is the most basic element of the hardware software interface which describes what to do and to who. 
 
 An instruction is made of two parts, the [Opcode](Opcode.md) describes what to do, and the [[Operands.md]] describe to who. 
@@ -17,5 +15,4 @@ There are also classes of instructions. These are the following 3:
 3. Control Flow Instructions
     - Change sequence of instructions to execute
 
-
 See [ISA](ISA.md) for more about instruction sets. 
diff --git a/docs/IntegerOverflow.md b/docs/IntegerOverflow.md
@@ -2,6 +2,4 @@
 
 W1
 
-
-
 **Definition:** An integer overflow is where we carry a 1 past the end of an integer thus causing it to be 'lost'.
diff --git a/docs/Integrity.md b/docs/Integrity.md
@@ -4,6 +4,4 @@
 
 **Chapter:** 1.1
 
-
-
 **Definition:** Integrity of data means it is only changed in a specified and authorized manor and integrity of systems means systems perform their intended function in an unimpaired manner, free of manipulation.
diff --git a/docs/IntelligenceExplosion.md b/docs/IntelligenceExplosion.md
@@ -2,6 +2,4 @@
 
 Superintelligence - Bostrom
 
-
-
 **Definition:** The intelligence explosion is the idea that once a system achieves human intelligence, it will then be able to recursively self improve causing an explosion in intelligence.
diff --git a/docs/Intractable.md b/docs/Intractable.md
@@ -2,8 +2,6 @@
 
 U 2.3
 
-
-
 **Definition:** An intractable problem is one that can not be solved in polynomial time.
 
 These problems generally run in exonential, factorial, or some other time complexity that is higher than polynomial.
diff --git a/docs/Invariance.md b/docs/Invariance.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** Invariance in ML describes changes to objects such that the model should still interpret the object the same way.
 
 There are a few different types including translational, rotational, and size invariance. 
diff --git a/docs/Inverse.md b/docs/Inverse.md
@@ -2,8 +2,6 @@
 
 1.1.2
 
-
-
 **Definition:** The inverse of an implication statement is the negation of both terms.
 
 $\neg p \to \neg q$ Where the original was $p \to q$
diff --git a/docs/InverseFunction.md b/docs/InverseFunction.md
@@ -2,8 +2,6 @@
 
 L2
 
-
-
 **Definition:** The inverse function of f(x) is defined as f^-1(x) where f^-1(x) maps from the codomain of f(x) to the domain of f(x).
 
 As such, for a function to be invertible it must be a bijection.
diff --git a/docs/InverseMatrix.md b/docs/InverseMatrix.md
@@ -4,8 +4,6 @@
 
 **Lecture:** 2
 
-
-
 **Definition:** The inverse matrix is the matrix such that A * A' = I.
 
 ## Example
diff --git a/docs/InverseTransformation.md b/docs/InverseTransformation.md
@@ -2,8 +2,6 @@
 
 Khan U2
 
-
-
 **Definition:** The inverse of a transformation is the transformation that undoes the original transformation for the entire domain codomain of the original transformation.
 
 This transformation must be [Bijective](Bijective.md) otherwise there will be issues with mappings either there are outputs without inputs or there are outputs with multiple inputs.
diff --git a/docs/Invertible.md b/docs/Invertible.md
@@ -2,13 +2,10 @@
 
 Khan
 
-
-
 **Definition:** For a matrix A to be invertible there must be another matrix B such that A * B = I and B * A = I where I is the identity matrix.
 
 If a matrix has an invenrse then it has a unique inverse.
 
-
 **Proposition:**
 
 A linear map (matrix) is invertible iff it is injective and surjective (spans the ambient space).
diff --git a/docs/IteratedExpectations.md b/docs/IteratedExpectations.md
@@ -2,8 +2,6 @@
 
 L12
 
-
-
 **Definition:** The law of iterated expectations states the expected value of a conditional expectation is the unconditional expectation. 
 
 Simply, this means that when finding the expectationt of some random variable we can sum the weighted expectation for its base components. 
diff --git a/docs/Jerk.md b/docs/Jerk.md
@@ -2,8 +2,6 @@
 
 Section 2.8
 
-
-
 **Definition:** A jerk is the third derivative of a position function. 
 
 The first derivative would be velocity, second would be acceleration, third is jerk.
diff --git a/docs/JointDensityFunction.md b/docs/JointDensityFunction.md
@@ -2,8 +2,6 @@
 
 Prob L9
 
-
-
 **Definition:** A joint density function is a function that takes two inputs and outputs a probability of the combination. 
 
 We can define the function as f_{XY} : R^2 -> R such that for all A in R^2 we have P((X,Y) in A) = integral(integral(f_XY(x,y)))
diff --git a/docs/JointProbability.md b/docs/JointProbability.md
@@ -2,8 +2,6 @@
 
 Stats L2 + L6
 
-
-
 **Definition:** A joint probability is the probability of multiple conditions.
 
 An example of this is that 48% of voters are in favor of the bill and democrats. This is the joint probability of any given voter being both a democrat and in favor of the bill.
diff --git a/docs/KNearestNeighbor.md b/docs/KNearestNeighbor.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** k nearest neighbor is the idea of using the k nearest elements of some set to derive some information. 
 
 In ml, this can be used used to find the k nearest neighbor regression of a sample using an instance based approach where you would find the k nearest values and average them. This would then be the prediction for the sample. 
diff --git a/docs/Kernel.md b/docs/Kernel.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The kernel of a linear transformation is the set of all vectors that are equal to the null vector under the L.T.
 
 This is stated as ker(T), spoken as the kernel of T.
diff --git a/docs/Key.md b/docs/Key.md
@@ -2,8 +2,6 @@
 
 Ch 5
 
-
-
 **Definition:** A key is list of attribute of an object x that uniquely identifies it from all other elements of our universe. 
 
 In hashtables we hash the keys and then store the items. When openaddressing, if the queried for object is not the one at the address, we continue on with our probing algorithm until we find the item or find an empty spot (this assumes we don't remove items from our table).
diff --git a/docs/KeyframeAnimation.md b/docs/KeyframeAnimation.md
@@ -2,8 +2,6 @@
 
 CG W13 L3
 
-
-
 **Definition:** Keyframe animation is the process of animation used in blender where you specify keyframes and positions of objects at said times. 
 
 ### Motion / Objects:
diff --git a/docs/Keyless.md b/docs/Keyless.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.6
 
-
-
 **Definition:** Keyless cryptography is the transformation of data without using encryption keys.
 
 Such cryptography is not often thought of as cryptography, but it is by the formal definition of cryptography.
diff --git a/docs/KnowledgeBaseApproach.md b/docs/KnowledgeBaseApproach.md
@@ -4,6 +4,4 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** The knowledge base approach to machine learning is the idea that we hard code in a knowledge base into a system to make it intelligent.
diff --git a/docs/L1Norm.md b/docs/L1Norm.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** L1 norm is computed as described by [Norm](Norm.md) and represents the sum of all coordinates of a given vector.
 
 This is also referred to as the taxicab norm because if we think about the distances it would take to reach a given point by only going in a straight line, this number is the L1 norm.
diff --git a/docs/L2Norm.md b/docs/L2Norm.md
@@ -4,6 +4,4 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** L2 norm is the standard euclidean distance.
diff --git a/docs/LCM.md b/docs/LCM.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** LCM is the least common multiple of two numbers meaning it is the smallest number that is divisible by both values.
diff --git a/docs/LLE.md b/docs/LLE.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** LLE is a dimensionality reduction technique that uses manifold learning instead of projection.
 
 LLE works by finding the distance between an instance and its nearest neighbors and then lookking for a low-dimensional representation of the training set where these relationships are best preserved. 
diff --git a/docs/LUDecomposition.md b/docs/LUDecomposition.md
@@ -4,8 +4,6 @@
 
 **Lecture:** 4
 
-
-
 **Definition:** LU decomposition is the process of decomposing a matrix into an upper triangular matrix and a lower triangular matrix.
 
 LU decomposition is stated as the equation A = LU where L is the lower triangular and U is the upper triangular.
diff --git a/docs/LamportSignature.md b/docs/LamportSignature.md
@@ -66,7 +66,6 @@ for idx in range(hash):
 
 ## Implementation
 
-
 ```go
 
 package main
diff --git a/docs/Language.md b/docs/Language.md
@@ -4,8 +4,6 @@
 
 **Lecture:** 2
 
-
-
 **Definition:** A language (alphabet) is a finite set of symbols.
 
 Examples:
diff --git a/docs/LassoRegression.md b/docs/LassoRegression.md
@@ -2,8 +2,6 @@
 
 ML D3
 
-
-
 **Definition:** Lasso regression is another form of linear regression that adds a regularization term to the loss function but weights it different than ridge regression.
 
 The main difference between this and ridge is that ridge scales coeficients consistently whereas this does not. As such, often it outputs a sparse model which scales certain coeficcients to 0.
diff --git a/docs/LawOfCosines.md b/docs/LawOfCosines.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** The law of cosines is defined as c^2 = a^2 + b^2 - 2ab cos(C) where a and b are side lengths and c is the side length to be found that is opposite of the angle C.
 
 When using the law of cosines it is important to note we are finding the side length opposite of the known angle. 
diff --git a/docs/LawOfDetachment.md b/docs/LawOfDetachment.md
@@ -2,8 +2,6 @@
 
 U 1.6.1
 
-
-
 **Definition:** The law of detachment is a law that specifies a form that valid arguments can take.
 
 This form is $(p \wedge (p\to q) \to q$.
diff --git a/docs/LeakyReLU.md b/docs/LeakyReLU.md
@@ -2,8 +2,6 @@
 
 ML P554
 
-
-
 **Definition:** Leaky ReLU is a variant of ReLU designed to solve the problem of neurons dying due to the use of ReLU.
 
 Leaky ReLU adds a small (or larger) slope to the function representing values less than 0 for the activation function. This ensures neurons don't die, but they can enter long coma phases.
diff --git a/docs/LearningRate.md b/docs/LearningRate.md
@@ -2,11 +2,8 @@
 
 ML L2
 
-
-
 **Definition:** The learning rate is a constant used to narrow in upon some value based on it's distance from an expected value. The further away from the value, the larger the change for a parameter(s) will be.
 
-
 See [GradientDescentCode](GradientDescentCode.md) and [[GradientDescent.md]] for an example of when a learning rate would be used and an implementation of it.
 
 Additionally, learning rate in a higher level sense, with regard to online learning, is how quickly a model will adapt to new data.
diff --git a/docs/LexicographicOrdering.md b/docs/LexicographicOrdering.md
@@ -2,8 +2,6 @@
 
 Ch 9.6
 
-
-
 **Definition:** Lexicographic ordering is the same as alphabetic ordering.
 
 Consider the case of (1, 1, 100), (1,4), (2,1), (2,2), (2,0)
diff --git a/docs/Lighting.md b/docs/Lighting.md
@@ -2,8 +2,6 @@
 
 CS 331 W12 L3
 
-
-
 ### Light Options
 
 1. Point Light Source
diff --git a/docs/LinearCombination.md b/docs/LinearCombination.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 ### In Linear Algebra
 
 **Definition:** A linear combination is ca + db for any scalars c and d where a and b are vectors.
diff --git a/docs/LinearCongruence.md b/docs/LinearCongruence.md
@@ -2,6 +2,4 @@
 
 Ch 2.4
 
-
-
 **Definition:** A linear congruence is a congruence of the form ax \equiv b (mod c) where a,b,c are integers and x is a variable.
diff --git a/docs/LinearEquations.md b/docs/LinearEquations.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** Linear equations are equations of the form y = mx+b where m and b are real coefficients. 
 
 Simply, linear equations are any equation that results in a line when graphed. 
diff --git a/docs/LinearHomogeneousRecurrenceRelation.md b/docs/LinearHomogeneousRecurrenceRelation.md
@@ -2,8 +2,6 @@
 
 Ch 8.2
 
-
-
 **Definition:** A linear homogeneous recurrence relation is a recurrence relation where each element is a linear combination of k prior elements (degree k).
 
 Example of k degree LHRR:
diff --git a/docs/LinearIndependence.md b/docs/LinearIndependence.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** Linear independence means that every column in a given matrix gives another degree of freedom. 
 
 This can also be thought of as there only being one way to make every vector with linear combinations of vectors in the span.
@@ -25,12 +23,10 @@ Given that [2,3] and [7,8] consumes all of R^2, we know there are no degrees of 
 
 If c_1*a + c_2*b = 0 is true for some constants c_1 and c_2 then we have dependence assuming at least one coeficcient is not zero. This is true for an arbitrary number of vectors and constants. If this is only possible with coeficcients that are equal to zero then we have independence.
 
-
 ### Intuitive Definition
 
 Linear independence means each vector in a set of vectors (possibly matrix) adds something to the matrix such that the [Span](Span.md) of the set of vectors is larger.
 
-
 ### Solving
 
 A simple way to solve this is using our knowledge that c_1 * a + ... + c_n * z = I where I is the identity matrix. Knowing this, we can create an augmented matrix that represents this information as follows:
diff --git a/docs/LinearMaps.md b/docs/LinearMaps.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 3
 
-
-
 **Definition:** A linear map is a function f : V -> W where V and W are vector spaces, that has the following properties:
 
 1. Additivity - T(u + v) = Tu + Tv
diff --git a/docs/LinearProbing.md b/docs/LinearProbing.md
@@ -2,8 +2,6 @@
 
 Ch 5
 
-
-
 **Definition:** Linear probing is a probing (open addressing) strategy that selects the next open index to place any objects that experienced a collission.
 
 The problem with linear probing is clustering. This is the process whereby elements that have collided grow into larger groupings such that the probability of the element after the cluster being selected is far higher than other elements in the array. This is problematic because we want our hashtable to be uniformly distributed. This is a problem that can often be solved by using quadratic probing.
diff --git a/docs/LinearRegression.md b/docs/LinearRegression.md
@@ -2,8 +2,6 @@
 
 ML L2 - Also referred to as ordinary least squares
 
-
-
 **Definition:** Fitting a straight line to data which allows for arbitrary inputs in the valid domain but not necessarily in the training set, to get accurate outputs.
 
 The goal is to find a $\theta$ (parameters) that minimizes $J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(h(x_i) - y_i)^2$. This is called the cost function.
diff --git a/docs/LinearSubspace.md b/docs/LinearSubspace.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** A linear subspace is a subset (inclusive of the subset being the entire set) of a space of equal or greater cardinality where the linear subspace contains the zero vector.
 
 Things like a plane that passes through the origin in R^n, a line that passes through the origin in R^n, or R^n itself are all specific linear subspaces (or just subspaces for short).
diff --git a/docs/LinearTransformation.md b/docs/LinearTransformation.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** A linear transformation is a function with an input and output vector that respects addition and scalar multiplication.
 
 ## Formally
diff --git a/docs/Linearithmic.md b/docs/Linearithmic.md
@@ -2,6 +2,4 @@
 
 Ch 2
 
-
-
 **Definition:** Linearithmic time complexity (or linear log or just n log n) is a commonly used name to describe n log n time complexity. 
diff --git a/docs/LoadFactor.md b/docs/LoadFactor.md
@@ -2,6 +2,4 @@
 
 Ch 5
 
-
-
 **Definition:** The load factor of a hashtable is the percentage of the underlying array that is full.
diff --git a/docs/LocalScale.md b/docs/LocalScale.md
@@ -2,8 +2,6 @@
 
 CS331 W12 L2
 
-
-
 Member of transform class that can be assigned. This affects the local scale of the GameObject.
 
 See [Rotate](Rotate.md) for rotating based on local rotation and [[Translate.md]] for moving based on local coordinates. 
diff --git a/docs/LogarithmicDifferentiation.md b/docs/LogarithmicDifferentiation.md
@@ -2,8 +2,6 @@
 
 Leonard
 
-
-
 **Definition:** Logarithmic differentiation is the process of applying logs to both sides of an equation to aid in our ability to find their derivative.
 
 Steps:
diff --git a/docs/Loop.md b/docs/Loop.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** A loop in a graph is a connection to one's self.
diff --git a/docs/LoopInvariant.md b/docs/LoopInvariant.md
@@ -2,8 +2,6 @@
 
 CLRS 2.1
 
-
-
 **Definition:** A loop invariant is a condition that is true before and after a loop is ran.
 
 In the case of insertion sort the loop invariant is that [0 : p] is sorted where p is the number of prior iterations (prior elements sorted). See [InsertionSort](InsertionSort.md) to understand this better.
diff --git a/docs/LossFunction.md b/docs/LossFunction.md
@@ -2,8 +2,6 @@
 
 Ch 1
 
-
-
 **Definition:** A loss function is a function from E -> R where E is the set of all events (outcomes) and R is the set of all real numbers where the function describes how bad a given event E is.
 
 When I say 'event' this is in the most general of senses. In the case of RL this could simply be a state and in supervised learning this could be a prediction based on a sample.
diff --git a/docs/Lvalue.md b/docs/Lvalue.md
@@ -2,8 +2,6 @@
 
 cs202 W14 L16
 
-
-
 **Definition:**  An lvalue is a value that is not temporary and cannot be moved.
 
 An example of an lvalue is as follows:
diff --git a/docs/MAE.md b/docs/MAE.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** MAE also known as average absolute deviation or mean absolute error is an error metric used to describe the accuracy of a model by taking the difference between the inference and actual values of a set of samples and averaging the value.
 
 This is sometimes used when there are many outliers which can largely effect the [RMSE](RMSE.md) error metric because of the way it weights deviations.
diff --git a/docs/MCTS.md b/docs/MCTS.md
@@ -2,6 +2,4 @@
 
 ML SS
 
-
-
 **Definition:** 
diff --git a/docs/MLP.md b/docs/MLP.md
@@ -1,8 +1,6 @@
 
 ML D6
 
-
-
 **Definition:** Multilayer perceptrons are a form of deep neural network that are a feedforward process where each output goes forward to the next layer of perceptrons until reaching the output layer. This is a subset of neural networks as not all NNs are fully connected like RNNs/CNNs.
 
 MLPs can do regression and classification tasks. For regression we need one output for each output feature we would like to predict. With these outputs we can also apply an activation function (default is none), to bound the output range.
diff --git a/docs/MUX.md b/docs/MUX.md
@@ -2,6 +2,4 @@
 
 CA L3
 
-
-
 **Definition:** A MUX is a multiplexer which allows multiple inputs and selects one to be the output. This is also known as a data selector.
diff --git a/docs/ManifoldLearning.md b/docs/ManifoldLearning.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Manifold learning is the process of mapping a higher dimensional object to a lower dimensional manifold.
 
 Manifolds are representations of objects in higher dimensional space using lower dimensional space such that they still maintain attributes. This can be thought of like uv wrapping.
diff --git a/docs/MarginalProbabilities.md b/docs/MarginalProbabilities.md
@@ -2,6 +2,4 @@
 
 Stats L2
 
-
-
 **Definition:** Marginal probabilities are probabilities that are not conditional upon any other probabilities.
diff --git a/docs/MarkovAssumption.md b/docs/MarkovAssumption.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** The Markov assumption is the assumption that prior events don't matter and all necessary information that dictates the future is in the current state.
diff --git a/docs/MarkovChains.md b/docs/MarkovChains.md
@@ -50,7 +50,6 @@ To calculate this we can find the probability of each transition for all steps u
 
 Alternatively, we can use recursive approach to find the probability of each transition that connects to the state i.
 
-
 #### Steady State (Convergence)
 
 The steady state of a markov chain is the constant probability of some given state after an arbitrarily long period of time. This can be thought of the limit as n approaches infinity. If there is not convergence then there is not a steady state.
diff --git a/docs/MarkovDecisionProcesses.md b/docs/MarkovDecisionProcesses.md
@@ -2,8 +2,6 @@
 
 RL Ch 1
 
-
-
 **Definition:** Markov decision processes describe an environment for reinforcement learning.
 
 MDPs are like MRPs except they also have a finite set of actions (action space).
diff --git a/docs/MarkovInequality.md b/docs/MarkovInequality.md
@@ -2,6 +2,4 @@
 
 L19
 
-
-
 **Definition:** The Markov inequality gives the probability that a random variable is greater than or equal to some constant. 
diff --git a/docs/MarkovRewardProcess.md b/docs/MarkovRewardProcess.md
@@ -2,6 +2,4 @@
 
 L2
 
-
-
 **Definition:** A markov reward process is a markov chain with reward values associated with states or transitions.
diff --git a/docs/MathConceptsCS331.md b/docs/MathConceptsCS331.md
@@ -2,7 +2,5 @@
 
 Math Relating to CS331.
 
-
-
 [Dot Product](DotProduct.md)
 [Determinant](Determinant.md)
diff --git a/docs/Matrix.md b/docs/Matrix.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition** A matrix is a 2d grid of numerical values.
 
 Matricies can be used to describe systems of equations as follows:
@@ -16,7 +14,6 @@ Matricies can be used to describe systems of equations as follows:
 
 This is the form because we distribute x on the first column and y on the second column. 
 
-
 ## Matrix Vector Product
 
 The product of a matrix and a vector is another vector of the same size as the original vector. This assumes the number of components in the vector is the same as the number of columns in the matrix.
diff --git a/docs/MaxNorm.md b/docs/MaxNorm.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** Max norm is denoated as L^inf and returns the largest coordinate value of a given vector.
 
 Given how max norm works, it is sometimes simply stated as max(v) where v is a vector.
diff --git a/docs/MaxNormRegularization.md b/docs/MaxNormRegularization.md
@@ -2,6 +2,4 @@
 
 ML P612
 
-
-
 **Definition:** Max-norm regularization is a regularization technique for neural networks that limits the combination (euclidean norm) of all incoming weights to a predefined range. If a step goes beyond this the weights are scaled accordingly to ensure compliance. 
diff --git a/docs/MaxPooling.md b/docs/MaxPooling.md
@@ -2,8 +2,6 @@
 
 ML SS
 
-
-
 **Definition:** Max pooling is a processing technique whereby a pool size is selected (2x2 as an example) and the values in the pool are compressed into one value.
 
 This is a technique to both reduce computational complexity of a model and to extract higher level features.
diff --git a/docs/Memory.md b/docs/Memory.md
@@ -2,8 +2,6 @@
 
 Memory information from computer architecture course
 
-
-
 Memory performance can affect compute speed of multiple applications running concurrently. This results in poorer performance for one despite having the clocks needed to computer correctly (denial of memory). Using nice does not change this which is the priority system for OSes. This is being caused by the DRAM memory controller being shared and thus causing a bottleneck. 
 
 ## Links
diff --git a/docs/MemoryManagement.md b/docs/MemoryManagement.md
@@ -2,8 +2,6 @@
 
 Memory management CS 202 ~W10 C++
 
-
-
 Memory management in C++ is done using a few keywords shown below
 
 **delete:** The delete keyword deallocates the memory associated with an object on the heap. 
@@ -22,10 +20,8 @@ Memory management in C++ is done using a few keywords shown below
 
 ```
 
-
 **Referencing:** This is done by the & (see above as this is quite simple).
 
-
 **Dereferencing:** This is done by the * character. This gives access to the underlying values. This can be used to both assign the underlying variable(s) and also to assign other things to them. Below is an example:
 
 ```cpp
diff --git a/docs/MergeSort.md b/docs/MergeSort.md
@@ -2,8 +2,6 @@
 
 CLRS 2.3
 
-
-
 **Definition:** Merge sort is an algorithm that uses [DivideAndConquer](DivideAndConquer.md) to sort a list in log-linear (O(n log n)) time.
 
 Sample Implementation:
diff --git a/docs/MersennePrime.md b/docs/MersennePrime.md
@@ -2,8 +2,6 @@
 
 U 2.4
 
-
-
 **Definition:** A Mersenne prime is a prime number of the form (2^n) - 1. 
 
 The largest known prime numbers have been primes of this form.
diff --git a/docs/Mesh.md b/docs/Mesh.md
@@ -2,8 +2,6 @@
 
 CS 331 W11 L2
 
-
-
 **Definition:** A mesh is a representational grid of an object's surface used in [SurfaceRepresentation](SurfaceRepresentation.md).
 
 Think of a fishing net: straight lines subdivide the surface at regular intervals, and we store the exact points where they cross. This gives the illusion of a continuous surface, but it is actually a discrete set of points. 
diff --git a/docs/MeshFilter.md b/docs/MeshFilter.md
@@ -2,6 +2,4 @@
 
 [[Unity]] game engine component 
 
-
-
 The mesh filter sets the shape of an object. Without a renderer it has no visible effect, but it defines the general geometry of the object (though not its scale).
diff --git a/docs/MeshRenderer.md b/docs/MeshRenderer.md
@@ -2,6 +2,4 @@
 
 [[Unity]] Component. 
 
-
-
 A mesh renderer is the component that assigns a material to an object. It does not define shape, only material. If no valid material is assigned, the object renders in magenta. 
diff --git a/docs/MicroArchitecture.md b/docs/MicroArchitecture.md
@@ -2,8 +2,6 @@
 
 Computer Architecture L2
 
-
-
 **Definition:** A microarchitecture is the implementation of an agreed-upon ISA. It consists of the underlying mechanics that are not exposed to the OS/system developer.
 
 There are many microarchitecture implementations of each ISA, but very few distinct ISAs, because changes to an ISA break compatibility. 
diff --git a/docs/Microcontroller.md b/docs/Microcontroller.md
@@ -2,8 +2,6 @@
 
 W2
 
-
-
 **Definition:** A microcontroller consists of a cpu, integrated memory, and the ability to use external memory.
 
 A microcontroller is a fully functional computer whereas a [Microprocessor](Microprocessor.md) is simply the CPU.
diff --git a/docs/Microprocessor.md b/docs/Microprocessor.md
@@ -2,8 +2,6 @@
 
 W2
 
-
-
 **Definition:** A microprocessor is simply a processor by itself.
 
 This does not include memory or anything else with it.
diff --git a/docs/MillerRabinAlgorithm.md b/docs/MillerRabinAlgorithm.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2.6
 
-
-
 **Definition:** The Miller-Rabin algorithm writes $n-1 = 2^k q$ with $q$ odd and, for a base $a$ with $1 < a < n-1$, considers the list of residues $a^q, a^{2q}, \dots, a^{2^{k-1}q}$ modulo $n$. If $n$ is prime, then either the first element of this list equals 1 or some element equals $n-1$; otherwise $n$ is composite. This only guarantees a number is likely prime, because the condition is necessary but not sufficient.
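A minimal Python sketch of the test described above, assuming random bases and a fixed number of rounds; `miller_rabin_test` and `is_probable_prime` are hypothetical names chosen for illustration:

```python
import random

def miller_rabin_test(n, a):
    """Hypothetical helper: one round with base a.

    Returns False if n is definitely composite, True if the round is inconclusive
    (n is probably prime). Assumes n is odd and n > 3.
    """
    # Write n - 1 = 2^k * q with q odd.
    k, q = 0, n - 1
    while q % 2 == 0:
        k += 1
        q //= 2
    x = pow(a, q, n)  # first element of the list: a^q mod n
    if x == 1 or x == n - 1:
        return True
    for _ in range(k - 1):
        x = pow(x, 2, n)  # next element: a^(2^j * q) mod n
        if x == n - 1:
            return True
    return False  # no element matched, so n is composite

def is_probable_prime(n, rounds=20):
    """Hypothetical helper: repeat the round with random bases to shrink the error chance."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if not miller_rabin_test(n, a):
            return False
    return True

print(is_probable_prime(7919))      # True
print(miller_rabin_test(561, 2))    # False: base 2 exposes the Carmichael number 561
```

A prime always passes every round, while a composite can slip through only if every randomly chosen base happens to be a "liar", which is why more rounds give more confidence.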
 
 ## Background
diff --git a/docs/MinMaxScaling.md b/docs/MinMaxScaling.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** Min-max scaling, also referred to as normalization, linearly rescales the current values so that they fall between two chosen bounds. 
 
 These two bounds are normally either 0 and 1 or -1 and 1. Neural networks tend to train best with roughly zero-mean inputs, so a range from -1 to 1 is generally a good choice.
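A minimal Python sketch of min-max scaling to an arbitrary range; `min_max_scale` is a hypothetical helper name, and a constant input is rejected because its min and max coincide:

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Hypothetical helper: map the smallest value to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant sequence")
    span = new_max - new_min
    # Shift to start at 0, divide by the old range, then stretch to the new range.
    return [new_min + (v - lo) / (hi - lo) * span for v in values]

data = [10, 15, 20, 30]
print(min_max_scale(data))         # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(data, -1, 1))  # [-1.0, -0.5, 0.0, 1.0]
```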
diff --git a/docs/MinusOneTrick.md b/docs/MinusOneTrick.md
@@ -2,8 +2,6 @@
 
 Ch 2.2
 
-
-
 **Definition:** The minus one trick is a method used to find the general solution of a system of equations: the reduced row echelon form of a rectangular matrix is padded with zero rows to make it square, and -1 is placed in each position along the diagonal that is not already a pivot 1.
 
 By doing this we can then simply read out the general solution to the matrix instead of having to derive it.
diff --git a/docs/MixedGraph.md b/docs/MixedGraph.md
@@ -2,6 +2,4 @@
 
 10.1
 
-
-
 **Definition:** A mixed graph is a graph that allows directed and undirected edges, loops, and multi-edges.
diff --git a/docs/Mod.md b/docs/Mod.md
@@ -2,8 +2,6 @@
 
 U 2.4
 
-
-
 **Definition:** Mod is a mathematical function where, for a mod k, we find the value n with 0 <= n < k such that a = bk + n for some integer b. 
 
 Generally, this is only used for integers, but nothing prohibits using it on the reals (R) so long as b is an integer.
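A minimal Python sketch of this definition, assuming k > 0; `mod` is a hypothetical helper name. It finds the integer multiplier b by floor division and returns the remainder n, which also works for negative and real-valued a:

```python
def mod(a, k):
    """Hypothetical helper: return n with 0 <= n < k such that a = b*k + n (assumes k > 0)."""
    b = a // k  # floor division gives the integer multiplier b (as a float if a is a float)
    return a - b * k

print(mod(17, 5))   # 2    (17 = 3*5 + 2)
print(mod(-7, 3))   # 2    (-7 = -3*3 + 2)
print(mod(7.5, 2))  # 1.5  (7.5 = 3*2 + 1.5)
print(-7 % 3)       # 2    Python's built-in % follows the same convention when k > 0
```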
diff --git a/docs/Model.md b/docs/Model.md
@@ -2,8 +2,6 @@
 
 RL Ch 1
 
-
-
 **Definition:** A model in RL is an agent's representation of its environment that allows it to predict expected outcomes.
 
 There are two parts to the model:
diff --git a/docs/ModelBasedLearning.md b/docs/ModelBasedLearning.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Model-based learning builds a model from the training examples and then uses that model to make predictions on new inputs. 
 
 This is different from [InstanceBasedLearning](InstanceBasedLearning.md) because it tries to learn general patterns instead of matching new inputs against stored examples.
diff --git a/docs/ModelFree.md b/docs/ModelFree.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** A model free approach in RL means the agent does not know or estimate probabilities of state transitions and as such learns directly from experience.
diff --git a/docs/Momentum.md b/docs/Momentum.md
@@ -2,8 +2,6 @@
 
 ML P580
 
-
-
 **Definition:** Momentum optimization is an optimization algorithm that uses the idea of momentum to reach an optimum faster.
 
 While the gradient keeps pointing in the same direction, the optimizer moves faster and faster. Once the gradient changes sign, the accumulated momentum first slows the steps down and then reverses their direction.
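A minimal Python sketch of momentum optimization on the one-dimensional function f(x) = x^2, assuming the update m <- beta*m - lr*grad(x), x <- x + m; the helper name `momentum_descent` and the values lr = 0.1 and beta = 0.9 are illustrative choices:

```python
def momentum_descent(grad, x0, lr=0.1, beta=0.9, steps=100):
    """Hypothetical helper: momentum optimization in one dimension."""
    x, m = x0, 0.0
    history = [x]
    for _ in range(steps):
        m = beta * m - lr * grad(x)  # velocity accumulates past gradients
        x = x + m                    # move by the velocity, not just the current gradient
        history.append(x)
    return x, history

# f(x) = x^2 has gradient 2x and its minimum at x = 0.
x_final, path = momentum_descent(lambda x: 2 * x, x0=5.0, steps=300)
print(x_final)    # very close to 0
print(min(path))  # negative: the optimizer overshoots past 0 before turning back
```

The overshoot visible in `path` is the slow-down-then-reverse behavior described above.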
diff --git a/docs/MonoBehaviour.md b/docs/MonoBehaviour.md
@@ -2,8 +2,6 @@
 
 CS 331 W12 L3
 
-
-
 **Definition:** MonoBehaviour is the default base class for Unity scripts, and it is what allows a script to use Start and Update. 
 
 Each script contains code for a class that inherits from MonoBehaviour. 
diff --git a/docs/MonotonicFunction.md b/docs/MonotonicFunction.md
@@ -2,8 +2,6 @@
 
 Stats
 
-
-
 **Definition:** A monotonically increasing function is one where, as the input increases, the output either stays the same or increases. Likewise, a monotonically decreasing function is one where, as the input increases, the output either stays the same or decreases. Monotonicity simply means the function never changes direction: it is always non-decreasing or always non-increasing.
 
 Another variant is the strictly monotonic function, which always moves in one direction and never stagnates. An example is f(x) = x, which is strictly increasing over all reals.
diff --git a/docs/MonteCarloLearning.md b/docs/MonteCarloLearning.md
@@ -2,8 +2,6 @@
 
 L4
 
-
-
 **Definition:** Monte Carlo learning is a learning method that uses episodes and averages their returns to optimize policies.
 
 First Visit - In first-visit Monte Carlo learning, we only increment the counter (and add the return) for the current state if it is the first visit to that state in the given episode.
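A minimal Python sketch of first-visit Monte Carlo prediction, assuming each episode is a list of (state, reward) pairs where the reward is received after leaving that state; `first_visit_mc` and the two toy episodes are hypothetical:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Hypothetical helper: average the return following the first visit to each state."""
    returns_sum = defaultdict(float)
    visit_count = defaultdict(int)
    for episode in episodes:
        # Compute the return G_t for every time step, working backwards.
        G = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            G = episode[t][1] + gamma * G
            returns[t] = G
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state in seen:
                continue  # later visits in the same episode are ignored
            seen.add(state)
            returns_sum[state] += returns[t]
            visit_count[state] += 1
    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}

episodes = [
    [("A", 1), ("B", 0), ("A", 2)],  # A is visited twice; only its first visit counts
    [("B", 3)],
]
print(first_visit_mc(episodes))  # {'A': 3.0, 'B': 2.5}
```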
diff --git a/docs/MonteCarloMethod.md b/docs/MonteCarloMethod.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** The Monte Carlo method is a class of algorithms that use repeated random sampling to converge on a solution to problems that may have an exact solution but are too complex to analyze directly.
 
 An example of this is estimating pi by randomly sampling points in a 2D square to find the approximate area of a circle with a radius of 1.
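A minimal Python sketch of that estimate, assuming points drawn uniformly from the square [-1, 1] x [-1, 1]; `estimate_pi` is a hypothetical helper name. The unit circle has area pi and the square has area 4, so the fraction of points inside the circle approaches pi / 4:

```python
import random

def estimate_pi(samples, seed=None):
    """Hypothetical helper: Monte Carlo estimate of pi from random points in a square."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # the point lands inside the unit circle
            inside += 1
    return 4 * inside / samples

print(estimate_pi(100_000, seed=0))  # roughly 3.14; the error shrinks as samples grow
```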
diff --git a/docs/MooresLaw.md b/docs/MooresLaw.md
@@ -2,8 +2,6 @@
 
 Computer architecture L2.
 
-
-
 **Definition:** Component (transistor) counts double roughly every other year.
 
 This was found by plotting component counts on a log base 2 scale: the plot was linear, so the underlying function is exponential (2^x). 
diff --git a/docs/MosaicPlot.md b/docs/MosaicPlot.md
@@ -2,6 +2,4 @@
 
 Stats D4
 
-
-
 **Definition:** A mosaic plot is a plot that shows cross-tabulated information graphically, where each box is sized in proportion to the count for the combination of classes at that position.
diff --git a/docs/Movement.md b/docs/Movement.md
@@ -2,8 +2,6 @@
 
 CS 331 W12 L1
 
-
-
 There are many different ways to implement movement.
 
 A common way to move is by doing this:
@@ -25,7 +23,6 @@ if(rotateLeft){
 ```
 The issue with this is that movement may not feel natural, because no acceleration is applied to the object; you are just moving it by a fixed amount. In essence, you are assigning a velocity to the object for the frames where the "up" key is pressed.
 
-
 See [Input](Input.md) for more information about the Input class. 
 
 See [Vector3](Vector3.md) for more information about positions.
diff --git a/docs/MultiValuedFunction.md b/docs/MultiValuedFunction.md
@@ -2,8 +2,6 @@
 
 Ch 0
 
-
-
 **Definition:** Multivalued functions are functions such that there exist two or more values in the codomain for at least one value in the domain. 
 
 Unless specified otherwise, these are not strictly functions, since a function must map each value in the domain to exactly one value in the codomain.
diff --git a/docs/Multigraph.md b/docs/Multigraph.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** A multigraph is a graph that can contain multiple edges between the same pair of nodes.
diff --git a/docs/MultinomialCoefficient.md b/docs/MultinomialCoefficient.md
@@ -2,8 +2,6 @@
 
 Ch 1.3
 
-
-
 **Definition:** A multinomial coefficient is a generalization of the binomial coefficient where the bottom of the coefficient is multiple numbers (which add up to the top number).
 
 Calculating the multinomial is quite simple. Assume we have 3,3,4 on the bottom and 10 on the top. We can find the multinomial coefficient by finding the following:
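A minimal Python sketch of this calculation for 10 on top and 3, 3, 4 on the bottom, i.e. 10! / (3! 3! 4!); `multinomial` is a hypothetical helper name:

```python
from math import factorial

def multinomial(n, parts):
    """Hypothetical helper: n! / (k1! * k2! * ... * km!), where the parts must sum to n."""
    if sum(parts) != n:
        raise ValueError("the bottom numbers must add up to the top number")
    result = factorial(n)
    for k in parts:
        result //= factorial(k)  # each partial quotient is still an integer
    return result

print(multinomial(10, [3, 3, 4]))  # 4200
```

With two parts this reduces to the ordinary binomial coefficient, e.g. `multinomial(5, [2, 3])` equals 5 choose 2.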
diff --git a/docs/MultioutputClassification.md b/docs/MultioutputClassification.md
@@ -2,6 +2,4 @@
 
 ML D2
 
-
-
 **Definition:** Multioutput classification is a generalization of multilabel classification where each label can be multiclass (i.e., each output can take more than two possible values). 
diff --git a/docs/Multiset.md b/docs/Multiset.md
@@ -2,8 +2,6 @@
 
 U2.2.5
 
-
-
 **Definition:** A multiset is an unordered collection that can contain multiple instances of the same object.
 
 Multiset is short for multiple-membership set.
diff --git a/docs/MutuallyIndependent.md b/docs/MutuallyIndependent.md
@@ -2,6 +2,4 @@
 
 Ch 1.4
 
-
-
 **Definition:** A set of mutually independent events is a set such that all conditional probabilities (any combination) are equivalent to the unconditioned probabilities.
diff --git a/docs/NAG.md b/docs/NAG.md
@@ -2,6 +2,4 @@
 
 ML P582
 
-
-
 **Definition:** NAG is an improvement upon the momentum optimization algorithm: instead of computing the gradient at the current position and adding it to the velocity, we compute the gradient slightly ahead (in the direction of the momentum) and add that to the velocity.
diff --git a/docs/NLP.md b/docs/NLP.md
@@ -2,6 +2,4 @@
 
 ML Book CH1
 
-
-
 **Definition:** NLP is the acronym for natural language processing. This is the process of taking in language data (written, audible, or some other form), and doing something with it. This may be classification or something else.
diff --git a/docs/NPComplete.md b/docs/NPComplete.md
@@ -2,6 +2,4 @@
 
 U 2.3
 
-
-
 **Definition:** NP-complete problems are a set of problems in NP such that if any one of them is found to be solvable in polynomial time, then P=NP.
diff --git a/docs/NPProblem.md b/docs/NPProblem.md
@@ -2,6 +2,4 @@
 
 U 2.3
 
-
-
 **Definition:** An NP problem (nondeterministic polynomial time) is a problem whose solutions can be verified in polynomial time. Every problem in P is also in NP; the hard NP problems are those that are not (believed to be) solvable in polynomial time.
diff --git a/docs/NaiveBayes.md b/docs/NaiveBayes.md
@@ -2,8 +2,6 @@
 
 ML SS
 
-
-
 **Definition:** Naive Bayes is an algorithm used to find the probabilities of text being part of a given class. 
 
 This is often used for spam classification. Here are the steps:
diff --git a/docs/NaturalLog.md b/docs/NaturalLog.md
@@ -1,10 +1,7 @@
 # Natural Log
 
-
-
 **Definition:** The natural log (ln) is the logarithm with base e: ln(y) is the value x such that e^x = y.
 
-
 When working with ln we have the following options for algebraic manipulations:
 
 1. Division becomes subtraction
diff --git a/docs/Negation.md b/docs/Negation.md
@@ -2,6 +2,4 @@
 
 1.1.1
 
-
-
 **Definition:** Negation is the process of inverting the truth value of a proposition.
diff --git a/docs/NestedQuantifier.md b/docs/NestedQuantifier.md
@@ -2,8 +2,6 @@
 
 U 1.5.1
 
-
-
 **Definition:** Nested quantifiers occur when one quantifier is within the scope of another.
 
 Example:
diff --git a/docs/NetworkSecurity.md b/docs/NetworkSecurity.md
@@ -4,6 +4,4 @@
 
 **Chapter:** 1.1
 
-
-
 **Definition:** Protection of networks and their services.
diff --git a/docs/NeuralNetworks.md b/docs/NeuralNetworks.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Artificial neural networks are machine learning models that mimic biological neurons to complete some task.
 
 ReLU activations can be used on output layers to force the output to be non-negative. Alternatively, we can use softplus, a smooth variant of ReLU, to constrain output values, because by default there is no activation function on the output layer.
diff --git a/docs/NonDeterministicFiniteAutomata.md b/docs/NonDeterministicFiniteAutomata.md
@@ -10,7 +10,6 @@
 2. It is possible for a NFA to have a state transition with a label of epsilon, indicating such a transition has no impact upon the current word.
 3. An NFA may have a state with multiple transitions for the same symbol.
 
-
 ---
 
 As we can see, all DFAs can be considered NFAs, but the opposite is not true.
diff --git a/docs/NonRepudation.md b/docs/NonRepudation.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.1
 
-
-
 **Definition:** Non-repudiation means that it is irrefutable that an action was performed by a particular individual.
 
 An example of this is tagging every modification to data so that individuals cannot deny that they made the change.
diff --git a/docs/Norm.md b/docs/Norm.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** Norm is a function defined as follows:
 
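 $\|v\|_p = \left(\sum_i |v_i|^p\right)^{1/p}$ where $p \geq 1$ (for $0 < p < 1$ the formula fails the triangle inequality, so it is not a true norm)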
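A minimal Python sketch of the p-norm for a plain list, with `float("inf")` standing in for the max norm; `p_norm` is a hypothetical helper name:

```python
def p_norm(v, p):
    """Hypothetical helper: (sum |v_i|^p)^(1/p); p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(x) for x in v)  # limit as p grows: the largest absolute entry
    return sum(abs(x) ** p for x in v) ** (1 / p)

v = [3, -4]
print(p_norm(v, 1))             # 7.0  (Manhattan norm)
print(p_norm(v, 2))             # 5.0  (Euclidean norm)
print(p_norm(v, float("inf")))  # 4    (max norm)
```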
diff --git a/docs/NormalDistribution.md b/docs/NormalDistribution.md
@@ -2,8 +2,6 @@
 
 Stats D1 + Prob L8
 
-
-
 **Definition:** A normal distribution is a symmetric, unimodal distribution in which most observations cluster around the central mound while fewer and fewer observations lie farther away. 
 
 We often work with normal distributions in terms of the standard normal distribution, which is centered at 0 with a standard deviation of 1. Any normal distribution can be projected onto it using z = (x - μ) / σ. This is convenient because the normal distribution's percentiles (its CDF) have no closed form, so we use lookup tables for the standard normal.
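A minimal Python sketch of standardizing a value with the standard library's `statistics.NormalDist`; the mean of 70 and standard deviation of 10 are made-up example numbers:

```python
from statistics import NormalDist

# Hypothetical example: scores with mean 70 and standard deviation 10.
scores = NormalDist(mu=70, sigma=10)
x = 85
z = (x - scores.mean) / scores.stdev  # standardize: z = (x - mu) / sigma
print(z)                              # 1.5

# The percentile of x equals the standard normal CDF evaluated at z
# (up to floating-point rounding), which is what lookup tables provide.
print(scores.cdf(x))
print(NormalDist().cdf(z))
```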
diff --git a/docs/NormalVector.md b/docs/NormalVector.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** The normal vector of a hyperplane is a vector that is orthogonal to the hyperplane (there are infinitely many as this is simply a direction and the magnitude does not matter unless specifying unit normal vector).
diff --git a/docs/NoveltyDetection.md b/docs/NoveltyDetection.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Novelty detection is used to detect new samples that appear different from other instances in the training set.
 
 This is similar to [AnomalyDetection](AnomalyDetection.md)
diff --git a/docs/NullSpace.md b/docs/NullSpace.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The null space of matrix A is the set of vectors $\{\vec{x} \in \R^n \mid A\vec{x} = \vec{0}\}$.
 
 These are all of the vectors that, when multiplied by the matrix, give the zero vector. This set is a [Subspace](Subspace.md), since it is closed under addition and scalar multiplication ([Closure](Closure.md)).
diff --git a/docs/NumberTheory.md b/docs/NumberTheory.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** Number theory is a branch of mathematics that concerns itself with properties and functions on integers.
diff --git a/docs/OSI.md b/docs/OSI.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.2
 
-
-
 **Definition:** OSI is an accepted standard for networking and security.
 
 Shown below are the security focuses of the OSI model.
diff --git a/docs/OffPolicyLearning.md b/docs/OffPolicyLearning.md
@@ -2,6 +2,4 @@
 
 L5
 
-
-
 **Definition:** Off policy learning can be thought of as looking over someone else's shoulder to understand what will and will not result in high rewards.
diff --git a/docs/OneHotEncoding.md b/docs/OneHotEncoding.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** One hot encoding is the process of taking all unique values of a given categorical feature and expanding these out into individual boolean attributes of a sample. 
 
 An example of this is a column that states the distance from the ocean, with the options island, 1 hour, and near ocean. These could be encoded as integers, but such values do not represent what the categories mean, so mapping them into a linear regression would cause issues because higher or lower does not necessarily mean better. Instead, you add 1 hour, near ocean, and island as columns and set each one to true or false based on the distance string. 
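A minimal Python sketch of one hot encoding the distance-from-ocean column from the example, without any library; `one_hot` is a hypothetical helper name, and sorting the categories is just one way to fix the column order:

```python
def one_hot(column):
    """Hypothetical helper: expand a categorical column into one boolean column per category."""
    categories = sorted(set(column))  # fixed, reproducible column order
    rows = [[value == c for c in categories] for value in column]
    return categories, rows

distances = ["island", "1 hour", "near ocean", "1 hour"]
cats, encoded = one_hot(distances)
print(cats)     # ['1 hour', 'island', 'near ocean']
print(encoded)  # [[False, True, False], [True, False, False], [False, False, True], [True, False, False]]
```

Each row has exactly one `True`, so no category is implied to be "larger" than another.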
diff --git a/docs/OneVersusAll.md b/docs/OneVersusAll.md
@@ -2,8 +2,6 @@
 
 ML D2
 
-
-
 **Definition:** One versus all classifiers are a sequence of binary classifiers that output probabilities where the highest probability is then selected as the output. 
 
 Think of this as a series of SVC or SGD classifiers that output some likelihood that the current input is part of a particular class. You then send the input into each model and whichever one outputs the highest probability is the class that the input belongs to. 
diff --git a/docs/OneVersusOne.md b/docs/OneVersusOne.md
@@ -2,8 +2,6 @@
 
 ML D2
 
-
-
 **Definition:** A one versus one classification strategy trains binary classifiers to output the probability of an input being part of one class or another. 
 
 Basically, you train a model to choose between one class and another, outputting the probability of one over the other. After running all of these pairwise comparisons, the class that wins with the most classifiers is the class the input is assigned to (in theory).
diff --git a/docs/OnesComplement.md b/docs/OnesComplement.md
@@ -1,8 +1,6 @@
 
 Self Study
 
-
-
 **Definition:** One's complement is an implementation of signed values such that a 1 in the MSB position indicates the number is negative, and a negative number is formed by inverting every bit of its positive counterpart.
 
 This is not used today because it requires extra computation for mathematical operations, and it has both a positive and a negative zero, which wastes a bit pattern.
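A minimal Python sketch of 8-bit one's complement encoding and decoding, returned as bit strings; `ones_complement` and `decode` are hypothetical helper names. Decoding 11111111 shows the negative zero mentioned above:

```python
def ones_complement(value, bits=8):
    """Hypothetical helper: encode a signed integer in `bits`-bit one's complement."""
    limit = 2 ** (bits - 1) - 1
    if not -limit <= value <= limit:
        raise ValueError("value out of range")
    mask = (1 << bits) - 1
    # Negative numbers: invert every bit of the positive magnitude.
    pattern = value if value >= 0 else (~(-value)) & mask
    return format(pattern, f"0{bits}b")

def decode(bit_string):
    """Hypothetical helper: turn a one's complement bit string back into an integer."""
    value = int(bit_string, 2)
    if bit_string[0] == "1":  # MSB set: negative, so invert the bits to get the magnitude
        return -((~value) & ((1 << len(bit_string)) - 1))
    return value

print(ones_complement(5))   # 00000101
print(ones_complement(-5))  # 11111010
print(decode("00000000"), decode("11111111"))  # 0 0: two bit patterns for zero
```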
diff --git a/docs/OnlineLearning.md b/docs/OnlineLearning.md
@@ -1,8 +1,6 @@
 
 ML CH1
 
-
-
 **Definition:** Online learning is the process of learning as a model is fed new data.
 
 This paradigm is in contrast with [OfflineLearning](OfflineLearning.md) also known as batch learning where all data is trained on at the start and then the learned behavior is acted upon in a static way in perpetuity. 
diff --git a/docs/Opcode.md b/docs/Opcode.md
@@ -2,8 +2,6 @@
 
 CA L3
 
-
-
 **Definition:** An opcode is the first part of an [Instruction](Instruction.md) which describes what the instruction does. 
 
 This is a form of [BitSteering](BitSteering.md)
diff --git a/docs/OpenAddressing.md b/docs/OpenAddressing.md
@@ -2,6 +2,4 @@
 
 L4
 
-
-
 **Definition:** Open addressing is the process of resolving hash table collisions by probing for the next available location in a predefined manner, which removes the need to resolve collisions with another data structure.
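A minimal Python sketch of open addressing using linear probing (try slot h(k), then h(k)+1, and so on, wrapping around). `LinearProbingTable` is a hypothetical class; it omits deletion, which in practice needs "tombstone" markers so that probe sequences are not cut short:

```python
class LinearProbingTable:
    """Hypothetical hash table that resolves collisions by linear probing."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot holds a (key, value) pair or None

    def _probe(self, key):
        """Yield slot indices in probe order: h(k), h(k)+1, h(k)+2, ... (mod capacity)."""
        start = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def put(self, key, value):
        for idx in self._probe(key):
            if self.slots[idx] is None or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is None:
                break  # an empty slot ends the probe sequence: the key is absent
            if self.slots[idx][0] == key:
                return self.slots[idx][1]
        raise KeyError(key)

table = LinearProbingTable(capacity=4)
table.put(1, "a")
table.put(5, "b")    # 5 % 4 == 1 collides with key 1, so it probes forward to slot 2
print(table.get(5))  # b
```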
diff --git a/docs/Operands.md b/docs/Operands.md
@@ -2,8 +2,6 @@
 
 CA L3
 
-
-
 **Definition:** Operands describe what data an [Instruction](Instruction.md) should operate on. 
 
 See [Opcode](Opcode.md) for the other part of an instruction. 
diff --git a/docs/OperatorNotation.md b/docs/OperatorNotation.md
@@ -2,6 +2,4 @@
 
 Ch 0
 
-
-
 **Definition:** Operator notation is a way to define tasks in a way that uses complex operators such as x+y to define the addition of the ordered pair (x,y).
diff --git a/docs/OptimalBayesianAgent.md b/docs/OptimalBayesianAgent.md
@@ -2,6 +2,4 @@
 
 Superintelligence - Bostrom
 
-
-
 **Definition:** An optimal bayesian agent is an agent that at all times takes the best possible action based on probabilities and expected values to maximize some utility/cost function.
diff --git a/docs/OptimalSubstructure.md b/docs/OptimalSubstructure.md
@@ -2,6 +2,4 @@
 
 L3
 
-
-
 **Definition:** Optimal substructure is a property of problems such that an optimal solution to the overall problem can be constructed from optimal solutions to its subproblems.
diff --git a/docs/Optimizer.md b/docs/Optimizer.md
@@ -2,8 +2,6 @@
 
 ML P580
 
-
-
 **Definition:** An optimizer is an algorithm to adjust the weights and biases of neural networks.
 
 Here is a list of common optimizers:
diff --git a/docs/OracleComputer.md b/docs/OracleComputer.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** An oracle computer is a computer that can compute any computable problem. 
 
 Such a system need not be physically possible; see [Bekenstein Bound](BekensteinBound.md) for why it may not be.
diff --git a/docs/OrderedSample.md b/docs/OrderedSample.md
@@ -2,6 +2,4 @@
 
 CH 1.3
 
-
-
 **Definition:** An ordered sample is an outcome where the order of elements contributes to the uniqueness of the output. As such, an ordered sample is denoted using ordered pairs instead of a set as sets are innately unordered.
diff --git a/docs/OrdinaryLeastSquares.md b/docs/OrdinaryLeastSquares.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** Ordinary least squares is a formula used to find the statistical line of best fit for some dataset where we are trying to minimize the square error. 
 
 When doing [LinearRegression](LinearRegression.md) there are two common methods to find the line. One is OLS and the other is [GradientDescent](GradientDescent.md). 
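A minimal Python sketch of OLS for the single-feature case y = b0 + b1 * x, using the standard closed-form slope and intercept that minimize the squared error; `ols_fit` is a hypothetical helper name, and the multi-feature case would instead use the normal equation with matrices:

```python
def ols_fit(xs, ys):
    """Hypothetical helper: closed-form OLS for one feature.

    b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    b0 = y_mean - b1 * x_mean
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

print(ols_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0): the points lie exactly on y = 1 + 2x
```

Unlike gradient descent, this computes the answer in one pass with no learning rate or iterations.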
diff --git a/docs/OrthogonalComplement.md b/docs/OrthogonalComplement.md
@@ -2,8 +2,6 @@
 
 Khan U3
 
-
-
 **Definition:** The orthogonal complement of a subspace is the subspace of all vectors whose dot product with every vector in the original subspace is 0.
 
 The orthogonal complement of the subspace V in $\R^n$ is defined as follows: 
diff --git a/docs/Orthonormal.md b/docs/Orthonormal.md
@@ -2,8 +2,6 @@
 
 U3
 
-
-
 **Definition:** An orthonormal set is a set of mutually orthogonal vectors that have each been normalized (length = 1); such vectors are automatically linearly independent.
 
 The ortho part means that all vectors are orthogonal to each other.
diff --git a/docs/OutOfBag.md b/docs/OutOfBag.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Out of bag refers to samples that are not contained in the training sample drawn for a given predictor when using bagging/pasting.
 
 When bagging draws m samples with replacement from a training set of size m, a given sample has about a 37% chance ((1 - 1/m)^m, which approaches 1/e ≈ 0.37) of being out of bag for a given predictor. These samples are useful because they can then be used to validate that individual predictor.
diff --git a/docs/OutOfOrderExecution.md b/docs/OutOfOrderExecution.md
@@ -2,8 +2,6 @@
 
 Computer Architecture L2
 
-
-
 **Definition:** An optimization strategy that executes instructions out of order to reduce the number of clock cycles (time) needed to complete computations. This is complex because it can be hard to determine whether an instruction depends on an earlier one.  
 
 See [DataFlow](DataFlow.md) for more information about out of order/non-Von Neumann computation.
diff --git a/docs/Overfitting.md b/docs/Overfitting.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Overfitting is when a model is trained on data and performs well on it but lacks the ability to generalize. 
 
 Generally, this is caused by having a complex model with lots of features but not enough training samples or training samples that have too much noise. This issue can be resolved by simplifying the model (decrease features), removing noise from the samples, or increasing the number of samples.
diff --git a/docs/OverlappingSubproblems.md b/docs/OverlappingSubproblems.md
@@ -2,6 +2,4 @@
 
 L3
 
-
-
 **Definition:** Overlapping subproblems is a property of a problem such that the same subproblems occur again and again, meaning we can save work by solving each subproblem once and reusing its result instead of recomputing it.
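A minimal Python sketch using Fibonacci numbers, whose recursive definition recomputes the same subproblems repeatedly; `functools.lru_cache` stores each result so every subproblem is solved once. The `calls` counter is a hypothetical addition for illustration:

```python
from functools import lru_cache

calls = 0  # hypothetical counter: how many subproblems are actually computed

@lru_cache(maxsize=None)
def fib(n):
    """nth Fibonacci number; fib(n-1) and fib(n-2) share subproblems, so results are cached."""
    global calls
    calls += 1
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
print(calls)    # 31: each of fib(0)..fib(30) is computed exactly once
```

Without the cache, the same call would recompute small subproblems millions of times.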
diff --git a/docs/Oversmoothing.md b/docs/Oversmoothing.md
@@ -2,8 +2,6 @@
 
 Stats D3
 
-
-
 **Definition:** Oversmoothing is the process of making the bandwidth of a kernel too large such that resulting visualizations smooth over important information.
 
 This can be thought of as underfitting the dataset.
diff --git a/docs/PCA.md b/docs/PCA.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** PCA is a dimensionality reduction algorithm that finds a hyperplane that lies close to the data and then projects the data onto it.
 
 The goal of this algorithm is to preserve maximum variance so values in the dataset are optimally spread out.
diff --git a/docs/PairwiseIndependence.md b/docs/PairwiseIndependence.md
@@ -2,6 +2,4 @@
 
 Ch 1.4
 
-
-
 **Definition:** Pairwise independent events are a set of events such that every pair is independent: for any two events in the set, the conditional probability of either given the other is equivalent to its unconditioned probability.
diff --git a/docs/PairwiseRelativelyPrime.md b/docs/PairwiseRelativelyPrime.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** Pairwise relatively prime numbers are a set of integers such that the gcd of any two numbers in the set is always 1.
diff --git a/docs/PartialDerivative.md b/docs/PartialDerivative.md
@@ -2,8 +2,6 @@
 
 ML D2
 
-
-
 **Definition:** The partial derivative is the derivative of a multivariate function with respect to a single variable, treating the others as constants.
 
 Often this is used in [GradientDescent](GradientDescent.md) to determine in what ways parameters need to change.
diff --git a/docs/PartiallyApplied.md b/docs/PartiallyApplied.md
@@ -8,7 +8,6 @@
 
 ## Example
 
-
 ```haskell
 module Main where
 
diff --git a/docs/PartiallyObservableMarkovDecisionProcess.md b/docs/PartiallyObservableMarkovDecisionProcess.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** A partially observable markov decision process is a type of markov decision process where the agent doesn't have access to the entire current state.
diff --git a/docs/ParticularSolution.md b/docs/ParticularSolution.md
@@ -2,6 +2,4 @@
 
 Ch 2.2
 
-
-
 **Definition:** A particular solution to a set of linear equations is a specific set of values that makes all of the equalities of the system true.
diff --git a/docs/Partition.md b/docs/Partition.md
@@ -1,8 +1,6 @@
 
 AM W14 Reading
 
-
-
 **Definition:** A partition of a set A is a set of non-empty subsets of A, such that the union of all the subsets equals A, and the intersection of any two different subsets is the null set. 
 
 Basically, a partition splits a set into subsets that together make up the original set, where no element appears in more than one subset (any intersection between them is the null set). Keep in mind that the partition is the collection of all of the subsets, not any single one of them, which is where this diverges from the computational term "partition".
diff --git a/docs/PascalsIdentity.md b/docs/PascalsIdentity.md
@@ -2,6 +2,4 @@
 
 Ch 6.4
 
-
-
 **Definition:** Pascal's identity states that n+1 choose r is equivalent to n choose r plus n choose r-1, i.e. C(n+1, r) = C(n, r) + C(n, r-1).
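A minimal Python sketch that checks the identity numerically with `math.comb`; `pascals_identity_holds` is a hypothetical helper name, and the check covers 1 <= r <= n:

```python
from math import comb

def pascals_identity_holds(n, r):
    """Hypothetical helper: check C(n+1, r) == C(n, r) + C(n, r-1)."""
    return comb(n + 1, r) == comb(n, r) + comb(n, r - 1)

print(comb(6, 2), comb(5, 2) + comb(5, 1))  # 15 15
print(all(pascals_identity_holds(n, r) for n in range(1, 20) for r in range(1, n + 1)))  # True
```

This is the rule used to build each row of Pascal's triangle from the row above it.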
diff --git a/docs/PassiveAttacks.md b/docs/PassiveAttacks.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.3
 
-
-
 **Definition:** Passive attacks are attacks that monitor transmissions.
 
 This is synonymous with eavesdropping.
diff --git a/docs/Pasting.md b/docs/Pasting.md
@@ -2,6 +2,4 @@
 
 ML D5
 
-
-
 **Definition:** Pasting is the process of training multiple models of the same type on random subsets of a dataset, where each subset is sampled without replacement. This differs from bagging, which samples with replacement: with pasting, the same predictor (model) can't be trained on the same sample twice, but different predictors may use some of the same samples. 
diff --git a/docs/Path.md b/docs/Path.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** A path is a sequence of adjacent nodes where nodes cannot be repeated.
diff --git a/docs/Percentile.md b/docs/Percentile.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** A percentile is the percentage of data that falls below a specified value (or, under another convention, at or below it). 
diff --git a/docs/Perceptrons.md b/docs/Perceptrons.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Perceptrons are an artificial neural network architecture based on threshold logic units (TLUs), also called linear threshold units (LTUs). 
 
 The inputs and outputs of these neurons are numbers and each input is associated with a weight. 
diff --git a/docs/PerfectNumbers.md b/docs/PerfectNumbers.md
@@ -2,8 +2,6 @@
 
 Math 310
 
-
-
 **Definition:** Perfect numbers are numbers whose proper divisors (all divisors except the number itself) add up to the number itself. 
 
 Two examples are 6 (1 + 2 + 3) and 28 (1 + 2 + 4 + 7 + 14).
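A minimal Python sketch that tests for perfect numbers by summing proper divisors in divisor pairs up to the square root; `is_perfect` is a hypothetical helper name:

```python
def is_perfect(n):
    """Hypothetical helper: True if the proper divisors of n sum to n."""
    if n < 2:
        return False
    total = 1  # 1 is a proper divisor of every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d  # add the paired divisor, skipping it for perfect squares
        d += 1
    return total == n

print([n for n in range(1, 10_000) if is_perfect(n)])  # [6, 28, 496, 8128]
```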
diff --git a/docs/PeriodicChain.md b/docs/PeriodicChain.md
@@ -2,8 +2,6 @@
 
 L17
 
-
-
 **Definition:** Periodic Markov chains are a specific type of Markov chain whose states fall into groups such that all transitions from one group lead to the next group.
 
 Periodic Markov chains are interesting because their state probabilities never settle into a steady state; they keep cycling between the groups.
diff --git a/docs/PerlinNoise.md b/docs/PerlinNoise.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** Perlin noise is a procedural gradient texture generated using the perlin noise algorithm.
 
 Not 100% about this:
diff --git a/docs/Permutation.md b/docs/Permutation.md
@@ -2,8 +2,6 @@
 
 CH 1.3
 
-
-
 **Definition:** A permutation is an ordered arrangement of n elements.
 
 To calculate the total number of permutations of a given list, we take the factorial of its length n, i.e. n!.
diff --git a/docs/PermutationMatrix.md b/docs/PermutationMatrix.md
@@ -4,8 +4,6 @@
 
 **Lecture:** 2
 
-
-
 **Definition:** A permutation matrix is a matrix that, when it multiplies another matrix (from the left), exchanges the rows of that matrix.
 
 Permutation matrices are necessary for LU decomposition and Gaussian elimination because sometimes we find there are 0's in the pivot positions.
diff --git a/docs/Pictograph.md b/docs/Pictograph.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** A picture representation of statistics such as a chart, graph, or something else.
diff --git a/docs/PigeonholePrinciple.md b/docs/PigeonholePrinciple.md
@@ -2,6 +2,4 @@
 
 Ch 6.2
 
-
-
 **Definition:** The pigeonhole principle states that if n pigeons are placed into z nests and z is smaller than n, then at least one nest must contain multiple pigeons.
diff --git a/docs/Pipelining.md b/docs/Pipelining.md
@@ -2,8 +2,6 @@
 
 CA L3
 
-
-
 **Definition:** Pipelining is the use of CPU hardware such that multiple instructions are in different stages of execution at the same time. 
 
 See [OutOfOrderExecution](OutOfOrderExecution.md).
diff --git a/docs/PlaneToPlaneDistance.md b/docs/PlaneToPlaneDistance.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 See [DistanceToPlane](DistanceToPlane.md) for distance from plane to point. 
 
 This is only useful for planes that are parallel; otherwise they intersect and the distance is 0. 
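A minimal Python sketch of the distance between two parallel planes, assuming both are written with the same normal vector n as n · x = d1 and n · x = d2 (if one equation is a scaled copy, rescale it first); `parallel_plane_distance` is a hypothetical helper name:

```python
import math

def parallel_plane_distance(normal, d1, d2):
    """Hypothetical helper: distance |d1 - d2| / ||n|| between n . x = d1 and n . x = d2."""
    return abs(d1 - d2) / math.sqrt(sum(c * c for c in normal))

# Planes x + 2y + 2z = 3 and x + 2y + 2z = 12; the normal (1, 2, 2) has length 3.
print(parallel_plane_distance((1, 2, 2), 3, 12))  # 3.0
```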
diff --git a/docs/PoissonDistribution.md b/docs/PoissonDistribution.md
@@ -2,8 +2,6 @@
 
 Stats D1
 
-
-
 **Definition:** A Poisson distribution is a common distribution that gives the probability of a given number of events occurring in a fixed interval of time (or space or volume), when events occur independently at a known average rate. 
 
 An example of this is the distribution of the number of texts received in a day, where the mean is 12 texts per day. Using this information we can apply the Poisson formula P(X = k) = λ^k e^(-λ) / k! to find the probability that we receive 8 texts in a day, fewer than 8 texts, or any other number of texts. 
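A minimal Python sketch of the texts-per-day example, assuming the standard Poisson probability mass function; `poisson_pmf` is a hypothetical helper name:

```python
import math

def poisson_pmf(k, lam):
    """Hypothetical helper: P(X = k) = lam^k * e^(-lam) / k! for a Poisson variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 12  # average number of texts per day
p_exactly_8 = poisson_pmf(8, lam)
p_fewer_than_8 = sum(poisson_pmf(k, lam) for k in range(8))  # k = 0, 1, ..., 7
print(round(p_exactly_8, 4))     # about 0.0655
print(round(p_fewer_than_8, 4))
```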
diff --git a/docs/PolarCoordinates.md b/docs/PolarCoordinates.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** The polar coordinate system is a coordinate system where, instead of horizontal and vertical distances, we define a point by its distance r from the origin and the angle theta between the positive x-axis and the line segment from the origin to the point.
 
 NORMAL SYSTEM:
@@ -36,7 +34,6 @@ def cartToPolar(x,y):
     theta = math.atan2(y, x)  # atan2 picks the correct quadrant; asin(y/r) is wrong when x < 0
     print("Polar Coordinates: \t" + str(r) + " " + str(theta))
 
-
 r = 2.3
 theta = 1.38
 print("POLAR TO CART:")
diff --git a/docs/Policy.md b/docs/Policy.md
@@ -2,8 +2,6 @@
 
 RL Ch 1
 
-
-
 **Definition:** A policy in reinforcement learning is a function from the current state to the action an agent will take.
 
 Basically, this dictates what the agent will do in a given scenario. A policy can also include some stochasticity (necessary for exploration).
diff --git a/docs/PoolingLayers.md b/docs/PoolingLayers.md
@@ -2,8 +2,6 @@
 
 ML P762
 
-
-
 **Definition:** Pooling layers are layers of a CNN that 'pool' together neighboring values and pass through a single representative value.
 
 This representative value is normally either the max or the mean, but max pooling has become more popular recently.
diff --git a/docs/Postcondition.md b/docs/Postcondition.md
@@ -2,6 +2,4 @@
 
 U 1.4.1
 
-
-
 **Definition:** Postconditions are the expected outputs of a function or program which are predicated upon the specified [Preconditions](Preconditions.md).
diff --git a/docs/PowerSet.md b/docs/PowerSet.md
@@ -2,8 +2,6 @@
 
 AM Ch1
 
-
-
 **Definition:** The power set is the set of all subsets of the input set. 
 
 Example:
diff --git a/docs/Precision.md b/docs/Precision.md
@@ -2,8 +2,6 @@
 
 CH 3
 
-
-
 **Definition:** The precision of a classifier (classification model) is the accuracy of positive predictions.
 
 Here is the formula:
diff --git a/docs/Preconditions.md b/docs/Preconditions.md
@@ -2,6 +2,4 @@
 
 U 1.4.1
 
-
-
 **Definition:** Preconditions are conditions on the inputs (or variables) of a function (or program) that must hold prior to execution/evaluation. 
diff --git a/docs/Predicate.md b/docs/Predicate.md
@@ -2,8 +2,6 @@
 
 U 1.4.1
 
-
-
 **Definition:** The predicate in a mathematical context is the part of a statement that gives us a truth value when variables are at play.
 
 In the case of 'x < 2' the predicate is 'less than 2'. This can be stated as a propositional function P(x). The following are valid inputs and outputs of said function:
diff --git a/docs/Prediction.md b/docs/Prediction.md
@@ -2,8 +2,6 @@
 
 Ch2
 
-
-
 **Definition:** Prediction is the process of predicting an output given a sample.
 
 This is different than [Inference](Inference.md), which is focused on understanding relationships between variables.
diff --git a/docs/Preimage.md b/docs/Preimage.md
@@ -2,8 +2,6 @@
 
 Khan Unit 2
 
-
-
 **Definition:** The preimage of a set S under a transformation T is the set of all values in the domain whose mappings are in S. S may be the whole codomain or some other subset of it.
 
 T^-1(S) = Preimage of S under T.
diff --git a/docs/PretrainedModels.md b/docs/PretrainedModels.md
@@ -2,8 +2,6 @@
 
 ML P570
 
-
-
 **Definition:** Pretrained models are ML models that have been trained in the past and can be used for doing other things.
 
 Pretrained models often use [TransferLearning](TransferLearning.md) because the goal with pretrained models is to use the existing model that has already been trained to work well with a new set of data. This often involves replacing the model's top layers (training new ones for the specific task) while keeping the lower layers intact, as they often do simple, reusable tasks like edge detection.
diff --git a/docs/PrimeFactorization.md b/docs/PrimeFactorization.md
@@ -2,8 +2,6 @@
 
 U 2.4
 
-
-
 **Definition:** The prime factorization of a number is the product of prime numbers that results in the number.
 
 If a number is prime, its prime factorization is just itself. 
diff --git a/docs/PrimeNumber.md b/docs/PrimeNumber.md
@@ -2,8 +2,6 @@
 
 U 2.4
 
-
-
 **Definition:** A prime number is a number greater than 1 such that its only divisors are itself and 1. 
 
 A common way to prove the primality of a number is to show that no integer from 2 up to and including its square root divides it (a | b == False).
diff --git a/docs/PrincipleOfInclusionExclusion.md b/docs/PrincipleOfInclusionExclusion.md
@@ -2,8 +2,6 @@
 
 Ch 8.3 Rosen
 
-
-
 **Definition:** The principle of inclusion-exclusion is a principle used to count the number of elements in the union of a finite number of sets.
 
 Consider:
diff --git a/docs/Probability.md b/docs/Probability.md
@@ -2,8 +2,6 @@
 
 Stats CH1
 
-
-
 **Definition:** The probability is the likelihood of something happening, expressed as a number between 0 and 1 (or a percentage between 0% and 100%). 
 
 Let X be a set and F a set of subsets of X. A probability on (X,F) is a function u : F -> [0,1]. This means each set in F is assigned a probability between 0 and 1. See [SetFunction](SetFunction.md) for more about the u (Greek mu) function.
diff --git a/docs/ProbabilityDensityFunctions.md b/docs/ProbabilityDensityFunctions.md
@@ -2,8 +2,6 @@
 
 Stats ch1
 
-
-
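 **Definition:** A probability density function describes the relative likelihood of outcomes for [ContinuousProbability](ContinuousProbability.md) problems; the probability that an outcome falls in an interval is the area under the PDF over that interval.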
 
 **Important:** PDFs are for continuous random variables whereas PMFs are for discrete.
diff --git a/docs/ProbabilityMassFunction.md b/docs/ProbabilityMassFunction.md
@@ -2,8 +2,6 @@
 
 L4
 
-
-
 **Definition:** A PMF gives the probability that a discrete random variable (see [RandomVariables](RandomVariables.md)) takes on each specific output value. 
 
 **Important:** PMFs are for discrete random variables whereas PDFs are for continuous.
diff --git a/docs/ProductRule.md b/docs/ProductRule.md
@@ -2,6 +2,4 @@
 
 Leonard
 
-
-
 **Definition:** The product rule is used when taking the derivative of two functions that are multiplied together. The rule is as follows $\frac{d}{dx}(g(x)f(x)) = g'(x)f(x) + f'(x)g(x)$
diff --git a/docs/Prognosticator.md b/docs/Prognosticator.md
@@ -2,6 +2,4 @@
 
 Superintelligence - Bostrom
 
-
-
 **Definition:** A prognosticator is someone who tells of the future.
diff --git a/docs/ProgrammerVisibleState.md b/docs/ProgrammerVisibleState.md
@@ -2,8 +2,6 @@
 
 CA L3
 
-
-
 **Definition:** Programmer visible state is all state of program execution that is visible to programs. 
 
 This includes the program counter, registers, and memory.
diff --git a/docs/PropertyBasedTesting.md b/docs/PropertyBasedTesting.md
@@ -57,4 +57,3 @@ They found one issue in NumPy related the the Wald distribution
 
 The version in my hands is from July 2024.
 
-
diff --git a/docs/Proposition.md b/docs/Proposition.md
@@ -2,6 +2,4 @@
 
 Discrete 1.1
 
-
-
 **Definition:** A proposition is a statement that is either true or false.
diff --git a/docs/PropositionalFunction.md b/docs/PropositionalFunction.md
@@ -2,8 +2,6 @@
 
 U 1.4.1
 
-
-
 **Definition:** A propositional function is a function that takes an arbitrary number of inputs and outputs a truth value.
 
 An example of a propositional function is the function P(x) defined as 'x > 2'. This function could then be evaluated as follows:
diff --git a/docs/ProveSetEquality.md b/docs/ProveSetEquality.md
@@ -2,6 +2,4 @@
 
 AM TB Ch8
 
-
-
 To prove that two sets A and B are equal, we first prove that A contains B ($B \subseteq A$). We then show that B also contains A ($A \subseteq B$), so all elements must be the same, making the sets equal. Set equality is written with the = sign, not the $\equiv$ sign.
diff --git a/docs/PseudoGraphs.md b/docs/PseudoGraphs.md
@@ -2,6 +2,4 @@
 
 Ch 10.1
 
-
-
 **Definition:** A pseudograph is a graph that allows multiple edges and loops, and is undirected.
diff --git a/docs/QuadraticProbing.md b/docs/QuadraticProbing.md
@@ -2,8 +2,6 @@
 
 Ch 5
 
-
-
 **Definition:** Quadratic probing is a probing strategy where we start at the hashed index and then alternately move right and left by successive perfect squares. 
 
 Basically, we check first e, then e + 1, then e - 1, then e + 4, then e - 4, and so on.
@@ -28,4 +26,3 @@ for(int i = 0 ; i < 10 ; ++i){
 
 ```
 
-
diff --git a/docs/Quantifiers.md b/docs/Quantifiers.md
@@ -2,8 +2,6 @@
 
 U 1.4.2
 
-
-
 **Definition:** Quantifiers are operators that describe the number of individuals in a domain that satisfy something.
 
 The two common quantifiers are:
diff --git a/docs/Quantile.md b/docs/Quantile.md
@@ -2,8 +2,6 @@
 
 Stats D3
 
-
-
 **Definition:** Quantiles are cut points that divide a sorted dataset into groups of (roughly) equal size.
 
 Examples are the median, which splits the data into two halves, quartiles, which split it into 4 groups, quintiles (5), deciles (10), and percentiles (100). 
diff --git a/docs/Quaternions.md b/docs/Quaternions.md
@@ -2,8 +2,6 @@
 
 CS 331 W11 L2
 
-
-
 **Definition:** Quaternions are four-component values that can be written as (x,y,z,w). In Unity, quaternions are used to describe rotations about an axis.  
 
 There are names for the rotations with regard to the local coordinate system. The lean forward and backward is the pitch (rotation about the x axis), the rotation around the object's center is the yaw (rotation about the y axis; think spinning in circles), and the rotation about the z axis is called the roll (think barrel rolls).  
diff --git a/docs/Queue.md b/docs/Queue.md
@@ -2,8 +2,6 @@
 
 CS202 L14 / CS303 Ch 1
 
-
-
 **Definition:** This is a datatype that works on a first in first out basis. This is often implemented using a [SinglyLinkedList](SinglyLinkedList.md) with a link to the tail (where more nodes would be added). This is also often implemented such that you add to the end and remove from the start. 
 
 enqueue: add to queue
diff --git a/docs/RCombination.md b/docs/RCombination.md
@@ -2,8 +2,6 @@
 
 Ch 6.3
 
-
-
 **Definition:** An r-Combination is a combination of length r.
 
 The function to denote the number of r-combinations of a set of length n is C(n, r). There are other ways to state it, but I prefer this. 
diff --git a/docs/RMSE.md b/docs/RMSE.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** This is the most common error measure for regression problems: take the difference between each prediction and the actual output, square it, average these squared differences over all samples, and then take the square root. 
 
 This is common because it weights far-off predictions more heavily than slightly-off predictions.
diff --git a/docs/ROC.md b/docs/ROC.md
@@ -2,8 +2,6 @@
 
 ML D3
 
-
-
 **Definition:** The ROC curve plots the rate of true positives for a dataset against the rate of false positives as the decision threshold changes.
 
 This type of graph is used to show threshold information for binary classification models.
diff --git a/docs/RPermutation.md b/docs/RPermutation.md
@@ -2,8 +2,6 @@
 
 TB 6.3
 
-
-
 **Definition:** r-Permutations are permutations that have a length of r.
 
 An important function is P(n, r), which denotes the number of r-permutations of a set with a length of n.
diff --git a/docs/RadialBasisFunction.md b/docs/RadialBasisFunction.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** A radial basis function is a function whose values depend only on the distance between the input and some fixed point. 
 
 The Gaussian RBF kernel takes the distance between the input x and the center point c, squares it, divides by $2L^2$, and raises e to the negative of the result: $\phi(x) = e^{-\lVert x - c \rVert^2 / (2L^2)}$. Here, L is a hyperparameter that affects the rate at which the value goes from 1 (at the center) to near 0. 
diff --git a/docs/RamseyNumbers.md b/docs/RamseyNumbers.md
@@ -2,6 +2,4 @@
 
 Ch 6.2
 
-
-
 **Definition:** A Ramsey number R(m,n), where m and n are natural numbers greater than or equal to 2, is the minimum number of people at a party such that there are either m mutual friends or n mutual enemies.
diff --git a/docs/RandomExperiment.md b/docs/RandomExperiment.md
@@ -2,8 +2,6 @@
 
 Ch 1.1
 
-
-
 **Definition:** A random experiment is a specified set of procedures that result in a truly random outcome (not necessarily uniformly) in the sample space.
 
 This is different than a [RandomVariables](RandomVariables.md) in the sense that a random variable maps the outcomes of a given experiment to another value whereas this outputs the outcome.
diff --git a/docs/RandomForest.md b/docs/RandomForest.md
@@ -2,8 +2,6 @@
 
 ML D4
 
-
-
 **Definition:** A random forest is an ensemble (see [Ensembles](Ensembles.md)) of [[DecisionTrees.md]] used to make predictions based on majority voting (classification), averaging (regression), or some other aggregation method.
 
 This uses a wisdom-of-the-crowd philosophy where the aggregate of many answers is often better than one expert answer.
diff --git a/docs/RandomPatches.md b/docs/RandomPatches.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** The random patches method for random sampling uses bagging (sometimes pasting) as well as selecting a random subset of features.
 
 This ensures both a random subset of samples and a random set of features. This reduces variance but increases bias.
diff --git a/docs/RandomProjection.md b/docs/RandomProjection.md
@@ -1,7 +1,5 @@
 # Random Projection
 
-
-
 **Definition:** Random projection is an algorithm that projects data onto a randomly generated lower-dimensional subspace. 
 
 Random projection is used because PCA can often be slow, and it has been shown (the Johnson-Lindenstrauss lemma) that random projection approximately preserves distances between points, so it does not lose too much information.
diff --git a/docs/RandomSubspaces.md b/docs/RandomSubspaces.md
@@ -2,6 +2,4 @@
 
 ML D5
 
-
-
 **Definition:** The random subspaces method is similar to [RandomPatches](RandomPatches.md) except it keeps all training instances and only samples features.
diff --git a/docs/RandomVariables.md b/docs/RandomVariables.md
@@ -2,8 +2,6 @@
 
 L4 + Khan
 
-
-
 **Definition:** Random variables in stats and probability are functions that map the outcomes of a random process to numbers.
 
 Random variables, despite the name, are functions not variables.
diff --git a/docs/Range.md b/docs/Range.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The range of a function is the set of all possible outputs of the function given the domain of the function.
 
 Formally we can state it as the following where D is the domain of the function and R is the range of the input function:
diff --git a/docs/Rank.md b/docs/Rank.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** Rank, similar to [Nullity](Nullity.md), is a way to describe dimensionality: it is the dimension of the vector space generated by the columns of a matrix (its column space).
 
 [Nullity](Nullity.md) is the same thing except specifically referring to a matrix's null space.
diff --git a/docs/RealVectorSpace.md b/docs/RealVectorSpace.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** A real vector space is a [Vector Space](VectorSpace.md) on $R$ where $R$ is the set of real numbers.
 
 ## Importance
diff --git a/docs/RecencyHeuristic.md b/docs/RecencyHeuristic.md
@@ -2,8 +2,6 @@
 
 L4
 
-
-
 **Definition:** The recency heuristic is a solution to the credit assignment problem where we assign credit for a reward/punishment to the most recent state(s).
 
 This is opposed to the [Frequency Heuristic](FrequencyHeuristic.md) where we assign credit to the things that happened most often leading to the reward signal.
diff --git a/docs/RecurrenceRelation.md b/docs/RecurrenceRelation.md
@@ -2,8 +2,6 @@
 
 U2.4.2
 
-
-
 **Definition:** A recurrence relation is an equation that expresses some a_n in terms of one or more prior terms of the sequence. As such, we must specify initial conditions so that the sequence can be calculated (think base case).
 
 Note: The relation is an equation but the result and necessary information to find the next value is a sequence.
diff --git a/docs/ReducedRowEchelonForm.md b/docs/ReducedRowEchelonForm.md
@@ -2,9 +2,6 @@
 
 Khan
 
-
-
 **Definition:** Reduced row echelon form is a form of matrix where the first nonzero entry of each nonzero row (the pivot) is a 1 with only zeros to its left. Additionally, each row's pivot must be further to the right than the pivot of the row above it, every other entry in a pivot's column must be 0, and any all-zero rows sit at the bottom.
 
-
 If we are trying to find a basis of the column space, then the columns of the original matrix that correspond to the pivot columns of the RREF form that basis. Alternatively, we can do RREF, write the equations, and solve for the pivots based on the free variables. These will both give a correct result, but they may be different results, as a span has many valid bases.
diff --git a/docs/Reflexive.md b/docs/Reflexive.md
@@ -2,6 +2,4 @@
 
 Ch 9.1
 
-
-
 **Definition:** A reflexive relation is a relation that is always true for an ordered pair where both elements are the same.
diff --git a/docs/ReflexiveClosure.md b/docs/ReflexiveClosure.md
@@ -2,8 +2,6 @@
 
 Ch 9.4
 
-
-
 **Definition:** The reflexive closure of a relation R is the smallest reflexive relation that contains R; it adds the pair (x, x) for every x in the set.
 
 When shown as a zero one matrix, this will manifest as the main diagonal being all 1's.
diff --git a/docs/RegressionProblem.md b/docs/RegressionProblem.md
@@ -2,8 +2,6 @@
 
 ML L1
 
-
-
 **Definition:** A regression problem is a problem where the value trying to be predicted is continuous (think graphing not yes/no).
 
 Yes/no problem is a [ClassificationProblem](ClassificationProblem.md)
diff --git a/docs/RegressionToTheMean.md b/docs/RegressionToTheMean.md
@@ -2,6 +2,4 @@
 
 L19
 
-
-
 **Definition:** Regression to the mean is the idea that if an unlikely event occurs it is likely the next sampling will be closer to the mean of the distribution.
diff --git a/docs/Relation.md b/docs/Relation.md
@@ -2,8 +2,6 @@
 
 CH 9.1
 
-
-
 **Definition:** A relation, in math, is a way to describe a connection between elements in the domain and the codomain.
 
 Ex:
diff --git a/docs/RelationOnASet.md b/docs/RelationOnASet.md
@@ -2,8 +2,6 @@
 
 Ch 9.1
 
-
-
 **Definition:** A relation on a set is a relation where the domain and the codomain are the same set.
 
 Ex:
diff --git a/docs/RelativeFrequency.md b/docs/RelativeFrequency.md
@@ -2,8 +2,6 @@
 
 Ch 1.1
 
-
-
 **Definition:** Relative frequency is the value f/n, where f is the [Frequency](Frequency.md) of an event over n repetitions of a [[RandomExperiment.md]].
 
 Note this is not the same as [Probability](Probability.md) because probability is the true likelihood, whereas relative frequency is the observed proportion from repeating the experiment. This value does, however, tend towards the probability. See [[LawOfLargeNumbers.md]].
diff --git a/docs/RelativelyPrime.md b/docs/RelativelyPrime.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** Two integers a and b are relatively prime if gcd(a,b) = 1. Neither number needs to be prime itself (e.g. 8 and 15).
diff --git a/docs/Representative.md b/docs/Representative.md
@@ -2,8 +2,6 @@
 
 Ch 9.5
 
-
-
 **Definition:** A representative is any element of an equivalence class chosen to describe the class.
 
 Often we use the least nonnegative residue for this (think in the case of mod equivalence classes).
diff --git a/docs/Return.md b/docs/Return.md
@@ -2,6 +2,4 @@
 
 L2
 
-
-
 **Definition:** Return is the sum of future rewards, taking the discount factor $\gamma$ into account: $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$.
diff --git a/docs/RewardSignal.md b/docs/RewardSignal.md
@@ -2,8 +2,6 @@
 
 RL Ch 1
 
-
-
 **Definition:** The reward signal is a single number sent to an agent at each time step telling it how good (or bad) something is right now.
 
 In this context right now may imply the current state is good or the next state will be good based on the action currently chosen.
diff --git a/docs/RidgeRegression.md b/docs/RidgeRegression.md
@@ -2,8 +2,6 @@
 
 ML D3
 
-
-
 **Definition:** Ridge regression uses a different cost function than standard linear regression to limit the size of coefficients.
 
 There is a regularization portion to the cost function which increases loss when coefficients are large, thus incentivizing smaller coefficient values. Along with this, there is a hyperparameter, lambda, that gives more or less weight to this portion of the equation, so a value of 0 would be standard linear regression while a high value would move the coefficients closer and closer to 0.
diff --git a/docs/RightHandRule.md b/docs/RightHandRule.md
@@ -2,8 +2,6 @@
 
 3B1B
 
-
-
 **Definition:** The right hand rule describes the relation between the axis components in R^3.
 
 When the right hand rule is true we have i being the index finger, j being the middle finger, and k being the thumb. These correspond with <1,0,0>, <0,1,0>, and <0,0,1> respectively.
diff --git a/docs/Rotate.md b/docs/Rotate.md
@@ -2,8 +2,6 @@
 
 CS331 W12 L2
 
-
-
 Rotate is a function of the Transform class that allows rotation relative to the local rotation.
 
 See [Translate](Translate.md) for a similar function but for position. 
diff --git a/docs/Rotation.md b/docs/Rotation.md
@@ -2,8 +2,6 @@
 
 Khan U2
 
-
-
 **Definition:** A rotation is a linear transformation (assuming the rotation axis passes through the zero vector) that rotates about some axis theta degrees **counter clockwise**.
 
 ## Create Matrix
diff --git a/docs/RowBuffer.md b/docs/RowBuffer.md
@@ -1,7 +1,5 @@
 # Row Buffer
 
-
-
 **Definition:** The row buffer is the buffer used to cache a row from [DRAM](DRAM.md). This is used because it is 2-3 times more efficient to access a memory address in the buffered row than it is to open a new row in memory. This is handled by the DRAM memory controller. 
 
 Precharging is where the memory controller replaces the currently buffered row with a newly requested one; this is done by sending high voltage to the new row and low voltage to the old one. When these row conflicts occur, the access is 2-3 times slower than if the row was already cached.
diff --git a/docs/RowEchelonForm.md b/docs/RowEchelonForm.md
@@ -2,8 +2,6 @@
 
 Ch 2.2
 
-
-
 **Definition:** Row echelon form is a form such that all rows have more than or the same number of 0's starting from the left side as the row above them.
 
 In row echelon form the pivots are not reduced, so they don't need to be 1 as they do with RREF.
diff --git a/docs/RuleLearning.md b/docs/RuleLearning.md
@@ -2,9 +2,6 @@
 
 ML CH1
 
-
-
-
 **Definition:** Rule learning is the process of taking in lots of data and finding associations between data. 
 
 This information can be useful when trying to implement [DimensionalityReduction](DimensionalityReduction.md)
diff --git a/docs/RuleOfSarrus.md b/docs/RuleOfSarrus.md
@@ -2,8 +2,6 @@
 
 Khan U2
 
-
-
 **Definition:** The rule of Sarrus is a shortcut for finding the determinant of a 3x3 matrix.
 
 Formula:
@@ -22,7 +20,6 @@ Ex:
 A = [2 1 3]
     [3 4 8]
 
-
 Det(A) = 1x1x8 + 2x3x3 + 4x2x4 - 1x3x4 - 2x2x8 - 4x1x3
 
 = 8 + 18 + 32 - 12 - 32 - 12
diff --git a/docs/Rvalue.md b/docs/Rvalue.md
@@ -2,8 +2,6 @@
 
 cs202 W14 L16
 
-
-
 **Definition:** An rvalue is a temporary value that can be moved. 
 
 These values can't be on the left side of an assignment think 
diff --git a/docs/SMOTE.md b/docs/SMOTE.md
@@ -2,8 +2,6 @@
 
 ML P775
 
-
-
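 **Definition:** SMOTE (Synthetic Minority Over-sampling Technique) increases the representation of minority samples in the dataset by generating synthetic samples, interpolated between a minority sample and its nearest minority-class neighbors, to improve a model's classification of them.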
 
 A related idea is data augmentation: if you only have a few images of a specific type of flower, maybe you augment them (rotate or something) to get 100s of instances so the model is trained on more instances of it and will subsequently be better at classifying them.
diff --git a/docs/SVM.md b/docs/SVM.md
@@ -2,8 +2,6 @@
 
 ML D3
 
-
-
 **Definition:** Support vector machines are models that separate different classes with a decision boundary that leaves as much space as possible between them. This space is a "street" with a line up the middle, and the edges of the street are affected only by instances located on the edge of the street, not by instances far off. These instances are the support vectors.
 
 ### Classification
@@ -22,4 +20,3 @@ A trick related to SVMs is called the polynomial kernel (kernel trick). This all
 
 When trying to use SVMs for regression we try to fit as many samples on the street while still limiting margin violations. The width of the street is controlled by the hyperparameter epsilon.
 
-
diff --git a/docs/SampleSpace.md b/docs/SampleSpace.md
@@ -2,8 +2,6 @@
 
 L1
 
-
-
 **Definition:** The sample space is the space of all possible outcomes of a random experiment.
 
 This should be two things
diff --git a/docs/Satisfiable.md b/docs/Satisfiable.md
@@ -2,8 +2,6 @@
 
 1.3.5
 
-
-
 **Definition:** A proposition is satisfiable if there is some assignment of truth values to its variables such that the outcome is true.
 
 We refer to this set of true variables as a 'solution'.
diff --git a/docs/Script.md b/docs/Script.md
@@ -2,7 +2,5 @@
 
 CS 331 W12 L3
 
-
-
 **Definition:** Scripts are where custom code can be added to accompany the GameObjects they are associated with. 
 
diff --git a/docs/Segmentation.md b/docs/Segmentation.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Segmentation in machine learning is the process of breaking up a large group into smaller ones.
 
 Image segmentation is partitioning an image into multiple segments. There are a few different types:
diff --git a/docs/SelfSupervisedLearning.md b/docs/SelfSupervisedLearning.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Self-supervised learning is the process of altering input data (e.g. masking part of it) and having the model predict the original, so the expected output is already known from the data itself. 
 
 This is similar to [SemiSupervisedLearning](SemiSupervisedLearning.md) where models are trained to detect certain information (clustering) without knowing what the information means.  
diff --git a/docs/SemiSupervisedLearning.md b/docs/SemiSupervisedLearning.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** This is training a model with some labeled and some unlabeled data. 
 
 A good example of this is Google's image classification. It will tell you person 2 is in photos 8, 10, and 15; you then give one label, and it is able to apply that label to all of the clustered photos.
diff --git a/docs/SentinelValue.md b/docs/SentinelValue.md
@@ -2,8 +2,6 @@
 
 CS202 (personal learning)
 
-
-
 **Definition:** A sentinel value is a constant value used to end an execution loop. 
 
 This is also referred to as a flag value, trip value, rogue value, signal value, or dummy data.
diff --git a/docs/Sequence.md b/docs/Sequence.md
@@ -2,8 +2,6 @@
 
 U2.4.1
 
-
-
 **Definition:** Sequences are ordered lists whose positions are indexed by the integers (formally, functions from a subset of the integers to a set).
 
 To define a sequence we can use the following notation where n is some arbitrary element:
diff --git a/docs/Set.md b/docs/Set.md
@@ -2,8 +2,6 @@
 
 U 2.1.1
 
-
-
 **Definition:** A set is an unordered collection of distinct elements.
 
 Common Sets:
diff --git a/docs/SetFunction.md b/docs/SetFunction.md
@@ -2,8 +2,6 @@
 
 Stats CH1
 
-
-
 **Definition:** A set function is a function defined as u : X -> Y where X is a collection of sets and Y is anything. 
 
 Basically, a set function takes in a set from a collection of sets (set of sets) and outputs something that may be an element, a set, or whatever. In the context of stats, the mu (Greek u) function often takes in a set from the collection and outputs the probability of that set. 
diff --git a/docs/SharedPointers.md b/docs/SharedPointers.md
@@ -4,8 +4,6 @@
 
 **Chapter:** N/A
 
-
-
 **Definition:** A shared pointer is a smart pointer that keeps a reference count, so when the last shared pointer to an object is destroyed or goes out of scope, the memory will be freed.
 
 The value of this is that, unlike unique_ptr, copies of the pointer can be made. The drawback is the overhead, both in memory and computation, of keeping track of the number of pointers that point to the object.
@@ -30,7 +28,6 @@ class arr{
 		}
 };
 
-
 std::shared_ptr<arr> getSharedPtr(){
 	
 	auto sharedPtr = std::make_shared<arr>();
diff --git a/docs/Shear.md b/docs/Shear.md
@@ -2,8 +2,6 @@
 
 3B1B
 
-
-
 **Definition:** A shear is a type of linear transformation where one axis is 'slid' while the other remains the same. 
 
 The following is the form of a shear in R^2 where each x value is shifted by a multiple of its y value and the y value stays the same:
diff --git a/docs/SignedExtension.md b/docs/SignedExtension.md
@@ -2,11 +2,8 @@
 
 W1
 
-
-
 **Definition:** Signed extension (sign extension) is used to extend a signed value to more bits by copying its sign bit into the new high-order bits, which preserves its value.
 
-
 -3:
 
 101 -> 11111101
diff --git a/docs/SimilarityFeature.md b/docs/SimilarityFeature.md
@@ -2,8 +2,6 @@
 
 ML 4
 
-
-
 **Definition:** A similarity feature is an added feature that describes how similar some feature is to a particular landmark. This value generally ranges from 1 being the same to nearly or exactly 0 (depending on RBF used) being entirely different.
 
 With housing data, as an example, we may use an RBF to add another feature based on lat and long to see how far away points are from some landmark city. 
diff --git a/docs/SimpsonsParadox.md b/docs/SimpsonsParadox.md
@@ -2,8 +2,6 @@
 
 Ch 1.1
 
-
-
 **Definition:** Simpson's paradox is the seeming paradox that some outcome can be overall more common despite all individual cases making it seem less likely.
 
 Consider the case of some batters and batting averages shown below:
diff --git a/docs/SingleKey.md b/docs/SingleKey.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.6
 
-
-
 **Definition:** Single key cryptography is data transformation that uses only a single key in the transformation process.
 
 ### Types
diff --git a/docs/SinglyLinkedList.md b/docs/SinglyLinkedList.md
@@ -2,8 +2,6 @@
 
 CS 221 W11 Lecture 13. 
 
-
-
 **Definition:** Singly linked lists are lists that only contain pointers to the next item in the list. This is in contrast with [DoublyLinkedList](DoublyLinkedList.md) which have a pointer forward and backward.
 
 There is a pointer that needs to point to the head and then finding every subsequent element is as simple as iterating through the list. The final item in the list contains a null pointer. 
@@ -22,7 +20,6 @@ Removing first element (this is more complex in c/c++ because of memory manageme
 2. Point the head pointer to the next start
 3. Deallocate the node that was removed in step 1.
 
-
 Insert into arbitrary position: 
 
 1. Walk the list until reaching the point - 1 (done so insertion for position 3 is at 3 not 4) 
diff --git a/docs/SkeletalAnimation.md b/docs/SkeletalAnimation.md
@@ -2,8 +2,6 @@
 
 CG W14 L2
 
-
-
 **Definition:** The animation of bones.
 
 Bones are the basic components of a skeleton used to animate a 3D object, and they have a tip and a root to denote directionality. The body is the area between the tip and root.
diff --git a/docs/SmallestCounterExample.md b/docs/SmallestCounterExample.md
@@ -2,8 +2,6 @@
 
 Abstract Math 10.3. This is similar to [Induction](Induction.md) and [[StrongInduction.md]]
 
-
-
 **Definition:** Assume that the first element of a series is true and that not all other elements of the series are also true. We find the first element that is untrue denoted as $S_k$ and show that $S_{k-1}$ being true and $S_k$ being untrue is contradictory.
 
 **Steps:**
@@ -12,4 +10,3 @@ Abstract Math 10.3. This is similar to [Induction](Induction.md) and [[StrongInd
 3. Let k > 1 be the first instance where $S_k$ is false
 4. Show that $S_{k-1}$ being true and $S_k$ being false are contradictory
 
-
diff --git a/docs/Span.md b/docs/Span.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 2
 
-
-
 **Definition:** The span of (v_1, ..., v_m) is the set of all [Linear Combination](LinearCombination.md) of (v_1, ..., v_m).
 
 This may be all R^2, R^3, or some other vector space.
diff --git a/docs/Stack.md b/docs/Stack.md
@@ -2,8 +2,6 @@
 
 CS202 L14 / CS303 Ch 1
 
-
-
 **Definition:** This is a data structure that uses the LIFO (last in, first out) approach, where you add to the top and remove from the top of the structure.
 
 push: add to stack
diff --git a/docs/Stacking.md b/docs/Stacking.md
@@ -2,8 +2,6 @@
 
 ML D5
 
-
-
 **Definition:** Stacking is the idea that we should create a dedicated model to act as a voting machine for an ensemble of predictive models.  
 
 This is in contrast with soft and hard voting which does simple calculations to determine the output based on inputs from the outputs of predictors (I know lots of words).
diff --git a/docs/StandardDeviation.md b/docs/StandardDeviation.md
@@ -2,8 +2,6 @@
 
 Stats D2
 
-
-
 **Definition:** This is a measure of the typical distance between each value in a dataset and the mean of the dataset: the square root of the average squared difference from the mean. 
 
 See also [Variance](Variance.md) which is the squared value. As such, to find the standard deviation of some random variable X we can do the following:
diff --git a/docs/StandardMatrix.md b/docs/StandardMatrix.md
@@ -2,6 +2,4 @@
 
 Khan U2
 
-
-
 **Definition:** The standard matrix of a linear transformation T is the matrix A such that T(x) = Ax for every input x; its columns are the images of the standard basis vectors.
diff --git a/docs/Standardization.md b/docs/Standardization.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** Standardization is the process of scaling each value by subtracting the mean and then dividing by the standard deviation, i.e. $z = \frac{x - \mu}{\sigma}$.
 
 This is preferable in some cases because [MinMaxScaling](MinMaxScaling.md) handles outliers poorly. If one outlier is much bigger than all other values, the max becomes very large, squishing most values into a small range near 0, which can affect the accuracy of models.
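A minimal Python sketch of the formula $z = \frac{x - \mu}{\sigma}$, assuming the population standard deviation is used; the function name `standardize` and the sample data are illustrative:

```python
import statistics

def standardize(values):
    """Scale each value to (x - mean) / std using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant sequence")
    return [(x - mean) / std for x in values]

# Mean is 5 and population std is 2 for this data.
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After scaling, the values have mean 0 and standard deviation 1. Unlike min-max scaling, the result is not confined to a fixed range such as [0, 1].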
diff --git a/docs/StateAnalysis.md b/docs/StateAnalysis.md
@@ -2,8 +2,6 @@
 
 Ch 3
 
-
-
 **Definition:** State analysis, in the context of algorithms, is a strategy for computing the time complexity of an algorithm by analyzing the current state of the algorithm instead of the cost of each line of code, which becomes unwieldy as algorithms become more complex.
 
 ### Steps
diff --git a/docs/StatisticalInference.md b/docs/StatisticalInference.md
@@ -2,6 +2,4 @@
 
 Ch 1.1
 
-
-
 **Definition:** Statistical inference is the process of using data from a sample to draw conclusions about the population or process that produced it, such as predictions about future events.
diff --git a/docs/StemAndLeafPlot.md b/docs/StemAndLeafPlot.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** A stem-and-leaf plot splits each value into a stem on the left side and a leaf on the right side. The stem is the leading part of the value, for example 1, and the right side lists the final digit of every observation with that stem, for example 9. That entry means there was some observation with a value of 19.
diff --git a/docs/StirlingsFormula.md b/docs/StirlingsFormula.md
@@ -2,6 +2,4 @@
 
 Ch 3
 
-
-
 **Definition:** Stirling's formula is a closed form approximation for factorials. 
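The usual statement of the approximation is $n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$. A short Python sketch comparing it with the exact value from `math.factorial`; the relative error shrinks roughly like $\frac{1}{12n}$:

```python
import math

def stirling(n):
    """Stirling's approximation to n!: sqrt(2*pi*n) * (n / e) ** n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10, 20):
    exact = math.factorial(n)
    approx = stirling(n)
    # The approximation undershoots n!, so the relative error is positive.
    print(f"n={n:>2}  exact={exact}  approx={approx:.1f}  rel. error={1 - approx / exact:.3%}")
```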
diff --git a/docs/StochasticAlgorithm.md b/docs/StochasticAlgorithm.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** A stochastic algorithm is an optimization algorithm that uses randomness. 
 
 One example of this is [KMeans](KMeans.md), which picks random initial cluster centroids.
diff --git a/docs/StratifiedSampling.md b/docs/StratifiedSampling.md
@@ -2,8 +2,6 @@
 
 ML CH2
 
-
-
 **Definition:** Stratified sampling is the process of dividing a population into subgroups (strata) and sampling from each stratum in proportion to its share of the population.
 
 This is often used when there are smaller sample sizes that can't guarantee an accurate representative sample for testing and training data. We then define some strata and try to ensure accurate representation from each grouping to get more generalizable data.
diff --git a/docs/String.md b/docs/String.md
@@ -2,8 +2,6 @@
 
 W2
 
-
-
 **Definition:** A string is a collection of ordered characters.
 
 C-style strings store n characters in n+1 bytes, where the final (n+1th) byte is all zeroes (the null terminator).
diff --git a/docs/StrongAI.md b/docs/StrongAI.md
@@ -2,6 +2,4 @@
 
 Superintelligence - Bostrom
 
-
-
 **Definition:** Strong AI is an AI system that has very broad intelligence.
diff --git a/docs/StrongInduction.md b/docs/StrongInduction.md
@@ -4,10 +4,8 @@ Abstract Math 10.2. Weak induction is the normal form of induction discussed in 
 
 **Definition:** Strong induction, like weak induction, proves that prior true statements imply a later one, but it may rely on any earlier statement rather than only the immediately preceding one. For example, we can prove $S_{k-5} \implies S_{k+1}$ so long as k-5 is in the domain, because every statement up to $S_k$ is assumed to be true.
 
-
 Steps:
 
-
 1. Prove the first statement $S_1$ or more if needed. 
 2. Given any integer $k \geq 1$, prove $(S_1 \wedge S_2 \wedge S_3 \wedge ... \wedge S_k) \implies S_{k+1}$
 
diff --git a/docs/Subgraph.md b/docs/Subgraph.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** A subgraph of G(V,E) is a graph H(W,F) such that W is a subset of V and F is a subset of E.
diff --git a/docs/Subsequence.md b/docs/Subsequence.md
@@ -2,6 +2,4 @@
 
 Ch 6.2
 
-
-
 **Definition:** A subsequence is a selection of some (or all) of the elements of a sequence, kept in their original order.
diff --git a/docs/Subset.md b/docs/Subset.md
@@ -2,8 +2,6 @@
 
 U 2.1.2
 
-
-
 **Definition:** The set A is a subset of B ($A \subseteq B$) when every element of A is also an element of B.
 
 A **proper** subset is a subset where the two sets are not equal ($A \neq B$). This is written using $\subset$ (or $\subsetneq$) instead of $\subseteq$, which also includes non-proper subsets.
diff --git a/docs/Subspace.md b/docs/Subspace.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 ### Linear Algebra Context
 
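 **Definition:** A subspace is a subset of a vector space that is itself a vector space under the same operations (it contains the zero vector and is closed under addition and scalar multiplication).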
diff --git a/docs/SubtractionRule.md b/docs/SubtractionRule.md
@@ -2,6 +2,4 @@
 
 Ch 6.1
 
-
-
 **Definition:** The subtraction rule (inclusion-exclusion principle) states that the cardinality of the union of two sets is the sum of their individual cardinalities minus the number of elements in both sets, so that those elements are not counted twice: $|A \cup B| = |A| + |B| - |A \cap B|$.
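A quick check of the rule with Python's built-in sets; the helper `union_size` and the two example sets are arbitrary illustrations:

```python
def union_size(a, b):
    """|A ∪ B| by the subtraction rule: elements in both sets would otherwise be counted twice."""
    return len(a) + len(b) - len(a & b)

A = {1, 2, 3, 4}
B = {3, 4, 5}
print(union_size(A, B), len(A | B))  # 5 5
```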
diff --git a/docs/SumOfGeometricSeries.md b/docs/SumOfGeometricSeries.md
@@ -2,8 +2,6 @@
 
 Ch 6.1
 
-
-
 **Definition:** The sum of a geometric series is a closed-form formula for a sum of the form ab^0 + ab^1 + ... + ab^n.
 
 The formula is as follows:
diff --git a/docs/SumOfVectorSpaces.md b/docs/SumOfVectorSpaces.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** The sum of two vector spaces (subspaces of a common space) is another vector space formed by all sums u + w, where u comes from the first space and w from the second (think combining each vector of one with every vector of the other).
 
 Note that the sum of vector spaces is not limited to two vector spaces. For three vector spaces V_1, V_2, V_3 whose sum is S_1, it can be stated as follows:
diff --git a/docs/SumRule.md b/docs/SumRule.md
@@ -2,8 +2,6 @@
 
 Ch 6.1
 
-
-
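 **Definition:** The sum rule states that if a choice can be made from one of several groups of options, with no option shared between groups, the total number of possible choices is the sum of the sizes of the groups.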
 
 Example:
diff --git a/docs/SuperScalar.md b/docs/SuperScalar.md
@@ -2,6 +2,4 @@
 
 Computer Architecture L2
 
-
-
 **Definition:** A superscalar processor can issue and execute multiple instructions per clock cycle.
diff --git a/docs/SupervisedLearning.md b/docs/SupervisedLearning.md
@@ -2,8 +2,6 @@
 
 ML L1
 
-
-
 **Definition:** Training a model by giving it inputs and valid associated outputs.
 
 Most widely used form of model training.
diff --git a/docs/SupportVectorMachine.md b/docs/SupportVectorMachine.md
@@ -2,6 +2,4 @@
 
 ML L1
 
-
-
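 **Definition:** A supervised learning algorithm that finds the decision boundary with the largest margin between classes; using the kernel trick, it can work with feature vectors of very high (even infinite) dimension.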
diff --git a/docs/SurfaceRepresentation.md b/docs/SurfaceRepresentation.md
@@ -2,8 +2,6 @@
 
 CS 331 W11 L2
 
-
-
 **Definition:** Modelling the surface of a continuous object in a discrete computing environment.
 
 To do this we use a [Mesh](Mesh.md) 
diff --git a/docs/Surjective.md b/docs/Surjective.md
@@ -2,8 +2,6 @@
 
 L2
 
-
-
 **Definition:** For a function to be surjective each value in the codomain must be mapped to at least once.
 
 Also known as **"onto"**
diff --git a/docs/SymmetricClosure.md b/docs/SymmetricClosure.md
@@ -2,6 +2,4 @@
 
 Ch 9.4
 
-
-
 **Definition:** The symmetric closure of a relation R is the smallest symmetric relation containing R: whenever xRy, it also includes yRx.
diff --git a/docs/SymmetricMatrix.md b/docs/SymmetricMatrix.md
@@ -2,8 +2,6 @@
 
 Ch 2.2
 
-
-
 **Definition:** A symmetric matrix is a square matrix A such that A = A^T.
 
 When viewing a symmetric matrix we see that all values are mirrored across the diagonal that goes from top left to the bottom right of the matrix.
diff --git a/docs/SystemsOfEquations.md b/docs/SystemsOfEquations.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** Systems of equations are sets of equations that are to be solved together.
diff --git a/docs/TF-IDF.md b/docs/TF-IDF.md
@@ -63,7 +63,6 @@ def tf(prefix, filename, word):
         return count_of_word / len(words)
     return 0 # empty documents
 
-
 # technically, we might just want the one word and output the value for that
 def idf(prefix, filenames):
     word_document_frequency = {}
diff --git a/docs/TargetEncoding.md b/docs/TargetEncoding.md
@@ -14,7 +14,6 @@ Equation:
 
 $\frac{n* \text{option mean} + m* \text{overall mean}}{n+m}$
 
-
 ## Issues
 
 The main issue with this approach is overfitting. When setting a parameter based on the target there is a higher likelihood that you will overfit the training data. 
diff --git a/docs/Task.md b/docs/Task.md
@@ -2,8 +2,6 @@
 
 Ch 0
 
-
-
 **Definition:** A task is a function from I to O where I is the set of all valid inputs and O is the set of all valid outputs.
 
 In this definition we are using the formal definition of a function from math.
diff --git a/docs/Tautology.md b/docs/Tautology.md
@@ -2,8 +2,6 @@
 
 1.3.1
 
-
-
 **Definition:** A tautology is a statement that is always true.
 
 An example of a tautology is $p \vee \neg p$.
diff --git a/docs/Tensor.md b/docs/Tensor.md
@@ -2,8 +2,6 @@
 
 ML P626
 
-
-
 **Definition:** A tensor is a multidimensional array of any dimensionality.
 
 Tensors can be 0-dim (scalars), 1-dim (vectors), 2-dim (matrix), and higher dimensions as well.
diff --git a/docs/Texture.md b/docs/Texture.md
@@ -2,8 +2,6 @@
 
 CS 331 W11 Lecture 2
 
-
-
 **Definition:** The texture of an object is its surface and how it looks.
 
 This is implemented in Unity via the [MeshRenderer](MeshRenderer.md).
diff --git a/docs/TextureMaps.md b/docs/TextureMaps.md
@@ -6,4 +6,3 @@ CS 331 W11 / 2
 
 **Definition:** Texture maps are used to control the look of the [Texture](Texture.md) associated with an object. Texture maps attempt to simulate real world 3d surfaces without the cost of computing many meshes. 
 
-
diff --git a/docs/TheRightTimeToLearn.md b/docs/TheRightTimeToLearn.md
@@ -4,4 +4,3 @@
 
 ## Summary
 
-
diff --git a/docs/TimeComplexity.md b/docs/TimeComplexity.md
@@ -2,6 +2,4 @@
 
 Ch 2
 
-
-
 **Definition:** Let A be an algorithm. The worst case, best case, or average case time complexity of A is the function f: N->N where f(n) is the max, min, or average number of instructions executed by the algorithm for all inputs of size n bytes.
diff --git a/docs/Tractable.md b/docs/Tractable.md
@@ -2,8 +2,6 @@
 
 U 2.3 
 
-
-
 **Definition:** A tractable problem is a problem that can be solved in polynomial time (reasonable amount of time).
 
 See also [Intractable](Intractable.md).
diff --git a/docs/TransferLearning.md b/docs/TransferLearning.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Transfer learning is the process of transferring knowledge from one task to another. 
 
 An example of this is training a model to reconstruct images of pets using self-supervised learning. We can then turn the model into a classification model based on labelled data for different types of pets.
diff --git a/docs/Transformations.md b/docs/Transformations.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** Transformations are functions that take an input vector and output another vector.
 
 See [LinearTransformation](LinearTransformation.md) for a specific type.
diff --git a/docs/Transitive.md b/docs/Transitive.md
@@ -2,6 +2,4 @@
 
 Ch 9.1
 
-
-
 **Definition:** A transitive relation has the transitive property: if xRy and yRz then xRz, for all x, y, z.
diff --git a/docs/TransitiveClosure.md b/docs/TransitiveClosure.md
@@ -2,8 +2,6 @@
 
 Ch 9.4
 
-
-
 **Definition:** The transitive closure of a relation R is the smallest transitive relation containing R: whenever there is a path from x to y, x is directly related to y.
 
 This can be thought of as adding a direct connection for every path, fully connecting the elements that can reach one another.
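One standard way to compute the transitive closure of a relation stored as a zero-one matrix is Warshall's algorithm. A minimal Python sketch; the function name and example relation are made up for illustration:

```python
def transitive_closure(matrix):
    """Warshall's algorithm on an n x n zero-one relation matrix.

    After round k, closure[i][j] is 1 if there is a path from i to j
    whose intermediate elements all lie in 0..k.
    """
    n = len(matrix)
    closure = [row[:] for row in matrix]  # copy so the input is untouched
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = 1
    return closure

# Relation on {0, 1, 2, 3}: 0R1, 1R2, 2R0 (a cycle); 3 is unrelated.
R = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(transitive_closure(R))
# [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
```

The triple loop makes this O(n^3), which is fine for small relations.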
diff --git a/docs/Translate.md b/docs/Translate.md
@@ -2,8 +2,6 @@
 
 CS331 W12 L2
 
-
-
 This is a method of Unity's Transform class that moves the GameObject by the distance specified with respect to the local coordinate system. 
 
 See [Rotate](Rotate.md) for the similar method that rotates based on the local rotation.
diff --git a/docs/Transpose.md b/docs/Transpose.md
@@ -22,7 +22,6 @@ A  = [7 6 4]
 A^T = [8 6 1]
 	  [0 4 6]
 
-
 B = [1 2]
 	[3 4]
 
diff --git a/docs/Tree.md b/docs/Tree.md
@@ -6,7 +6,6 @@ Abstract Math and CS202
 
 There is no implication about split numbers or anything of the sort, but something interesting is that in all cases it must be true that the number of edges is one less than the number of vertices. This can be proved through [Strong Induction](StrongInduction.md)
 
-
 **Root:** This is a node that has no parents
 
 **Parent:** The node above a given node
diff --git a/docs/TreeDiagram.md b/docs/TreeDiagram.md
@@ -2,8 +2,6 @@
 
 Ch 6.1
 
-
-
 **Definition:** A tree diagram is a diagram that shows all possible choices (outcomes) along with their branching.
 
 Think of 2^n where we split into 2 paths n times as a horizontal diagram.
diff --git a/docs/Trichotomy.md b/docs/Trichotomy.md
@@ -2,8 +2,6 @@
 
 CLRS 3.2
 
-
-
 **Definition:** Trichotomy is a property of real numbers such that for any two real numbers one of the following must be true:
 
 1. a < b
diff --git a/docs/TripleProductExpansion.md b/docs/TripleProductExpansion.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** The triple product expansion states that the vector triple product of three vectors satisfies a x (b x c) = b(a dot c) - c(a dot b).
 
 This is also known as Lagrange's formula.
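A small numerical check of the identity in plain Python; the helper functions and the three example vectors are chosen for illustration only:

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(k, v):
    return tuple(k * x for x in v)

def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))

a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
lhs = cross(a, cross(b, c))                         # a x (b x c)
rhs = sub(scale(dot(a, c), b), scale(dot(a, b), c))  # b(a . c) - c(a . b)
print(lhs, rhs)  # (-12, 9, -2) (-12, 9, -2)
```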
diff --git a/docs/TruePositiveRate.md b/docs/TruePositiveRate.md
@@ -2,8 +2,6 @@
 
 ML CH3
 
-
-
 **Definition:** This is the ratio of positive instances that are correctly classified.
 
 As such, we have the following equation:
diff --git a/docs/Trust.md b/docs/Trust.md
@@ -4,6 +4,4 @@
 
 **Chapter:** 1.8
 
-
-
 **Definition:** Trust is one's willingness to be vulnerable to the actions of another party based on the expectation the other party will perform an action important to the truster without necessarily being able to monitor or control the other party.
diff --git a/docs/TruthSet.md b/docs/TruthSet.md
@@ -2,6 +2,4 @@
 
 U 2.1.2
 
-
-
 **Definition:** The truth set of an open sentence (predicate) P(x) is the set of all elements of the domain for which P(x) is true.
diff --git a/docs/TwoKey.md b/docs/TwoKey.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1.6
 
-
-
 **Definition:** Two key cryptography is data transformation where there are two different keys involved in the process.
 
 This encompasses public-private key cryptography.
diff --git a/docs/TwosComplement.md b/docs/TwosComplement.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** Two's complement is a representation of signed integers in which negative numbers have a leading one and are formed by flipping the bits of the corresponding positive number and adding 1.
 
 To do this calculation in either direction flip all bits and add 1. 
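A minimal Python sketch of the flip-all-bits-and-add-1 rule for a fixed width, assuming 8-bit values (Python integers are unbounded, so results are masked to the chosen width); `negate` and `to_signed` are illustrative names:

```python
def negate(x, bits=8):
    """Two's-complement negation: flip all bits, add 1, keep the low `bits` bits."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask

def to_signed(x, bits=8):
    """Interpret a `bits`-wide pattern as a two's-complement signed integer."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

five = 0b00000101
minus_five = negate(five)
print(format(minus_five, "08b"), to_signed(minus_five))  # 11111011 -5
print(format(negate(minus_five), "08b"))                 # 00000101
```

Applying `negate` twice returns the original pattern, which is why the same calculation works in either direction.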
diff --git a/docs/UVMaps.md b/docs/UVMaps.md
@@ -2,8 +2,6 @@
 
 CG W13 L1
 
-
-
 **Definition:** A UV map assigns each vertex of a mesh a 2D (u, v) coordinate on a texture image. This describes how to "color in" the mesh.
 
 UV maps take in a mesh and return the unwrapped mesh. 
diff --git a/docs/UnaryOperations.md b/docs/UnaryOperations.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** Unary operations are operations that only take one input.
 
 These operations include increment, decrement, square root, etc.
diff --git a/docs/Underfitting.md b/docs/Underfitting.md
@@ -2,8 +2,6 @@
 
 ML CH1
 
-
-
 **Definition:** Using a model that is too simple to learn the underlying structure of data.
 
 See [Overfitting](Overfitting.md) for the inverse of this.
diff --git a/docs/Undersmoothing.md b/docs/Undersmoothing.md
@@ -2,6 +2,4 @@
 
 Stats D3
 
-
-
 **Definition:** Undersmoothing is when too small a value is selected for the kernel bandwidth of a KDE (kernel density estimate), which causes the estimate to overfit the dataset.
diff --git a/docs/Unicode.md b/docs/Unicode.md
@@ -2,6 +2,4 @@
 
 W2
 
-
-
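 **Definition:** Unicode is a character encoding standard that assigns a number (a code point, up to U+10FFFF) to almost all characters across languages. Code points are stored with encodings such as UTF-8 (1 to 4 bytes per character), UTF-16 (2 or 4 bytes), or UTF-32 (4 bytes), so two bytes per character only holds for some characters in UTF-16.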
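A short Python sketch showing that the number of bytes per character depends on the encoding. It uses `str.encode`, with the `-le` codec variants chosen so that no byte-order mark is added; `encoded_sizes` is an illustrative name:

```python
def encoded_sizes(ch):
    """Return (code point label, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes)."""
    return (f"U+{ord(ch):04X}",
            len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")),  # "-le" variant: no byte-order mark
            len(ch.encode("utf-32-le")))

for ch in ["A", "é", "€", "😀"]:
    print(*encoded_sizes(ch))
# U+0041 1 2 4
# U+00E9 2 2 4
# U+20AC 3 2 4
# U+1F600 4 4 4
```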
diff --git a/docs/UniquePointers.md b/docs/UniquePointers.md
@@ -4,8 +4,6 @@
 
 **Chapter:** N/A
 
-
-
 **Definition:** A unique pointer (`std::unique_ptr`) in C++ is a smart pointer that cannot be copied (only moved) and, once out of scope, automatically deallocates the associated memory.
 
 The value of this is that, unlike shared pointers, it has essentially no overhead beyond a normal pointer because there is no reference count to maintain. Additionally, once it goes out of scope, the memory is managed automatically. This is useful because we can return things from a function and not have to worry about deallocating the memory afterwards. An example of this is shown below:
diff --git a/docs/UnitVector.md b/docs/UnitVector.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** A unit vector is any vector with length of 1. 
 
 It is true that ||u|| = 1 when u is a unit vector, in all cases.
diff --git a/docs/Unity.md b/docs/Unity.md
@@ -2,8 +2,6 @@
 
 Unity is a popular game engine, no duh. 
 
-
-
 ### General Stuff
 
 **Unity Hub:** Used to create and manage projects. This can also be used to select Unity Editor (IDE) versions.
diff --git a/docs/UniversalSet.md b/docs/UniversalSet.md
@@ -2,6 +2,4 @@
 
 L1
 
-
-
 **Definition:** The universal set either denoted by U or Omega is the set of all objects that are of interest in a particular context.
diff --git a/docs/Universe.md b/docs/Universe.md
@@ -2,8 +2,6 @@
 
 U 1.4.1
 
-
-
 **Definition:** The universe in math is the set of all objects that bear consideration. 
 
 Often we state the universe as the variable U.
diff --git a/docs/Unsolvable.md b/docs/Unsolvable.md
@@ -2,8 +2,6 @@
 
 U 2.3
 
-
-
 **Definition:** Unsolvable (undecidable) problems are problems that no algorithm can solve at all, no matter how much time it is given.
 
 A well known example of an unsolvable problem is the halting problem.
diff --git a/docs/UnstableGradients.md b/docs/UnstableGradients.md
@@ -2,8 +2,6 @@
 
 ML 550
 
-
-
 **Definition:** Unstable gradients are the idea that different layers of a neural network can learn at widely different rates.
 
 This often manifests as [ExplodingGradients](ExplodingGradients.md) or [VanishingGradients](VanishingGradients.md).
diff --git a/docs/UnsupervisedPretraining.md b/docs/UnsupervisedPretraining.md
@@ -2,8 +2,6 @@
 
 ML P576
 
-
-
 **Definition:** Unsupervised pretraining is the process of pretraining a model on unlabeled data and then adding layers on top of the model using labelled data to get predictions.
 
 This is often used because unlabeled data is often abundant, but labeled data is expensive.
diff --git a/docs/VacuousProof.md b/docs/VacuousProof.md
@@ -2,6 +2,4 @@
 
 U 1.7
 
-
-
 **Definition:** A vacuous proof proves a statement of the form "if p then q" by showing that p is always false, so the implication is true without needing to evaluate q.
diff --git a/docs/ValueFunction.md b/docs/ValueFunction.md
@@ -2,8 +2,6 @@
 
 RL Ch 1
 
-
-
 **Definition:** The value function describes the overall expected (discounted) reward an agent will receive from a given state.
 
 This includes a gamma term (discount factor) between 0 and 1, where 0 means future rewards don't count at all and 1 means future rewards are as important as immediate rewards.
diff --git a/docs/VandermondesIdentity.md b/docs/VandermondesIdentity.md
@@ -2,8 +2,6 @@
 
 Ch 6.4
 
-
-
 **Definition:** Vandermonde's identity expresses n+m choose k as a sum over all ways to split the k choices between the two groups: 0 from one and k from the other, 1 from one and k-1 from the other, and so on.
 
 $\binom{n+m}{k} = \sum^k_{i=0} \binom{n}{i} \binom{m}{k-i}$
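A quick numerical check of the identity using `math.comb` (Python 3.8+); `vandermonde_rhs` is a hypothetical helper name:

```python
from math import comb

def vandermonde_rhs(n, m, k):
    """Sum over i of C(n, i) * C(m, k - i): i items from the first group, k - i from the second."""
    # math.comb returns 0 when k - i > m, so impossible splits contribute nothing.
    return sum(comb(n, i) * comb(m, k - i) for i in range(k + 1))

n, m, k = 5, 4, 3
print(comb(n + m, k), vandermonde_rhs(n, m, k))  # 84 84
```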
diff --git a/docs/VanishingGradients.md b/docs/VanishingGradients.md
@@ -2,8 +2,6 @@
 
 ML 550
 
-
-
 **Definition:** Vanishing gradients is a neural network problem where lower layers (earlier hidden layers) have such small gradients that gradient steps make tiny changes and the model never converges on a good solution.
 
 This is a very common problem, as gradients often get smaller and smaller as they are propagated back through the layers. As such, this problem is much more common than [ExplodingGradients](ExplodingGradients.md), which primarily happens with RNNs.
diff --git a/docs/Variables.md b/docs/Variables.md
@@ -2,6 +2,4 @@
 
 Khan
 
-
-
 **Definition:** Variables are characteristics that can in some way be measured, counted, or categorized.
diff --git a/docs/VariadicOperations.md b/docs/VariadicOperations.md
@@ -2,8 +2,6 @@
 
 SS
 
-
-
 **Definition:** Variadic operations are operations that can take a varying number of inputs.
 
 Some examples of these include sum, min, and max, which can take arbitrary-length arrays in certain languages.
diff --git a/docs/Vector.md b/docs/Vector.md
@@ -1,7 +1,5 @@
 # Vector (C++)
 
-
-
 **Definition:** Vectors (`std::vector`) in C++ are dynamically sized arrays that store their elements on the heap instead of the stack.
 
 Vectors are generally preferred to raw arrays because they manage their own memory, can be resized, and don't need a size known at compile time.
@@ -47,7 +45,6 @@ int main(){
 	return 0;
 }
 
-
 ```
 
 As we can see, vectors can be returned and passed around by value (default) and when this is done side effects don't impact the original vector. If we pass by reference either with a pointer or by specifying the input parameter as std::vector<int>& (this is the preferred way), we can then make changes to the input vector while still having memory managed for us automatically, as shown in the final while(true) loop.
diff --git a/docs/Vector3.md b/docs/Vector3.md
@@ -2,8 +2,6 @@
 
 CS 331 W12 L3
 
-
-
 **Definition:** The Vector3 struct in Unity is used to represent x, y, and z coordinates in a single object. This object stores each axis value as a float.
 
 See [Movement](Movement.md) for how to use Vector3s to move. 
diff --git a/docs/VectorMatrixMultipication.md b/docs/VectorMatrixMultipication.md
@@ -2,8 +2,6 @@
 
 Khan
 
-
-
 **Definition:** Matrix-vector multiplication can be performed by scaling each column of the matrix by the matching entry of the vector (the first column by the first entry, the second column by the second entry, and so on) and adding the results.
 
 As described above:
@@ -13,12 +11,10 @@ As described above:
 [2 1] [c] = [2c + 1d] = cv + dw
 [4 2] [d]	[4c + 2d]
 
-
 ```
 
 Where v is the vector [2,4] and w is the vector [1,2].
 
-
 Let's do another one.
 
 ```
diff --git a/docs/VectorSpace.md b/docs/VectorSpace.md
@@ -4,8 +4,6 @@
 
 **Chapter:** 1
 
-
-
 **Definition:** A vector space is a set that is closed under vector addition and scalar multiplication.
 
 Along with this, the following must be true:
diff --git a/docs/Vertex.md b/docs/Vertex.md
@@ -2,6 +2,4 @@
 
 CG W13 L1
 
-
-
 **Definition:** A vertex is a point in 3d space. 
diff --git a/docs/VigenereCipher.md b/docs/VigenereCipher.md
@@ -2,6 +2,4 @@
 
 U 2.4
 
-
-
 **Definition:** The Vigenere cipher is a polyalphabetic encryption scheme where we specify a key and then shift each character of the original message by the number represented by the key character at the corresponding position. We cycle through the key (repeating it as needed), so unlike a Caesar cipher, no single shift value does all of the encrypting.
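A minimal Python sketch, assuming an uppercase A-Z alphabet with A = 0, a key made only of letters, and that non-letters are passed through without consuming a key character (conventions vary); the function name `vigenere` is illustrative:

```python
def vigenere(message, key, decrypt=False):
    """Shift each letter by the key letter at the same position (A=0, ..., Z=25).

    Non-letters are copied unchanged and do not consume a key letter.
    """
    shifts = [ord(k) - ord("A") for k in key.upper()]
    out, i = [], 0
    for ch in message.upper():
        if "A" <= ch <= "Z":
            s = shifts[i % len(shifts)]  # cycle through the key
            if decrypt:
                s = -s
            out.append(chr((ord(ch) - ord("A") + s) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)

ciphertext = vigenere("ATTACK AT DAWN", "LEMON")
print(ciphertext)                                   # LXFOPV EF RNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACK AT DAWN
```

With a one-letter key this reduces to a Caesar cipher.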
diff --git a/docs/VisualizationAlgorithm.md b/docs/VisualizationAlgorithm.md
@@ -2,6 +2,4 @@
 
 ML Ch1
 
-
-
 **Definition:** Visualization algorithms are [UnsupervisedLearning](UnsupervisedLearning.md) algorithms that output 2D or 3D representations of your data. 
diff --git a/docs/VonNeumannModel.md b/docs/VonNeumannModel.md
@@ -2,8 +2,6 @@
 
 Computer Architecture L2
 
-
-
 **Definition:** In the von Neumann model, data and instructions are saved together in the same memory, and control signals are used to distinguish between them. Additionally, instructions are completed sequentially, i.e. finish one, fetch the next, compute it, etc.
 
 This is our broad model for computing and computer architecture. Additionally, there is a single bus for memory (I would think having more could cause concurrency issues), a program control unit (control signals), and an arithmetic unit (together with the control unit, part of the CPU).
diff --git a/docs/VotingClassifiers.md b/docs/VotingClassifiers.md
@@ -2,8 +2,6 @@
 
 ML D4
 
-
-
 **Definition:** Voting classifiers are ensembles of classification models that use each of their outputs to predict the final output.
 
 Assume you are using an SVM classifier, a random forest, and logistic regression: the outputs of these may be computed, and whichever classification gets the most votes is chosen as the output.
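A minimal sketch of hard voting in Python. The labels below stand in for the outputs of three hypothetical models (for example an SVM, a random forest, and logistic regression) and are made up:

```python
from collections import Counter

def hard_vote(predictions):
    """Return the label predicted by the most models.

    Counter.most_common orders equal counts by first appearance,
    so ties go to whichever tied label appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]

# One row per instance; one entry per model.
per_instance = [
    ["cat", "dog", "cat"],
    ["dog", "dog", "cat"],
    ["bird", "cat", "bird"],
]
print([hard_vote(p) for p in per_instance])  # ['cat', 'dog', 'bird']
```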
diff --git a/docs/WeakAI.md b/docs/WeakAI.md
@@ -2,6 +2,4 @@
 
 Superintelligence - Bostrom
 
-
-
 **Definition:** Weak AI is an AI system that has very narrow intelligence (think chess bot)
diff --git a/docs/Weight.md b/docs/Weight.md
@@ -2,8 +2,6 @@
 
 ML D6
 
-
-
 **Definition:** Weights in ANNs are numerical values that represent the strength of the connections between neurons.
 
 The connection strengths are called kernels (as in Keras), and the number of kernel entries plus the number of biases is the total number of trainable parameters (often loosely all called weights).
diff --git a/docs/WeightedGraph.md b/docs/WeightedGraph.md
@@ -2,6 +2,4 @@
 
 Ch 4
 
-
-
 **Definition:** A weighted graph is a graph where we maintain a list of weights for edges to represent the cost of traversal.
diff --git a/docs/WellOrdered.md b/docs/WellOrdered.md
@@ -2,8 +2,6 @@
 
 Abstract Math Chapter 10
 
-
-
 **Definition:** A well-ordered set is a set in which every nonempty subset has a smallest element.
 
 This is important because it is the basis for [Induction](Induction.md); without it, there would be no way to prove that $S_n\implies S_{n+1}$ means that something is true for all values in the set.
diff --git a/docs/WideAndDeepNN.md b/docs/WideAndDeepNN.md
@@ -2,8 +2,6 @@
 
 ML D6
 
-
-
 **Definition:** Wide and deep neural networks are a model architecture where some or all inputs are connected directly to outputs while also having a path through the neural network through hidden layers.
 
 By using a wide and deep neural network we don't worry about muddying simple relationships through the long path of a neural network as some values will be automatically factored into the outputs. 
diff --git a/docs/Word.md b/docs/Word.md
@@ -2,6 +2,4 @@
 
 W1
 
-
-
 **Definition:** A word is the natural unit of data a CPU processes at once; its size is typically 64 or 32 bits.
diff --git a/docs/ZeroExtension.md b/docs/ZeroExtension.md
@@ -2,8 +2,6 @@
 
 W1
 
-
-
 **Definition:** Zero extension is the process of extending an unsigned integer to take up more bits but still maintain the same value.
 
 101 -> 00000101
diff --git a/docs/ZeroOneMatrix.md b/docs/ZeroOneMatrix.md
@@ -2,8 +2,6 @@
 
 Ch 9.3
 
-
-
 **Definition:** A zero one matrix is a boolean matrix where each index is either 0 or 1.
 
 When multiplying boolean matrices we can either do ordinary matrix multiplication and treat anything non-zero as one, or we can use AND/OR operations in place of multiplication and addition (more in line with the philosophy of zero one matrices).
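A minimal Python sketch of both approaches; the function names and example matrices are illustrative:

```python
def boolean_product(A, B):
    """AND/OR product: entry (i, j) is 1 iff some k has A[i][k] = B[k][j] = 1."""
    return [[int(any(A[i][k] and B[k][j] for k in range(len(B))))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def product_then_clamp(A, B):
    """Ordinary integer matrix product, with every non-zero entry mapped to 1."""
    return [[1 if sum(A[i][k] * B[k][j] for k in range(len(B))) else 0
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 0, 1],
     [0, 1, 0]]
B = [[1, 1],
     [0, 1],
     [1, 0]]
print(boolean_product(A, B))     # [[1, 1], [0, 1]]
print(product_then_clamp(A, B))  # [[1, 1], [0, 1]]
```

Both give the same result because, for non-negative entries, a sum of products is non-zero exactly when at least one product is 1.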
diff --git a/docs/rsync.md b/docs/rsync.md
@@ -2,8 +2,6 @@
 
 Notes on backups with rsync
 
-
-
 Rsync is the best way to back up a folder to another folder. This is especially useful when mounting another drive and then setting up a system to back up a folder to that drive.
 
 Usage: rsync -av --delete /home /srv/backup
diff --git a/docs/usubstitution.md b/docs/usubstitution.md
@@ -2,6 +2,4 @@
 
 Unit 2
 
-
-
 **Definition:** U-substitution is an integration technique whereby we attempt to reverse the chain rule by finding u and du in an integral, substituting, and then evaluating.

	notes Personal notes
	git clone git://git.laack.co/notes.git
M	docs/GradientDescentCode.md	\|	6	------
M	docs/GramSchmidtProcess.md	\|	2	--
M	docs/HadamardProduct.md	\|	2	--
M	docs/HalfWord.md	\|	2	--
M	docs/HarmonicMean.md	\|	4	----
M	docs/HasseDiagram.md	\|	2	--
M	docs/HistogramBasedGradientBoosting.md	\|	2	--
M	docs/HistoricalDesigns.md	\|	2	--
M	docs/Hyperparameter.md	\|	2	--
M	docs/Hypervolume.md	\|	2	--
M	docs/IPD.md	\|	2	--
M	docs/IQR.md	\|	2	--
M	docs/ISA.md	\|	2	--
M	docs/IdentityMatrix.md	\|	2	--
M	docs/Image.md	\|	2	--
M	docs/ImitationLearning.md	\|	2	--
M	docs/Imputation.md	\|	2	--
M	docs/IncrementalMean.md	\|	3	---
M	docs/Independence.md	\|	2	--
M	docs/IndependentEvents.md	\|	2	--
M	docs/Indistinguishable.md	\|	2	--
M	docs/Individuals.md	\|	2	--
M	docs/Induction.md	\|	2	--
M	docs/Inertia.md	\|	2	--
M	docs/Inference.md	\|	2	--
M	docs/InformationContent.md	\|	2	--
M	docs/InformationSecurity.md	\|	2	--
M	docs/Injective.md	\|	2	--
M	docs/Input.md	\|	2	--
M	docs/InsertionSort.md	\|	3	---
M	docs/InstanceBasedLearning.md	\|	2	--
M	docs/Instruction.md	\|	3	---
M	docs/IntegerOverflow.md	\|	2	--
M	docs/Integrity.md	\|	2	--
M	docs/IntelligenceExplosion.md	\|	2	--
M	docs/Intractable.md	\|	2	--
M	docs/Invariance.md	\|	2	--
M	docs/Inverse.md	\|	2	--
M	docs/InverseFunction.md	\|	2	--
M	docs/InverseMatrix.md	\|	2	--
M	docs/InverseTransformation.md	\|	2	--
M	docs/Invertible.md	\|	3	---
M	docs/IteratedExpectations.md	\|	2	--
M	docs/Jerk.md	\|	2	--
M	docs/JointDensityFunction.md	\|	2	--
M	docs/JointProbability.md	\|	2	--
M	docs/KNearestNeighbor.md	\|	2	--
M	docs/Kernel.md	\|	2	--
M	docs/Key.md	\|	2	--
M	docs/KeyframeAnimation.md	\|	2	--
M	docs/Keyless.md	\|	2	--
M	docs/KnowledgeBaseApproach.md	\|	2	--
M	docs/L1Norm.md	\|	2	--
M	docs/L2Norm.md	\|	2	--
M	docs/LCM.md	\|	2	--
M	docs/LLE.md	\|	2	--
M	docs/LUDecomposition.md	\|	2	--
M	docs/LamportSignature.md	\|	1	-
M	docs/Language.md	\|	2	--
M	docs/LassoRegression.md	\|	2	--
M	docs/LawOfCosines.md	\|	2	--
M	docs/LawOfDetachment.md	\|	2	--
M	docs/LeakyReLU.md	\|	2	--
M	docs/LearningRate.md	\|	3	---
M	docs/LexicographicOrdering.md	\|	2	--
M	docs/Lighting.md	\|	2	--
M	docs/LinearCombination.md	\|	2	--
M	docs/LinearCongruence.md	\|	2	--
M	docs/LinearEquations.md	\|	2	--
M	docs/LinearHomogeneousRecurrenceRelation.md	\|	2	--
M	docs/LinearIndependence.md	\|	4	----
M	docs/LinearMaps.md	\|	2	--
M	docs/LinearProbing.md	\|	2	--
M	docs/LinearRegression.md	\|	2	--
M	docs/LinearSubspace.md	\|	2	--
M	docs/LinearTransformation.md	\|	2	--
M	docs/Linearithmic.md	\|	2	--
M	docs/LoadFactor.md	\|	2	--
M	docs/LocalScale.md	\|	2	--
M	docs/LogarithmicDifferentiation.md	\|	2	--
M	docs/Loop.md	\|	2	--
M	docs/LoopInvariant.md	\|	2	--
M	docs/LossFunction.md	\|	2	--
M	docs/Lvalue.md	\|	2	--
M	docs/MAE.md	\|	2	--
M	docs/MCTS.md	\|	2	--
M	docs/MLP.md	\|	2	--
M	docs/MUX.md	\|	2	--
M	docs/ManifoldLearning.md	\|	2	--
M	docs/MarginalProbabilities.md	\|	2	--
M	docs/MarkovAssumption.md	\|	2	--
M	docs/MarkovChains.md	\|	1	-
M	docs/MarkovDecisionProcesses.md	\|	2	--
M	docs/MarkovInequality.md	\|	2	--
M	docs/MarkovRewardProcess.md	\|	2	--
M	docs/MathConceptsCS331.md	\|	2	--
M	docs/Matrix.md	\|	3	---
M	docs/MaxNorm.md	\|	2	--
M	docs/MaxNormRegularization.md	\|	2	--
M	docs/MaxPooling.md	\|	2	--
M	docs/Memory.md	\|	2	--
M	docs/MemoryManagement.md	\|	4	----
M	docs/MergeSort.md	\|	2	--
M	docs/MersennePrime.md	\|	2	--
M	docs/Mesh.md	\|	2	--
M	docs/MeshFilter.md	\|	2	--
M	docs/MeshRenderer.md	\|	2	--
M	docs/MicroArchitecture.md	\|	2	--
M	docs/Microcontroller.md	\|	2	--
M	docs/Microprocessor.md	\|	2	--
M	docs/MillerRabinAlgorithm.md	\|	2	--
M	docs/MinMaxScaling.md	\|	2	--
M	docs/MinusOneTrick.md	\|	2	--
M	docs/MixedGraph.md	\|	2	--
M	docs/Mod.md	\|	2	--
M	docs/Model.md	\|	2	--
M	docs/ModelBasedLearning.md	\|	2	--
M	docs/ModelFree.md	\|	2	--
M	docs/Momentum.md	\|	2	--
M	docs/MonoBehaviour.md	\|	2	--
M	docs/MonotonicFunction.md	\|	2	--
M	docs/MonteCarloLearning.md	\|	2	--
M	docs/MonteCarloMethod.md	\|	2	--
M	docs/MooresLaw.md	\|	2	--
M	docs/MosaicPlot.md	\|	2	--
M	docs/Movement.md	\|	3	---
M	docs/MultiValuedFunction.md	\|	2	--
M	docs/Multigraph.md	\|	2	--
M	docs/MultinomialCoefficient.md	\|	2	--
M	docs/MultioutputClassification.md	\|	2	--
M	docs/Multiset.md	\|	2	--
M	docs/MutuallyIndependent.md	\|	2	--
M	docs/NAG.md	\|	2	--
M	docs/NLP.md	\|	2	--
M	docs/NPComplete.md	\|	2	--
M	docs/NPProblem.md	\|	2	--
M	docs/NaiveBayes.md	\|	2	--
M	docs/NaturalLog.md	\|	3	---
M	docs/Negation.md	\|	2	--
M	docs/NestedQuantifier.md	\|	2	--
M	docs/NetworkSecurity.md	\|	2	--
M	docs/NeuralNetworks.md	\|	2	--
M	docs/NonDeterministicFiniteAutomata.md	\|	1	-
M	docs/NonRepudation.md	\|	2	--
M	docs/Norm.md	\|	2	--
M	docs/NormalDistribution.md	\|	2	--
M	docs/NormalVector.md	\|	2	--
M	docs/NoveltyDetection.md	\|	2	--
M	docs/NullSpace.md	\|	2	--
M	docs/NumberTheory.md	\|	2	--
M	docs/OSI.md	\|	2	--
M	docs/OffPolicyLearning.md	\|	2	--
M	docs/OneHotEncoding.md	\|	2	--
M	docs/OneVersusAll.md	\|	2	--
M	docs/OneVersusOne.md	\|	2	--
M	docs/OnesComplement.md	\|	2	--
M	docs/OnlineLearning.md	\|	2	--
M	docs/Opcode.md	\|	2	--
M	docs/OpenAddressing.md	\|	2	--
M	docs/Operands.md	\|	2	--
M	docs/OperatorNotation.md	\|	2	--
M	docs/OptimalBayesianAgent.md	\|	2	--
M	docs/OptimalSubstructure.md	\|	2	--
M	docs/Optimizer.md	\|	2	--
M	docs/OracleComputer.md	\|	2	--
M	docs/OrderedSample.md	\|	2	--
M	docs/OrdinaryLeastSquares.md	\|	2	--
M	docs/OrthogonalComplement.md	\|	2	--
M	docs/Orthonormal.md	\|	2	--
M	docs/OutOfBag.md	\|	2	--
M	docs/OutOfOrderExecution.md	\|	2	--
M	docs/Overfitting.md	\|	2	--
M	docs/OverlappingSubproblems.md	\|	2	--
M	docs/Oversmoothing.md	\|	2	--
M	docs/PCA.md	\|	2	--
M	docs/PairwiseIndependence.md	\|	2	--
M	docs/PairwiseRelativelyPrime.md	\|	2	--
M	docs/PartialDerivative.md	\|	2	--
M	docs/PartiallyApplied.md	\|	1	-
M	docs/PartiallyObservableMarkovDecisionProcess.md	\|	2	--
M	docs/ParticularSolution.md	\|	2	--
M	docs/Partition.md	\|	2	--
M	docs/PascalsIdentity.md	\|	2	--
M	docs/PassiveAttacks.md	\|	2	--
M	docs/Pasting.md	\|	2	--
M	docs/Path.md	\|	2	--
M	docs/Percentile.md	\|	2	--
M	docs/Perceptrons.md	\|	2	--
M	docs/PerfectNumbers.md	\|	2	--
M	docs/PeriodicChain.md	\|	2	--
M	docs/PerlinNoise.md	\|	2	--
M	docs/Permutation.md	\|	2	--
M	docs/PermutationMatrix.md	\|	2	--
M	docs/Pictograph.md	\|	2	--
M	docs/PigeonholePrinciple.md	\|	2	--
M	docs/Pipelining.md	\|	2	--
M	docs/PlaneToPlaneDistance.md	\|	2	--
M	docs/PoissonDistribution.md	\|	2	--
M	docs/PolarCoordinates.md	\|	3	---
M	docs/Policy.md	\|	2	--
M	docs/PoolingLayers.md	\|	2	--
M	docs/Postcondition.md	\|	2	--
M	docs/PowerSet.md	\|	2	--
M	docs/Precision.md	\|	2	--
M	docs/Preconditions.md	\|	2	--
M	docs/Predicate.md	\|	2	--
M	docs/Prediction.md	\|	2	--
M	docs/Preimage.md	\|	2	--
M	docs/PretrainedModels.md	\|	2	--
M	docs/PrimeFactorization.md	\|	2	--
M	docs/PrimeNumber.md	\|	2	--
M	docs/PrincipleOfInclusionExclusion.md	\|	2	--
M	docs/Probability.md	\|	2	--
M	docs/ProbabilityDensityFunctions.md	\|	2	--
M	docs/ProbabilityMassFunction.md	\|	2	--
M	docs/ProductRule.md	\|	2	--
M	docs/Prognosticator.md	\|	2	--
M	docs/ProgrammerVisibleState.md	\|	2	--
M	docs/PropertyBasedTesting.md	\|	1	-
M	docs/Proposition.md	\|	2	--
M	docs/PropositionalFunction.md	\|	2	--
M	docs/ProveSetEquality.md	\|	2	--
M	docs/PseudoGraphs.md	\|	2	--
M	docs/QuadraticProbing.md	\|	3	---
M	docs/Quantifiers.md	\|	2	--
M	docs/Quantile.md	\|	2	--
M	docs/Quaternions.md	\|	2	--
M	docs/Queue.md	\|	2	--
M	docs/RCombination.md	\|	2	--
M	docs/RMSE.md	\|	2	--
M	docs/ROC.md	\|	2	--
M	docs/RPermutation.md	\|	2	--
M	docs/RadialBasisFunction.md	\|	2	--
M	docs/RamseyNumbers.md	\|	2	--
M	docs/RandomExperiment.md	\|	2	--
M	docs/RandomForest.md	\|	2	--
M	docs/RandomPatches.md	\|	2	--
M	docs/RandomProjection.md	\|	2	--
M	docs/RandomSubspaces.md	\|	2	--
M	docs/RandomVariables.md	\|	2	--
M	docs/Range.md	\|	2	--
M	docs/Rank.md	\|	2	--
M	docs/RealVectorSpace.md	\|	2	--
M	docs/RecencyHeuristic.md	\|	2	--
M	docs/RecurrenceRelation.md	\|	2	--
M	docs/ReducedRowEchelonForm.md	\|	3	---
M	docs/Reflexive.md	\|	2	--
M	docs/ReflexiveClosure.md	\|	2	--
M	docs/RegressionProblem.md	\|	2	--
M	docs/RegressionToTheMean.md	\|	2	--
M	docs/Relation.md	\|	2	--
M	docs/RelationOnASet.md	\|	2	--
M	docs/RelativeFrequency.md	\|	2	--
M	docs/RelativelyPrime.md	\|	2	--
M	docs/Representative.md	\|	2	--
M	docs/Return.md	\|	2	--
M	docs/RewardSignal.md	\|	2	--
M	docs/RidgeRegression.md	\|	2	--
M	docs/RightHandRule.md	\|	2	--
M	docs/Rotate.md	\|	2	--
M	docs/Rotation.md	\|	2	--
M	docs/RowBuffer.md	\|	2	--
M	docs/RowEchelonForm.md	\|	2	--
M	docs/RuleLearning.md	\|	3	---
M	docs/RuleOfSarrus.md	\|	3	---
M	docs/Rvalue.md	\|	2	--
M	docs/SMOTE.md	\|	2	--
M	docs/SVM.md	\|	3	---
M	docs/SampleSpace.md	\|	2	--
M	docs/Satisfiable.md	\|	2	--
M	docs/Script.md	\|	2	--
M	docs/Segmentation.md	\|	2	--
M	docs/SelfSupervisedLearning.md	\|	2	--
M	docs/SemiSupervisedLearning.md	\|	2	--
M	docs/SentinelValue.md	\|	2	--
M	docs/Sequence.md	\|	2	--
M	docs/Set.md	\|	2	--
M	docs/SetFunction.md	\|	2	--
M	docs/SharedPointers.md	\|	3	---
M	docs/Shear.md	\|	2	--
M	docs/SignedExtension.md	\|	3	---
M	docs/SimilarityFeature.md	\|	2	--
M	docs/SimpsonsParadox.md	\|	2	--
M	docs/SingleKey.md	\|	2	--
M	docs/SinglyLinkedList.md	\|	3	---
M	docs/SkeletalAnimation.md	\|	2	--
M	docs/SmallestCounterExample.md	\|	3	---
M	docs/Span.md	\|	2	--
M	docs/Stack.md	\|	2	--
M	docs/Stacking.md	\|	2	--
M	docs/StandardDeviation.md	\|	2	--
M	docs/StandardMatrix.md	\|	2	--
M	docs/Standardization.md	\|	2	--
M	docs/StateAnalysis.md	\|	2	--
M	docs/StatisticalInference.md	\|	2	--
M	docs/StemAndLeafPlot.md	\|	2	--
M	docs/StirlingsFormula.md	\|	2	--
M	docs/StochasticAlgorithm.md	\|	2	--
M	docs/StratifiedSampling.md	\|	2	--
M	docs/String.md	\|	2	--
M	docs/StrongAI.md	\|	2	--
M	docs/StrongInduction.md	\|	2	--
M	docs/Subgraph.md	\|	2	--
M	docs/Subsequence.md	\|	2	--
M	docs/Subset.md	\|	2	--
M	docs/Subspace.md	\|	2	--
M	docs/SubtractionRule.md	\|	2	--
M	docs/SumOfGeometricSeries.md	\|	2	--
M	docs/SumOfVectorSpaces.md	\|	2	--
M	docs/SumRule.md	\|	2	--
M	docs/SuperScalar.md	\|	2	--
M	docs/SupervisedLearning.md	\|	2	--
M	docs/SupportVectorMachine.md	\|	2	--
M	docs/SurfaceRepresentation.md	\|	2	--
M	docs/Surjective.md	\|	2	--
M	docs/SymmetricClosure.md	\|	2	--
M	docs/SymmetricMatrix.md	\|	2	--
M	docs/SystemsOfEquations.md	\|	2	--
M	docs/TF-IDF.md	\|	1	-
M	docs/TargetEncoding.md	\|	1	-
M	docs/Task.md	\|	2	--
M	docs/Tautology.md	\|	2	--
M	docs/Tensor.md	\|	2	--
M	docs/Texture.md	\|	2	--
M	docs/TextureMaps.md	\|	1	-
M	docs/TheRightTimeToLearn.md	\|	1	-
M	docs/TimeComplexity.md	\|	2	--
M	docs/Tractable.md	\|	2	--
M	docs/TransferLearning.md	\|	2	--
M	docs/Transformations.md	\|	2	--
M	docs/Transitive.md	\|	2	--
M	docs/TransitiveClosure.md	\|	2	--
M	docs/Translate.md	\|	2	--
M	docs/Transpose.md	\|	1	-
M	docs/Tree.md	\|	1	-
M	docs/TreeDiagram.md	\|	2	--
M	docs/Trichotomy.md	\|	2	--
M	docs/TripleProductExpansion.md	\|	2	--
M	docs/TruePositiveRate.md	\|	2	--
M	docs/Trust.md	\|	2	--
M	docs/TruthSet.md	\|	2	--
M	docs/TwoKey.md	\|	2	--
M	docs/TwosComplement.md	\|	2	--
M	docs/UVMaps.md	\|	2	--
M	docs/UnaryOperations.md	\|	2	--
M	docs/Underfitting.md	\|	2	--
M	docs/Undersmoothing.md	\|	2	--
M	docs/Unicode.md	\|	2	--
M	docs/UniquePointers.md	\|	2	--
M	docs/UnitVector.md	\|	2	--
M	docs/Unity.md	\|	2	--
M	docs/UniversalSet.md	\|	2	--
M	docs/Universe.md	\|	2	--
M	docs/Unsolvable.md	\|	2	--
M	docs/UnstableGradients.md	\|	2	--
M	docs/UnsupervisedPretraining.md	\|	2	--
M	docs/VacuousProof.md	\|	2	--
M	docs/ValueFunction.md	\|	2	--
M	docs/VandermondesIdentity.md	\|	2	--
M	docs/VanishingGradients.md	\|	2	--
M	docs/Variables.md	\|	2	--
M	docs/VariadicOperations.md	\|	2	--
M	docs/Vector.md	\|	3	---
M	docs/Vector3.md	\|	2	--
M	docs/VectorMatrixMultipication.md	\|	4	----
M	docs/VectorSpace.md	\|	2	--
M	docs/Vertex.md	\|	2	--
M	docs/VigenereCipher.md	\|	2	--
M	docs/VisualizationAlgorithm.md	\|	2	--
M	docs/VonNeumannModel.md	\|	2	--
M	docs/VotingClassifiers.md	\|	2	--
M	docs/WeakAI.md	\|	2	--
M	docs/Weight.md	\|	2	--
M	docs/WeightedGraph.md	\|	2	--
M	docs/WellOrdered.md	\|	2	--
M	docs/WideAndDeepNN.md	\|	2	--
M	docs/Word.md	\|	2	--
M	docs/ZeroExtension.md	\|	2	--
M	docs/ZeroOneMatrix.md	\|	2	--
M	docs/rsync.md	\|	2	--
M	docs/usubstitution.md	\|	2	--