commit 54ba1fcd1d143b14cd0cfd62f2e96825fbce003b
parent b5e081419b98e034267e85cce86e3b7e8e6fe359
Author: AndrewLockVI <andrewlaack1@gmail.com>
Date:   Mon, 20 Jan 2025 07:21:41 -0600

Took dl and la notes

Diffstat:
D AISafety.md  | 35 -----------------------------------
D AbstractDataType.md  | 9 ---------
D Abstraction.md  | 9 ---------
D Accuracy.md  | 9 ---------
D AdaBoost.md  | 9 ---------
D AdaGrad.md  | 9 ---------
D Adam.md  | 11 -----------
D Adder.md  | 0 
D AdjacencyMatrix.md  | 9 ---------
D Affinity.md  | 9 ---------
D Algorithm.md  | 13 -------------
D Algorithms.md  | 157 -------------------------------------------------------------------------------
D AmbientSpace.md  | 9 ---------
D Amortization.md  | 7 -------
D AngleBetweenVectors.md  | 17 -----------------
D Animation.md  | 21 ---------------------
D AnimationController.md  | 12 ------------
D AnomalyDetection.md  | 13 -------------
D Antisymmetric.md  | 7 -------
D Arccos.md  | 7 -------
D Arcsin.md  | 7 -------
D ArithmeticComputations.md  | 7 -------
D Armature.md  | 10 ----------
D Ascii.md  | 7 -------
D Assembly.md  | 129 -------------------------------------------------------------------------------
D Asset.md  | 9 ---------
D Associative.md  | 11 -----------
D AstronomicalUnit.md  | 14 --------------
D AsymptoticNotation.md  | 39 ---------------------------------------
D Autoencoder.md  | 13 -------------
D BCD.md  | 16 ----------------
D BIOL115.md  | 15 ---------------
D Backpropagation.md  | 17 -----------------
D Bagging.md  | 9 ---------
D Baking.md  | 16 ----------------
D Bandits.md  | 11 -----------
D Bandwidth.md  | 9 ---------
D BarrierSynchronization.md  | 17 -----------------
D BasicVariables.md  | 7 -------
D BasisOfSubspace.md  | 7 -------
D BatchNormalization.md  | 9 ---------
D BayesTheroem.md  | 15 ---------------
D BayesianInference.md  | 9 ---------
D BekensteinBound.md  | 9 ---------
D BellmanEquation.md  | 9 ---------
D BernoulliProcess.md  | 23 -----------------------
D BernoulliRandomVariable.md  | 9 ---------
D Bias.md  | 18 ------------------
D Biconditional.md  | 18 ------------------
D BigONotation.md  | 9 ---------
D BigThetaNotation.md  | 7 -------
D Bijective.md  | 11 -----------
D BijectiveProof.md  | 18 ------------------
D BinaryCode.md  | 35 -----------------------------------
D BinaryOperations.md  | 9 ---------
D BinaryTree.md  | 19 -------------------
D Binomial.md  | 15 ---------------
D BinomialCoefficient.md  | 33 ---------------------------------
D BinomialDistribution.md  | 11 -----------
D Bipartite.md  | 11 -----------
D BitSteering.md  | 11 -----------
D Blender.md  | 23 -----------------------
D BlenderShortcuts.md  | 39 ---------------------------------------
D Boosting.md  | 15 ---------------
D Boxplots.md  | 9 ---------
D BreadthFirstSearch.md  | 9 ---------
D BucketAddressing.md  | 7 -------
D BulkSynchronousProcessing.md  | 11 -----------
D CART.md  | 13 -------------
D CNN.md  | 53 -----------------------------------------------------
D CPP.md  | 14 --------------
D CS202.md  | 21 ---------------------
D CS331.md  | 23 -----------------------
D Cache.md  | 5 -----
D CaesarCipher.md  | 7 -------
D Calculus.md  | 53 -----------------------------------------------------
D CanaryValue.md  | 9 ---------
D CartesianProduct.md  | 11 -----------
D Cases.md  | 7 -------
D CategoricalCrossEntropy.md  | 13 -------------
D Ceiling.md  | 11 -----------
D CentralLimitTheroem.md  | 7 -------
D ChainRule.md  | 7 -------
D Chaining.md  | 7 -------
D ChangeOfBasis.md  | 29 -----------------------------
D CharacteristicEquation.md  | 13 -------------
D CharacteristicRoots.md  | 7 -------
D CircuitTechnology.md  | 7 -------
D CircularDoublyLinkedList.md  | 8 --------
D CircularLinkedList.md  | 8 --------
D ClassificationProblem.md  | 11 -----------
D Clip.md  | 11 -----------
D Closure.md  | 17 -----------------
D ClusteringAlgorithms.md  | 15 ---------------
D Codeword.md  | 13 -------------
D Codomain.md  | 15 ---------------
D Collection.md  | 7 -------
D Collision.md  | 7 -------
D ColumnSpace.md  | 9 ---------
D Combination.md  | 9 ---------
D CombinatorialProof.md  | 15 ---------------
D Combinatorics.md  | 9 ---------
D Commutative.md  | 13 -------------
D Complement.md  | 9 ---------
D ComplexVectorSpace.md  | 7 -------
D CompositeNumber.md  | 7 -------
D ComputerArchitecture.md  | 44 --------------------------------------------
D ComputerSecurity.md  | 11 -----------
D ConditionalDisjunction.md  | 7 -------
D ConditionalProbabilities.md  | 9 ---------
D ConditionalProbability.md  | 9 ---------
D ConditionalProbabilityTheroem.md  | 13 -------------
D ConfusionMatrix.md  | 7 -------
D Congruence.md  | 7 -------
D CongruenceClass.md  | 7 -------
D Connected.md  | 7 -------
D ConnectedComponent.md  | 7 -------
D Connectives.md  | 28 ----------------------------
D Contingency.md  | 9 ---------
D ContinuousProbability.md  | 11 -----------
D Contradiction.md  | 9 ---------
D Contrapositive.md  | 11 -----------
D Converse.md  | 9 ---------
D Coordinate.md  | 11 -----------
D Correlation.md  | 9 ---------
D CorrelationCoefficient.md  | 9 ---------
D CountSort.md  | 7 -------
D CounterExample.md  | 7 -------
D CountingPrinciple.md  | 7 -------
D Covariance.md  | 13 -------------
D CramersRule.md  | 11 -----------
D CreditAssignmentProblem.md  | 7 -------
D CriticalPath.md  | 0 
D CrossProduct.md  | 24 ------------------------
D CrossValidation.md  | 9 ---------
D Crosstabulation.md  | 17 -----------------
D CumulativeDensityFunction.md  | 19 -------------------
D CumulativeRelativeFrequency.md  | 7 -------
D Cycle.md  | 7 -------
D DBSCAN.md  | 13 -------------
D DRAM.md  | 12 ------------
D DRAMBanks.md  | 5 -----
D DRAMCell.md  | 7 -------
D DRAMChips.md  | 5 -----
D DRAMRefresh.md  | 11 -----------
D DRAMRowHammer.md  | 6 ------
D DataAugmentation.md  | 9 ---------
D DataFlow.md  | 14 --------------
D DataStructureAugmentation.md  | 9 ---------
D DecisionThreshold.md  | 9 ---------
D DecisionTrees.md  | 64 ----------------------------------------------------------------
D DeepLearning.md  | 21 ---------------------
D Degree.md  | 7 -------
D DemorgansLaw.md  | 43 -------------------------------------------
D DensityEstimation.md  | 11 -----------
D DepthFirstSearch.md  | 11 -----------
D DerivedDistribution.md  | 9 ---------
D DesignPoint.md  | 16 ----------------
D Determinant.md  | 148 -------------------------------------------------------------------------------
D DeterministicFiniteAutomata.md  | 21 ---------------------
D DiagonalMatrices.md  | 9 ---------
D Digraph.md  | 9 ---------
D DimensionalityReduction.md  | 9 ---------
D Dimensions.md  | 11 -----------
D DirectProof.md  | 7 -------
D DirectSum.md  | 11 -----------
D DiscountFactor.md  | 9 ---------
D DiscreteMath.md  | 288 -------------------------------------------------------------------------------
D DiscreteProbability.md  | 7 -------
D DiscreteRandomVariable.md  | 7 -------
D DiscreteUniformLaw.md  | 7 -------
D DisjointSet.md  | 7 -------
D DistanceCalculation.md  | 20 --------------------
D DistanceToPlane.md  | 15 ---------------
D Distinguishable.md  | 7 -------
D DistinguishablePermutation.md  | 7 -------
D Distributive.md  | 7 -------
D DistributiveLaw.md  | 11 -----------
D DisturbanceErrors.md  | 7 -------
D Div.md  | 13 -------------
D DivideAndConquer.md  | 15 ---------------
D DivisionRule.md  | 9 ---------
D DotProduct.md  | 30 ------------------------------
D DoublyLinkedList.md  | 8 --------
D Dropout.md  | 13 -------------
D Duality.md  | 9 ---------
D DynamicProgramming.md  | 12 ------------
D EarlyStopping.md  | 11 -----------
D EigenVector.md  | 81 -------------------------------------------------------------------------------
D ElasticNetRegression.md  | 9 ---------
D ElementaryTransformations.md  | 9 ---------
D EligibilityTraces.md  | 9 ---------
D Embedding.md  | 11 -----------
D EmptyGraph.md  | 7 -------
D Ensembles.md  | 7 -------
D Entropy.md  | 15 ---------------
D Episode.md  | 7 -------
D Episodic.md  | 7 -------
D EquationOfAPlane.md  | 43 -------------------------------------------
D EquivalenceClass.md  | 9 ---------
D EquivalenceRelation.md  | 7 -------
D EuclideanAlgorithm.md  | 9 ---------
D Evaluation.md  | 7 -------
D Event.md  | 9 ---------
D EvolutionaryMethods.md  | 7 -------
D ExhaustiveProof.md  | 9 ---------
D Expectation.md  | 17 -----------------
D ExplodingGradients.md  | 17 -----------------
D Exploit.md  | 9 ---------
D ExploratoryDataAnalysis.md  | 7 -------
D Explore.md  | 7 -------
D ExponentialDistribution.md  | 9 ---------
D ExtraTrees.md  | 9 ---------
D Feature.md  | 11 -----------
D FeatureScaling.md  | 11 -----------
D FibonacciNumbers.md  | 9 ---------
D FiniteDimensional.md  | 11 -----------
D FiniteField.md  | 7 -------
D FiniteStateAutomata.md  | 9 ---------
D FisherYatesShuffle.md  | 27 ---------------------------
D FlashCrash.md  | 7 -------
D Floor.md  | 11 -----------
D Folding.md  | 7 -------
D ForwardThoughts.md  | 23 -----------------------
D FreeVariables.md  | 9 ---------
D Frequency.md  | 9 ---------
D FrequencyHeuristic.md  | 11 -----------
D FunctionNotation.md  | 7 -------
D FundamentalOperations.md  | 15 ---------------
D FundamentalTheoremOfArithmetic.md  | 13 -------------
D FundamentalTheroemofCalculus.md  | 9 ---------
D GCD.md  | 9 ---------
D GameLoop.md  | 13 -------------
D GameObject.md  | 17 -----------------
D GaussianElimination.md  | 11 -----------
D GaussianIntegers.md  | 7 -------
D GaussianMixtureModels.md  | 7 -------
D GeneralSolution.md  | 7 -------
D GeneralizationError.md  | 11 -----------
D GeneralizedPigeonholePrinciple.md  | 7 -------
D GradientBoosting.md  | 10 ----------
D GradientClipping.md  | 13 -------------
D GradientDescent.md  | 21 ---------------------
D GradientDescentCode.md  | 58 ----------------------------------------------------------
D GramSchmidtProcess.md  | 9 ---------
D Graphs.md  | 11 -----------
D HHP102.md  | 8 --------
D HadamardProduct.md  | 9 ---------
D HalfWord.md  | 7 -------
D Hamming.md  | 11 -----------
D HarmonicMean.md  | 19 -------------------
D HashFunction.md  | 40 ----------------------------------------
D HashTable.md  | 9 ---------
D HashValues.md  | 9 ---------
D Hashing.md  | 7 -------
D HasseDiagram.md  | 9 ---------
D HistogramBasedGradientBoosting.md  | 11 -----------
D HistoricalDesigns.md  | 7 -------
D Homogeneous.md  | 15 ---------------
D Hyperparameter.md  | 9 ---------
D Hyperplane.md  | 7 -------
D Hypervolume.md  | 7 -------
D IPD.md  | 8 --------
D IQR.md  | 9 ---------
D ISA.md  | 52 ----------------------------------------------------
D IdentityMatrix.md  | 20 --------------------
D Image.md  | 27 ---------------------------
D ImitationLearning.md  | 9 ---------
D Imputation.md  | 24 ------------------------
D Incremental.md  | 9 ---------
D IncrementalMean.md  | 45 ---------------------------------------------
D Independence.md  | 18 ------------------
D IndependentEvents.md  | 9 ---------
D Indistinguishable.md  | 7 -------
D Individuals.md  | 7 -------
D Induction.md  | 37 -------------------------------------
D Inertia.md  | 9 ---------
D Inference.md  | 9 ---------
D InformationContent.md  | 7 -------
D Inhomogeneous.md  | 9 ---------
D Injective.md  | 11 -----------
D Input.md  | 14 --------------
D InsertionSort.md  | 41 -----------------------------------------
D InstanceBasedLearning.md  | 13 -------------
D Instruction.md  | 21 ---------------------
D IntegerOverflow.md  | 7 -------
D IntelligenceExplosion.md  | 7 -------
D Intractable.md  | 9 ---------
D Invariance.md  | 9 ---------
D Inverse.md  | 9 ---------
D InverseFunction.md  | 36 ------------------------------------
D InverseTransformation.md  | 89 -------------------------------------------------------------------------------
D Invertible.md  | 7 -------
D IteratedExpectations.md  | 11 -----------
D Jerk.md  | 9 ---------
D JointDensityFunction.md  | 9 ---------
D JointProbability.md  | 17 -----------------
D KMeans.md  | 18 ------------------
D KNearestNeighbor.md  | 18 ------------------
D Kernel.md  | 11 -----------
D Key.md  | 9 ---------
D KeyframeAnimation.md  | 31 -------------------------------
D LCM.md  | 7 -------
D LLE.md  | 11 -----------
D LabelEncoding.md  | 13 -------------
D LasVegasMethod.md  | 7 -------
D LassoRegression.md  | 9 ---------
D LatentSpace.md  | 9 ---------
D LawOfCosines.md  | 9 ---------
D LawOfDetachment.md  | 11 -----------
D LawOfLargeNumbers.md  | 9 ---------
D LeakyReLU.md  | 22 ----------------------
D LearningRate.md  | 16 ----------------
D LexicographicOrdering.md  | 11 -----------
D Lighting.md  | 19 -------------------
D LinearAlgebra.md  | 113 -------------------------------------------------------------------------------
D LinearCombination.md  | 11 -----------
D LinearCongruence.md  | 7 -------
D LinearEquations.md  | 9 ---------
D LinearHomogeneousRecurrenceRelation.md  | 13 -------------
D LinearIndependence.md  | 72 ------------------------------------------------------------------------
D LinearMaps.md  | 12 ------------
D LinearProbing.md  | 9 ---------
D LinearRegression.md  | 32 --------------------------------
D LinearSubspace.md  | 21 ---------------------
D LinearTransformation.md  | 66 ------------------------------------------------------------------
D Linearithmic.md  | 7 -------
D LinkedLists.md  | 21 ---------------------
D LinuxStuff.md  | 8 --------
D LoadFactor.md  | 7 -------
D LocalScale.md  | 9 ---------
D LogarithmicDifferentiation.md  | 15 ---------------
D LogisticRegression.md  | 17 -----------------
D Loop.md  | 7 -------
D LoopInvariant.md  | 15 ---------------
D LossFunction.md  | 11 -----------
D Lvalue.md  | 15 ---------------
D MAE.md  | 27 ---------------------------
D MCTS.md  | 7 -------
D MLP.md  | 13 -------------
D MUX.md  | 7 -------
D MachineLearning.md  | 242 -------------------------------------------------------------------------------
D ManifoldLearning.md  | 11 -----------
D MarginalProbabilities.md  | 7 -------
D MarkovAssumption.md  | 7 -------
D MarkovChains.md  | 70 ----------------------------------------------------------------------
D MarkovDecisionProcesses.md  | 9 ---------
D MarkovInequality.md  | 7 -------
D MarkovProcess.md  | 7 -------
D MarkovRewardProcess.md  | 7 -------
D Math310.md  | 19 -------------------
D MathConceptsCS331.md  | 8 --------
D Matrix.md  | 28 ----------------------------
D MatrixMultiplication.md  | 17 -----------------
D MaxNormRegularization.md  | 7 -------
D MaxPooling.md  | 9 ---------
D Memory.md  | 17 -----------------
D MemoryManagement.md  | 44 --------------------------------------------
D MergeSort.md  | 46 ----------------------------------------------
D MersennePrime.md  | 9 ---------
D Mesh.md  | 18 ------------------
D MeshFilter.md  | 7 -------
D MeshRenderer.md  | 7 -------
D MicroArchitecture.md  | 15 ---------------
D Microcontroller.md  | 9 ---------
D Microprocessor.md  | 9 ---------
D MinMaxScaling.md  | 29 -----------------------------
D MinusOneTrick.md  | 9 ---------
D MixedGraph.md  | 7 -------
D MixedRandomVariable.md  | 11 -----------
D Mod.md  | 9 ---------
D Model.md  | 12 ------------
D ModelBasedLearning.md  | 9 ---------
D ModelFree.md  | 7 -------
D Momentum.md  | 13 -------------
D MonoBehaviour.md  | 13 -------------
D MonotonicFunction.md  | 9 ---------
D MonteCarloLearning.md  | 11 -----------
D MonteCarloMethod.md  | 11 -----------
D MooresLaw.md  | 11 -----------
D MosaicPlot.md  | 7 -------
D Movement.md  | 33 ---------------------------------
D MultiValuedFunction.md  | 9 ---------
D MulticlassClassifier.md  | 7 -------
D Multigraph.md  | 7 -------
D MultilabelClassification.md  | 9 ---------
D MultinomialCoefficient.md  | 14 --------------
D MultioutputClassification.md  | 7 -------
D Multiset.md  | 23 -----------------------
D MutuallyIndependent.md  | 7 -------
D NAG.md  | 7 -------
D NLP.md  | 7 -------
D NLU.md  | 7 -------
D NPComplete.md  | 7 -------
D NPProblem.md  | 7 -------
D NaiveBayes.md  | 16 ----------------
D NaryOperations.md  | 7 -------
D NaturalLog.md  | 29 -----------------------------
D Negation.md  | 7 -------
D NestedQuantifier.md  | 11 -----------
D NeuralNetworks.md  | 27 ---------------------------
D NonDeterministicFiniteAutomata.md  | 17 -----------------
D NormalDistribution.md  | 9 ---------
D NormalVector.md  | 7 -------
D NoveltyDetection.md  | 9 ---------
D NullSpace.md  | 17 -----------------
D Nullity.md  | 9 ---------
D NumberTheory.md  | 7 -------
D OffPolicyLearning.md  | 7 -------
D OfflineLearning.md  | 10 ----------
D OnPolicyLearning.md  | 9 ---------
D OneHotEncoding.md  | 11 -----------
D OneVersusAll.md  | 11 -----------
D OneVersusOne.md  | 13 -------------
D OnesComplement.md  | 8 --------
D OnlineLearning.md  | 14 --------------
D Opcode.md  | 9 ---------
D OpenAddressing.md  | 7 -------
D Operands.md  | 9 ---------
D OperatorNotation.md  | 7 -------
D OptimalBayesianAgent.md  | 7 -------
D OptimalSubstructure.md  | 7 -------
D Optimizer.md  | 14 --------------
D OracleComputer.md  | 9 ---------
D OrderedSample.md  | 7 -------
D OrdinaryLeastSquares.md  | 9 ---------
D OrthogonalComplement.md  | 21 ---------------------
D Orthonormal.md  | 9 ---------
D OutOfBag.md  | 22 ----------------------
D OutOfOrderExecution.md  | 9 ---------
D Overfitting.md  | 15 ---------------
D OverlappingSubproblems.md  | 7 -------
D Oversmooothing.md  | 9 ---------
D PCA.md  | 15 ---------------
D PProblem.md  | 7 -------
D PairwiseIndependence.md  | 7 -------
D PairwiseRelativelyPrime.md  | 7 -------
D PartialDerivative.md  | 9 ---------
D PartiallyObservableMarkovDecisionProcess.md  | 7 -------
D PartiallyOrderedSet.md  | 9 ---------
D ParticularSolution.md  | 7 -------
D Partition.md  | 11 -----------
D PascalsIdentity.md  | 7 -------
D Pasting.md  | 7 -------
D Path.md  | 7 -------
D Percentile.md  | 7 -------
D Perceptrons.md  | 17 -----------------
D PerfectNumbers.md  | 13 -------------
D PeriodicChain.md  | 15 ---------------
D PerlinNoise.md  | 11 -----------
D Permutation.md  | 13 -------------
D Physics.md  | 11 -----------
D Pictograph.md  | 7 -------
D PigeonholePrinciple.md  | 7 -------
D PipelineControl.md  | 7 -------
D Pipelining.md  | 9 ---------
D PlaneToPlaneDistance.md  | 17 -----------------
D PoissonDistribution.md  | 17 -----------------
D PoissonProcess.md  | 15 ---------------
D Pole.md  | 15 ---------------
D Policy.md  | 9 ---------
D PoolingLayers.md  | 9 ---------
D Postcondition.md  | 7 -------
D PosteriorProbability.md  | 7 -------
D PowerSet.md  | 21 ---------------------
D Precision.md  | 13 -------------
D Preconditions.md  | 7 -------
D Predicate.md  | 23 -----------------------
D Prediction.md  | 9 ---------
D Preimage.md  | 19 -------------------
D PretrainedModels.md  | 13 -------------
D PrimeFactorization.md  | 13 -------------
D PrimeNumber.md  | 9 ---------
D PrincipleOfInclusionExclusion.md  | 19 -------------------
D PriorProbability.md  | 7 -------
D Probability.md  | 19 -------------------
D ProbabilityDensityFunctions.md  | 23 -----------------------
D ProbabilityLaw.md  | 13 -------------
D ProbabilityMassFunction.md  | 52 ----------------------------------------------------
D ProbingFunction.md  | 9 ---------
D ProductRule.md  | 7 -------
D Prognosticator.md  | 7 -------
D ProgrammerVisibleState.md  | 15 ---------------
D Projection.md  | 128 -------------------------------------------------------------------------------
D Proposition.md  | 7 -------
D PropositionalFunction.md  | 23 -----------------------
D ProveSetEquality.md  | 7 -------
D PseudoGraphs.md  | 7 -------
D QuadraticProbing.md  | 31 -------------------------------
D Quantifiers.md  | 40 ----------------------------------------
D Quantile.md  | 11 -----------
D Quaternions.md  | 11 -----------
D Queue.md  | 15 ---------------
D RCombination.md  | 11 -----------
D RMSE.md  | 33 ---------------------------------
D ROC.md  | 9 ---------
D RPermutation.md  | 13 -------------
D RadialBasisFunction.md  | 11 -----------
D RamseyNumbers.md  | 7 -------
D RandomExperiment.md  | 9 ---------
D RandomForest.md  | 11 -----------
D RandomPatches.md  | 11 -----------
D RandomProjection.md  | 31 -------------------------------
D RandomSubspaces.md  | 7 -------
D RandomVariables.md  | 24 ------------------------
D Range.md  | 11 -----------
D Rank.md  | 11 -----------
D RealVectorSpace.md  | 15 ---------------
D RecencyHeuristic.md  | 9 ---------
D RecurrenceRelation.md  | 13 -------------
D ReducedRowEchelonForm.md  | 10 ----------
D Reflexive.md  | 7 -------
D ReflexiveClosure.md  | 9 ---------
D RegressionProblem.md  | 19 -------------------
D RegressionToTheMean.md  | 7 -------
D RegularExpressions.md  | 9 ---------
D RegularLanguages.md  | 9 ---------
D ReinforcementLearning.md  | 58 ----------------------------------------------------------
D Relation.md  | 13 -------------
D RelationOnASet.md  | 11 -----------
D RelativeFrequency.md  | 9 ---------
D RelativelyPrime.md  | 7 -------
D RepresentationLearning.md  | 11 -----------
D Representative.md  | 9 ---------
D Return.md  | 7 -------
D RewardSignal.md  | 9 ---------
D RidgeRegression.md  | 9 ---------
D RightHandRule.md  | 11 -----------
D Rotate.md  | 11 -----------
D Rotation.md  | 17 -----------------
D RowBuffer.md  | 9 ---------
D RowEchelonForm.md  | 9 ---------
D RuleLearning.md  | 10 ----------
D RuleOfSarrus.md  | 30 ------------------------------
D Rvalue.md  | 16 ----------------
D SMART.md  | 20 --------------------
D SMOTE.md  | 9 ---------
D SRAM.md  | 0 
D SVM.md  | 25 -------------------------
D SampleSpace.md  | 12 ------------
D Satisfiable.md  | 9 ---------
D Scheduling.md  | 5 -----
D Script.md  | 8 --------
D Seam.md  | 0 
D Segmentation.md  | 13 -------------
D SelfSupervisedLearning.md  | 11 -----------
D SemiSupervisedLearning.md  | 9 ---------
D SentinelValue.md  | 22 ----------------------
D Sequence.md  | 25 -------------------------
D Set.md  | 31 -------------------------------
D SetFunction.md  | 9 ---------
D SharedPointers.md  | 63 ---------------------------------------------------------------
D Shear.md  | 21 ---------------------
D SignedExtension.md  | 12 ------------
D SimilarityFeature.md  | 9 ---------
D SimpsonsParadox.md  | 19 -------------------
D SinglyLinkedList.md  | 56 --------------------------------------------------------
D Singularity.md  | 7 -------
D SkeletalAnimation.md  | 27 ---------------------------
D SmallestCounterExample.md  | 15 ---------------
D SoftmaxRegression.md  | 9 ---------
D Span.md  | 11 -----------
D Sparse.md  | 11 -----------
D Stack.md  | 17 -----------------
D Stacking.md  | 13 -------------
D StandardBasis.md  | 9 ---------
D StandardDeviation.md  | 13 -------------
D StandardMatrix.md  | 7 -------
D Standardization.md  | 33 ---------------------------------
D StateAnalysis.md  | 25 -------------------------
D StatisticalInference.md  | 7 -------
D StatisticsAndProbability.md  | 160 -------------------------------------------------------------------------------
D StemAndLeafPlot.md  | 7 -------
D StirlingsFormula.md  | 7 -------
D StochasticAlgorithm.md  | 9 ---------
D StratifiedSampling.md  | 11 -----------
D String.md  | 9 ---------
D StrongAI.md  | 7 -------
D StrongInduction.md  | 21 ---------------------
D Subgraph.md  | 7 -------
D Subsequence.md  | 7 -------
D Subset.md  | 9 ---------
D Subspace.md  | 23 -----------------------
D SubtractionRule.md  | 7 -------
D SumOfGeometricSeries.md  | 13 -------------
D SumOfVectorSpaces.md  | 18 ------------------
D SumRule.md  | 13 -------------
D SuperScalar.md  | 7 -------
D SupervisedLearning.md  | 11 -----------
D SupportVectorMachine.md  | 7 -------
D SurfaceRepresentation.md  | 9 ---------
D Surjective.md  | 9 ---------
D Symmetric.md  | 9 ---------
D SymmetricClosure.md  | 7 -------
D SymmetricMatrix.md  | 9 ---------
D SystemsOfEquations.md  | 7 -------
D TargetEncoding.md  | 22 ----------------------
D Task.md  | 11 -----------
D Tautology.md  | 9 ---------
D TemporalDifferenceLearning.md  | 9 ---------
D Tensor.md  | 9 ---------
D Texture.md  | 13 -------------
D TextureMaps.md  | 9 ---------
D TheoryOfComputation.md  | 18 ------------------
D TimeComplexity.md  | 7 -------
D TotalProbabilityTheroem.md  | 11 -----------
D Tractable.md  | 9 ---------
D TransTheoreticalModel.md  | 25 -------------------------
D TransferLearning.md  | 9 ---------
D Transform.md  | 23 -----------------------
D Transformations.md  | 9 ---------
D Transitive.md  | 7 -------
D TransitiveClosure.md  | 9 ---------
D Translate.md  | 11 -----------
D Transpose.md  | 103 -------------------------------------------------------------------------------
D Tree.md  | 26 --------------------------
D TreeDiagram.md  | 9 ---------
D Triangulation.md  | 9 ---------
D Trichotomy.md  | 17 -----------------
D TripleProductExpansion.md  | 9 ---------
D TruePositiveRate.md  | 13 -------------
D TruthSet.md  | 7 -------
D Tuple.md  | 11 -----------
D TwosComplement.md  | 17 -----------------
D UVMaps.md  | 13 -------------
D UnaryOperations.md  | 9 ---------
D Underfitting.md  | 9 ---------
D Undersmoothing.md  | 7 -------
D Unicode.md  | 7 -------
D UniquePointers.md  | 40 ----------------------------------------
D UnitVector.md  | 36 ------------------------------------
D Unity.md  | 43 -------------------------------------------
D UniversalSet.md  | 7 -------
D Universe.md  | 13 -------------
D Unsolvable.md  | 9 ---------
D UnstableGradients.md  | 17 -----------------
D UnsupervisedLearning.md  | 13 -------------
D UnsupervisedPretraining.md  | 11 -----------
D UtilityFunction.md  | 7 -------
D VLIW.md  | 0 
D VacuousProof.md  | 7 -------
D ValueFunction.md  | 11 -----------
D VandermondesIdentity.md  | 9 ---------
D VanishingGradients.md  | 15 ---------------
D Variables.md  | 7 -------
D VariadicOperations.md  | 9 ---------
D Variance.md  | 29 -----------------------------
D Vector.md  | 53 -----------------------------------------------------
D Vector3.md  | 9 ---------
D VectorMatrixMultipication.md  | 30 ------------------------------
D VectorSpace.md  | 30 ------------------------------
D Vertex.md  | 7 -------
D VigenereCipher.md  | 7 -------
D VisualizationAlgorithm.md  | 7 -------
D VonNeumannModel.md  | 21 ---------------------
D VotingClassifiers.md  | 13 -------------
D Walk.md  | 7 -------
D WeakAI.md  | 7 -------
D Weight.md  | 9 ---------
D WeightedGraph.md  | 7 -------
D WellDefined.md  | 7 -------
D WellOrdered.md  | 13 -------------
D WideAndDeepNN.md  | 11 -----------
D Word.md  | 7 -------
D ZeroExtension.md  | 9 ---------
D ZeroOneMatrix.md  | 9 ---------
D index.md  | 30 ------------------------------
D rsync.md  | 19 -------------------
D sed.md  | 0 
D usubstitution.md  | 7 -------
A work/deep-learning/PolarCoordinatesConversion.py  | 24 ++++++++++++++++++++++++
A work/linear-algebra/01-20-2025.md  | 5 +++++

672 files changed, 29 insertions(+), 10351 deletions(-)
diff --git a/AISafety.md b/AISafety.md
@@ -1,35 +0,0 @@
-
-Links to AI Safety Notes
-
-## Questions To Answer
-
-What problems are falsifiable and consequently worth working on?
-
-Is destruction of lower beings a convergent goal?
-
-How might we solve the stop button paradox?
-
-How to define AGI?
-
-How to test for AGI?
-
-## Notes
-
-#### Things to Read
-
-* Vernor Vinge’s seminal essay
-
-
-
-
-
-#### Superintelligence - Nick Bostrom
-
-Ch. 1
-* [Singularity](Singularity.md)
-* [IntelligenceExplosion](IntelligenceExplosion.md)
-* [Prognosticator](Prognosticator.md)
-* [OptimalBayesianAgent](OptimalBayesianAgent.md)
-* [FlashCrash](FlashCrash.md) - Real world example of mis-specified utility function causing consequences
-* [StrongAI](StrongAI.md)
-* [WeakAI](WeakAI.md)
diff --git a/AbstractDataType.md b/AbstractDataType.md
@@ -1,9 +0,0 @@
-# Abstract Data Type (ADT)
-
-CS 202 L14
-
-## Notes
-
-**Definition:** An ADT is a datatype that specifies its interface but not its implementation. This is similar to the relationship between an [[ISA.md]] and [[MicroArchitecture.md]].
-
-These are a focus of CS 303 and include things such as [Stack](Stack.md) and [Queue](Queue.md).
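A minimal sketch of the interface/implementation split described above, assuming Python's `abc` module as the way to state the interface. The class names `Stack` and `ListStack` and the three method names are illustrative choices, not taken from the course notes.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """The ADT: states which operations exist, not how they work."""

    @abstractmethod
    def push(self, item):
        """Add an item to the top of the stack."""

    @abstractmethod
    def pop(self):
        """Remove and return the item on top of the stack."""

    @abstractmethod
    def is_empty(self):
        """Return True when the stack holds no items."""


class ListStack(Stack):
    """One possible implementation, backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


stack = ListStack()
stack.push(1)
stack.push(2)
top = stack.pop()  # last in, first out
```

Code written against `Stack` keeps working if `ListStack` is swapped for, say, a linked-list version, which is the point of hiding the implementation.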
diff --git a/Abstraction.md b/Abstraction.md
@@ -1,9 +0,0 @@
-# Abstraction
-
-Abstraction cpu architecture L1
-
-## Notes
-
-Abstraction hides implementation details from higher levels. You only see the interfaces provided to you.
-
-There are instances where exposing lower-level functions to higher levels can be useful. This can be seen when lower-level instructions are exposed to the compiler to allow better optimization.
diff --git a/Accuracy.md b/Accuracy.md
@@ -1,9 +0,0 @@
-# Accuracy
-
-ML D2
-
-## Notes
-
-**Definition:** Accuracy in machine learning describes the overall correctness of a model. 
-
-This metric is the percentage of predictions that match their labels.
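As a small sketch of the metric, assuming predictions and labels arrive as equal-length sequences (the function name `accuracy` is illustrative):

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one example is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Three of the four predictions match their labels.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```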
diff --git a/AdaBoost.md b/AdaBoost.md
@@ -1,9 +0,0 @@
-# AdaBoost (adaptive boosting)
-
-ML D5
-
-## Notes
-
-**Definition:** AdaBoost is a boosting algorithm that increases the weight of training instances that the prior model underfit (missed).
-
-In AdaBoost, each predictor gets a model weight based on its overall accuracy, and each instance weight is then updated based on whether the model predicted that instance correctly. Predictors that are wrong more often get a lower weight, while instances that are misclassified get a higher weight so that future models are pushed to fix them.
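The two kinds of weights can be sketched with a single boosting round. This assumes the common textbook form of discrete AdaBoost, where the predictor weight is `0.5 * ln((1 - error) / error)`; other presentations scale it by a learning rate or only increase the weights of misclassified instances. The function name `adaboost_round` is hypothetical.

```python
import math


def adaboost_round(weights, correct):
    """Run one AdaBoost-style weight update.

    weights: current instance weights, assumed to sum to 1.
    correct: booleans, True where the current predictor was right.
    Returns (predictor_weight, new_instance_weights).
    """
    # Weighted error rate of the current predictor.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")

    # More accurate predictors get a larger weight in the final vote.
    alpha = 0.5 * math.log((1.0 - error) / error)

    # Shrink weights of correct instances, grow weights of missed ones.
    scaled = [
        w * math.exp(-alpha if ok else alpha)
        for w, ok in zip(weights, correct)
    ]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


alpha, new_weights = adaboost_round(
    [0.25, 0.25, 0.25, 0.25], [True, True, True, False]
)
```

After this round the single missed instance carries more weight than each correctly classified one, so the next predictor is pushed toward getting it right.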
diff --git a/AdaGrad.md b/AdaGrad.md
@@ -1,9 +0,0 @@
-# AdaGrad
-
-ML P584
-
-## Notes
-
-**Definition:** Adaptively adjusts learning rate based on historical gradients.
-
-I don't understand this very well.
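A one-dimensional sketch may make the definition more concrete. It assumes the usual AdaGrad rule: accumulate the squared gradients seen so far and divide each step by the square root of that sum. The function name, the default hyperparameters, and the test function `x**2` are illustrative assumptions.

```python
import math


def adagrad_minimize(grad, x0, learning_rate=0.5, steps=200, eps=1e-8):
    """Minimize a 1-D function with AdaGrad, given its gradient."""
    x = x0
    sum_sq = 0.0  # running sum of squared gradients (the "history")
    for _ in range(steps):
        g = grad(x)
        sum_sq += g * g
        # The effective step shrinks as the gradient history grows.
        x -= learning_rate * g / (math.sqrt(sum_sq) + eps)
    return x


# Gradient of f(x) = x**2 is 2x; the minimum is at x = 0.
x_min = adagrad_minimize(lambda x: 2.0 * x, x0=5.0)
```

Because `sum_sq` never decreases, the effective learning rate only shrinks over time, which is the behaviour that later optimizers such as RMSProp and Adam soften by using a decaying average instead.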
diff --git a/Adam.md b/Adam.md
@@ -1,11 +0,0 @@
-# Adam (Adaptive moment estimation)
-
-ML P587
-
-## Notes
-
-**Definition:** Adam combines momentum with RMSProp, so each update depends on both a running average of past gradients (momentum) and a running average of past squared gradients.
-
-This is usually the best default choice.
-
-There are variants of Adam as well, such as AdaMax (generally worse), Nadam (uses the [[NAG.md]] idea of calculating the gradient in the direction of momentum and generally outperforms Adam), and AdamW (regularized with weight decay).
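A one-dimensional sketch of the update, assuming the standard formulation with bias-corrected first and second moment estimates. The function name, the hyperparameter defaults, and the test function `x**2` are illustrative assumptions rather than anything specified in the notes.

```python
import math


def adam_minimize(grad, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=300):
    """Minimize a 1-D function with Adam, given its gradient."""
    x = x0
    m = 0.0  # running average of gradients (momentum part)
    v = 0.0  # running average of squared gradients (RMSProp part)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Both averages start at zero, so correct their early bias.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Gradient of f(x) = x**2 is 2x; the minimum is at x = 0.
x_min = adam_minimize(lambda x: 2.0 * x, x0=5.0)
```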
diff --git a/Adder.md b/Adder.md
diff --git a/AdjacencyMatrix.md b/AdjacencyMatrix.md
@@ -1,9 +0,0 @@
-# Adjacency Matrix
-
-Ch 4
-
-## Notes
-
-**Definition:** An adjacency matrix is a matrix in which each row and each column represents a node. Each position holds either true or false, denoting whether there is an edge between the two nodes.
-
-For an undirected graph without loops, these matrices are symmetric about the main diagonal, and the diagonal is all false because a node may not be connected to itself.
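A minimal sketch of building such a matrix from an edge list, assuming an undirected graph with no loops and nodes numbered from 0. The function name `adjacency_matrix` is illustrative.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a boolean adjacency matrix for an undirected, loop-free graph."""
    matrix = [[False] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        if u == v:
            raise ValueError("loops are not allowed in this sketch")
        # An undirected edge is recorded in both directions,
        # which is what makes the matrix symmetric.
        matrix[u][v] = True
        matrix[v][u] = True
    return matrix


# Path graph 0 - 1 - 2.
m = adjacency_matrix(3, [(0, 1), (1, 2)])
```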
diff --git a/Affinity.md b/Affinity.md
@@ -1,9 +0,0 @@
-# Affinity
-
-ML D5
-
-## Notes
-
-**Definition:** Affinity is any measure of how well an instance fits into a given cluster. 
-
-This is closely related to unsupervised clustering algorithms.
diff --git a/Algorithm.md b/Algorithm.md
@@ -1,13 +0,0 @@
-# Algorithm
-
-Computer Architecture L2
-
-## Notes
-
-**Definition:** A step-by-step procedure to solve a problem where each step is definite (quantifiable), computable, and finite (ends eventually).
-
-This was covered in computer architecture because, although algorithms are usually implemented in software, there are also algorithms involved in computer architecture.
-
-## CS 303
-
-**Definition:** An algorithm is a finite list of instructions written in a formal language to perform a specific task which will always terminate after a finite number of instructions have been executed and will always complete the task correctly.
diff --git a/Algorithms.md b/Algorithms.md
@@ -1,157 +0,0 @@
-# Algorithms Index
-
-This is an index of links to notes taken about algorithms. These are CS-related algorithms and not related to machine learning (see [[MachineLearning.md]] for that).
-
-## Links
-
-- [MonteCarloMethod](MonteCarloMethod.md)
-- [LasVegasMethod](LasVegasMethod.md)
-- [PerlinNoise](PerlinNoise.md)
-- [FisherYatesShuffle](FisherYatesShuffle.md)
-
-#### CSCI 303 (DS&A)
-
-Ch 0 (algorithms):
-
-- [Algorithm](Algorithm.md)
-- [Task](Task.md)
-- [TimeComplexity](TimeComplexity.md)
-- [CountingPrinciple](CountingPrinciple.md)
-- [MultiValuedFunction](MultiValuedFunction.md)
-- [Collection](Collection.md) 
-- [FunctionNotation](FunctionNotation.md)
-- [OperatorNotation](OperatorNotation.md)
-
-Ch 1 (stacks and queues):
-
-- [AbstractDataType](AbstractDataType.md)
-- [Stack](Stack.md)
-- [Queue](Queue.md)
-
-Ch 2 (Big-O and Asymptotic Complexity):
-
-- [BigONotation](BigONotation.md)
-- [AsymptoticNotation](AsymptoticNotation.md) (include asymptotic complexity class)
-- [BigThetaNotation](BigThetaNotation.md)
-- [Linearithmic](Linearithmic.md)
-
-Ch 3 (state analysis):
-
-- [StateAnalysis](StateAnalysis.md)
-- [StirlingsFormula](StirlingsFormula.md)
-
-Ch 4 (graphs):
-
-- [Graphs](Graphs.md)
-- [Walk](Walk.md)
-- [Path](Path.md)
-- [Cycle](Cycle.md)
-- [Connected](Connected.md)
-- [Tree](Tree.md)
-- [AdjacencyMatrix](AdjacencyMatrix.md) (nxn matrix with true and false for a_i,j)
-- [Digraph](Digraph.md)
-- [Multigraph](Multigraph.md)
-- [Loop](Loop.md) (different than cycle)
-- [Sparse](Sparse.md)
-- [Subgraph](Subgraph.md)
-- [ConnectedComponent](ConnectedComponent.md)
-- [WeightedGraph](WeightedGraph.md)
-- [EmptyGraph](EmptyGraph.md)
-- [Bipartite](Bipartite.md)
-
-Ch 5 (Hashing)
-
-- [Hashing](Hashing.md)
-- [Homogeneous](Homogeneous.md)
-- [HashTable](HashTable.md)
-- [Key](Key.md) 
-- [HashValues](HashValues.md) 
-- [HashFunction](HashFunction.md) 
-- [Folding](Folding.md) 
-- [ArithmeticComputations](ArithmeticComputations.md) 
-- [FiniteField](FiniteField.md) 
-- [Collision](Collision.md)
-- [LinearProbing](LinearProbing.md) 
-- [ProbingFunction](ProbingFunction.md)
-- [QuadraticProbing](QuadraticProbing.md)
-- [LoadFactor](LoadFactor.md)
-- [Chaining](Chaining.md)
-- [BucketAddressing](BucketAddressing.md)
-
-Ch 6 (Information Theory and Data Compression)
-
-- InformationTheory
-- [Codeword](Codeword.md)
-- [BinaryCode](BinaryCode.md)
-- [Entropy](Entropy.md)
-- [InformationContent](InformationContent.md)
-- HuffmanCoding
-- RootedTree (ordered pair (T,r) where r is the root (arbitrary) and T is a graph (tree))
-- Leaf - Exactly one neighbor
-
-Ch 7 (Game Strategy)
-
-- FiniteTwoPlayerGameOfPureStrategy
-- Minimax
-- Negamax
-
-#### Other Stuff To Look At
-
-Operation types (operations done with n inputs)
-
-- [UnaryOperations](UnaryOperations.md)
-- [BinaryOperations](BinaryOperations.md)
-- [NaryOperations](NaryOperations.md)
-- [VariadicOperations](VariadicOperations.md)
-
-#### Intro To Algorithms (MIT)
-
-L1:
-
-- [AsymptoticNotation](AsymptoticNotation.md)
-- [FundamentalOperations](FundamentalOperations.md)
-
-L2:
-
-- [LinkedLists](LinkedLists.md)
-- [DataStructureAugmentation](DataStructureAugmentation.md)
-- [Amortization](Amortization.md)
-
-L4:
-
-- [Hashing](Hashing.md)
-- [OpenAddressing](OpenAddressing.md)
-
-L5 (non-comparative sorting):
-
-- [CountSort](CountSort.md)
-
-L6:
-
-- [BinaryTree](BinaryTree.md)
-
-#### Intro To Algorithms Textbook (CLRS)
-
-2.1
-
-- [InsertionSort](InsertionSort.md)
-- [LoopInvariant](LoopInvariant.md)
-
-2.3
-
-- [Incremental](Incremental.md)
-- [DivideAndConquer](DivideAndConquer.md)
-- [MergeSort](MergeSort.md)
-
-3.2
-
-- [AsymptoticNotation](AsymptoticNotation.md)
-- [Trichotomy](Trichotomy.md)
-- [MonotonicFunction](MonotonicFunction.md)
-
-
-#### Other algorithms adjacent stuff
-
-- [BekensteinBound](BekensteinBound.md)
-- [OracleComputer](OracleComputer.md)
-- [Invariance](Invariance.md)
diff --git a/AmbientSpace.md b/AmbientSpace.md
@@ -1,9 +0,0 @@
-# Ambient Space
-
-Khan U2
-
-## Notes
-
-**Definition:** The ambient space is the space surrounding some object.
-
-When describing a cube the ambient space would be R^3. When discussing a hyperplane with four dimensions the ambient space would be R^5.
diff --git a/Amortization.md b/Amortization.md
@@ -1,7 +0,0 @@
-# Amortization
-
-L2
-
-## Notes
-
-**Definition:** Amortization is the process of averaging the cost of occasional expensive actions across many cheap actions, even if the cheap actions do not do anything related to the expensive action.
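One common illustration of this idea is a growable array that doubles its capacity when full. The notes do not name this example, so treat it as an assumption. The sketch below only counts how many element copies the occasional resizes cause; `total_copy_cost` is a hypothetical helper name.

```python
def total_copy_cost(n_appends):
    """Count element copies caused by resizing a doubling array."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n_appends):
        if size == capacity:
            copies += size  # expensive step: move every existing element
            capacity *= 2
        size += 1  # cheap step: write the new element
    return copies


copies = total_copy_cost(1000)
```

Even though a single resize can copy hundreds of elements, the total number of copies stays below twice the number of appends, so the averaged (amortized) cost per append is constant.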
diff --git a/AngleBetweenVectors.md b/AngleBetweenVectors.md
@@ -1,17 +0,0 @@
-# Angle Between Vectors
-
-Khan
-
-## Notes
-
-**Definition:** The angle between two vectors is the angle formed when both of their tails are positioned at the origin.
-
-## Calculation
-
-1. Find the magnitude of both vectors ([[DistanceCalculation.md]])
-2. Divide the dot product by the product of the vectors' lengths to find the cosine of the angle, then solve for the angle with arccos
-	- cos(theta) = (u dot v)/(||u||||v||)
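The two steps above can be sketched as follows, assuming vectors are given as equal-length sequences of numbers. The function name `angle_between` is illustrative; the clamp guards against floating-point values slightly outside [-1, 1].

```python
import math


def angle_between(u, v):
    """Return the angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("the angle is undefined for the zero vector")
    cos_theta = dot / (norm_u * norm_v)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp rounding error
    return math.acos(cos_theta)


right_angle = angle_between((1, 0), (0, 1))
```

Here the perpendicular unit vectors give an angle of pi/2 radians.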
-
-## Intuition 
-
-
diff --git a/Animation.md b/Animation.md
@@ -1,21 +0,0 @@
-# Animation
-
-CG W13 L3
-
-## Notes
-
-**Definition:** Animation is the process of making still images appear as continuous movement.
-
-Unity uses [[Clip.md]] for simple animations. These are predefined and repetitive in nature. Think of a looping falling animation, as opposed to a rigidbody falling under physics, which would not be predefined.
-
-These clips can be stored as a 3D model with adjustable parameters rather than as a sequence of images or anything of the sort.
-
-Unity uses the [[Animation.md]] class as well as [[AnimationController.md]] to control animations.
-
-Unity Class:
-
-There is also a Unity class, a component named Animation. This class describes a specific animation and is called by the [[AnimationController.md]].
-
-Blender Stuff:
-
-See [[KeyframeAnimation.md]] for animating in blender.
diff --git a/AnimationController.md b/AnimationController.md
@@ -1,12 +0,0 @@
-# Animation Controller
-
-CG W13 L3
-
-## Notes
-
-**Definition:** An animation controller is a finite state machine that can be represented as a graph where the vertices are states and the edges are transitions between states. Note that this is a directed graph.
-
-
-See [[Animation.md]] for individual animation class.
-
-This is an observer architecture: the controller observes conditions and triggers secondary actions, namely animation classes.
diff --git a/AnomalyDetection.md b/AnomalyDetection.md
@@ -1,13 +0,0 @@
-# Anomaly Detection
-
-ML CH1
-
-## Notes
-
-**Definition:** Anomaly detection is the task of detecting anomalous samples. 
-
-A common example is flagging unusual credit card transactions to prevent fraud.
-
-This is also commonly used to detect manufacturing faults.
-
-These models are trained on normal data, so that when abnormal data is fed to the model, it is marked as such.
diff --git a/Antisymmetric.md b/Antisymmetric.md
@@ -1,7 +0,0 @@
-# AntiSymmetric
-
-Ch 9.1
-
-## Notes
-
-**Definition:** An antisymmetric relation is one such that if xRy and x != y, then yRx is false.
diff --git a/Arccos.md b/Arccos.md
@@ -1,7 +0,0 @@
-# Arccos
-
-SS
-
-## Notes
-
-**Definition:** Arccos is the inverse of cosine. 
diff --git a/Arcsin.md b/Arcsin.md
@@ -1,7 +0,0 @@
-# Arcsin
-
-SS
-
-## Notes
-
-**Definition:** Arcsin is the inverse of sine. 
diff --git a/ArithmeticComputations.md b/ArithmeticComputations.md
@@ -1,7 +0,0 @@
-# Arithmetic Computations
-
-Ch 5
-
-## Notes
-
-**Definition:** Arithmetic computations, with respect to hashing, are computations that use arithmetic operators to go from some key (or portion of a key) to a hash value (or portion).
diff --git a/Armature.md b/Armature.md
@@ -1,10 +0,0 @@
-# Armature
-
-
-## Notes
-
-**Definition:** An armature is a set of bones with parent-child relationships. This set can be disjoint, meaning not all bones can be reached by moving from parents to children or vice versa.
-
-An armature also has a default pose (generally a T-pose), which is the state of all its bones when imported (transform).
-
-See [[SkeletalAnimation.md]] for more.
diff --git a/Ascii.md b/Ascii.md
@@ -1,7 +0,0 @@
-# Ascii
-
-W2
-
-## Notes
-
-**Definition:** Ascii is another character encoding scheme that uses only 1 byte per character.
diff --git a/Assembly.md b/Assembly.md
@@ -1,129 +0,0 @@
-# Assembly Language
-
-Main Links For Assembly Language
-
----
-
-Assembly Language CS 224
-
-Week 1:
-- [TwosComplement](TwosComplement.md) 
-- [IntegerOverflow](IntegerOverflow.md)
-- [ZeroExtension](ZeroExtension.md)
-- [SignedExtension](SignedExtension.md)
-- [Word](Word.md)
-- [HalfWord](HalfWord.md)
-
-Week 2:
-
-- [Unicode](Unicode.md)
-- [Ascii](Ascii.md) 
-- [String](String.md) (zero terminated, this is the c standard but we will use it in assembly)
-- [Microprocessor](Microprocessor.md) 
-- [Microcontroller](Microcontroller.md) 
-
-ldr - load relative to program counter (ldr R0,=prompt) - 0 means the address if not by value
-
-ldrh - load half word with zero extension
-
-ldrsh - load half word sign extended
-
-ldrsb - load signed byte
-
-bl WriteString - this uses the library to write out the R0 register
-
-bl WriteInt
-
-movs - 
-
-mov - copy - mov R1, R0 (this copies R0 to R1)
-
-'#79' - the pound sign specifies it is a constant
-
-bl ReadChar - read character input from user
-
-push - pushes to stack (push {R0})
-
-pop - brings from stack to register (pop {R0})
-
-add - two argument or three - two means you add the two and put into the first position, three means add last two and place in first one.
-
-b	L1 - This is the branch command which says to branch to L1.
-
-bge - branch greater or equal - if you have subs then there is another bit somewhere else that will then be used to evaluate bge.
-
-bl Newline
-
-bgt - greater than signed
-
-subs - subtract - subs R1,#1
-
-EQU - defining a symbol with EQU works like a define: it gives a name to a constant value
-
-a_len	EQU	(. - a) / 4 ; we are stating . means current byte, a is the start of a and divide by 4 because each element in a is 4 bytes. This would only work when defined right after a value.
-
-ldrsb - load register signed byte
-
-[] - Square brackets load the value associated with the pointer in a register ie. [R0] = value at memory position specified by R0.
-
-cmp
-
-udiv
-
-mls
-
-EA - effective address this is not a command but an idea
-
-ldr R0,[R1,#24] ; EA = R1+24 and R1 is not changed
-
-str R0, [R1, #4]! ; EA = R1 + 4 and R1 = R1 + 4
-
-The ! means the base register is updated with the computed address (the effective address is written back to the register)
-
-addgt
-
-bx
-
-cbz - Compare and branch on zero
-
-cbz	R2, fa2 - This will compare R2 with 0 and then if they are the same branch to fa2
-
-arrb
-
-LR - Link register
-SP - Stack pointer
-
-LSL
-
-When you have a function that updates registers, make sure to save the original values of those registers to the stack, then restore them at the end of the function. Doing this ensures the function does not interfere with the caller.
-
-EX (start):
-
-push	{R1}		; preserve R1
-
-; end
-
-pop		{R1}		; restore register
-
-mov		PC,LR		; return to sender - program counter, link register
-
-
-USE BL to branch with link back
-
-ENDP 				
-
-To preserve multiple things you can do:
-
-push	{R1, R2}
-
-Corresponding pop:
-
-pop		{R1, R2}
-
-consider incrementing count in insert
-
----
-
-Processing GPIO interrupts
-
-
diff --git a/Asset.md b/Asset.md
@@ -1,9 +0,0 @@
-# Asset
-
-CS 331 W12 L3
-
-## Notes
-
-**Definition:** Assets are all resources in Unity.
-
-
diff --git a/Associative.md b/Associative.md
@@ -1,11 +0,0 @@
-# Associative
-
-MML Ch 2.2
-
-## Notes
-
-**Definition:** Associativity of an operation means that, regardless of where the parentheses are placed, the result of the computation is the same, assuming the order of the values is also the same.
-
-Example:
-
-a + (b + c) = (a + b) + c
diff --git a/AstronomicalUnit.md b/AstronomicalUnit.md
@@ -1,14 +0,0 @@
-# Astronomical Unit (AU)
-
-CM L1
-
-## Notes
-
-**Definition:** An astronomical unit is a measure of distance defined as the mean distance between the earth and the sun.
-
-Distance in Meters:
-149,597,870,700m
-
-## Usage
-
-This unit of measure is often used in relation to solar systems, where km and light seconds are less practical because their values are too large or too small to interpret easily.
diff --git a/AsymptoticNotation.md b/AsymptoticNotation.md
@@ -1,39 +0,0 @@
-# Asymptotic Notation
-
-L1 MIT
-
-## Notes
-
-**Definition:** Asymptotic notation describes how the running time of an algorithm grows as the input size grows.
-
-#### Types of complexity notation
-
-There are three different notations for this: big O, big Theta, and big Omega.
-
-1. Big Theta
-	- Big Theta notation creates a tight bound about asymptotic behavior. This is a precise growth rate bound between O(f(n)) and Omega(f(n)).
-
-2. Big Omega
-	- Big Omega notation is used to describe the lower bound (best case) complexity. This value describes a function's complexity as at least whatever is specified. As such, a function with n^3 growth can be stated as Omega(n^c) where c <= 3. Consequently, it is also true that Omega(n^1) can describe this function.
-
-3. Big O
-	- Big O notation is used to describe the upper bound (worst case) complexity. An interesting note is that the function with growth 7n^3 + 2n^2 can have O(n^3) time complexity, but more accurately we could say O(n^c) where c >= 3. This is because we are stating it does not grow faster than n^c. Such is the case then that O(n^10) is also true as the complexity does not grow faster than that.
-
-Note: When describing loose bounds (e.g., 2n = o(n^2)) we use lowercase letters such as little o. Little o describes an upper bound that grows strictly faster than the function itself, so unlike big O it is never a tight bound on the algorithm.
-
-
-#### Common complexities
-
-O(1) - Constant
-
-O(logn) - Logarithmic
-
-O(n) - Linear
-
-O(nlogn) - Log Linear
-
-O(n^2) - Quadratic time
-
-O(n^c) - Polynomial time (arbitrary constant c)
-
-2^O(n) - Exponential Time
diff --git a/Autoencoder.md b/Autoencoder.md
@@ -1,13 +0,0 @@
-# Autoencoder 
-
-ML General
-
-## Notes
-
-**Definition:** An autoencoder is an unsupervised neural network that takes inputs, compresses them into a smaller representation while trying to maintain as much information as possible, and then reconstructs the compressed representation into a new full representation.
-
-The idea of an autoencoder is for the model to learn the best way to extract features from a large input (many features) so the result can be passed to another model that requires fewer features and is consequently faster to train and use.
-
-Autoencoders are made of two parts: an encoder and a decoder. The encoder takes in an input with all of the features and outputs a compressed representation with fewer features. The decoder then takes the compressed representation as its input and tries to recreate the original input to the encoder. The error (difference between the output and the actual input) is what we are trying to minimize.
-
-Autoencoders are often used for unsupervised pretraining by training the autoencoder and then using its lower layers as the lower layers of a neural network. This uses the encoder's compression as the input for the neural network.
diff --git a/BCD.md b/BCD.md
@@ -1,16 +0,0 @@
-
-CA L3
-
-## Notes
-
-**Definition:** Binary coded decimal (BCD) is the process of encoding a decimal number where each digit is represented by a fixed number of bits.
-
-Ex. 
-
-Before: 10:37:49
-
-After: 0001 0000 : 0011 0111 : 0100 1001
-
-As you can see above, each digit is encoded in a nibble.
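A minimal sketch of the encoding shown above, assuming the common 4-bit-per-digit form of BCD. The function name `to_bcd` is illustrative and not from the notes.

```python
def to_bcd(digits):
    """Encode a string of decimal digits as space-separated 4-bit nibbles."""
    if not digits or any(d not in "0123456789" for d in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    return " ".join(format(int(d), "04b") for d in digits)


# The three fields of the clock example 10:37:49
fields = [to_bcd(part) for part in "10:37:49".split(":")]
```

Each digit is converted on its own, which is why 10 becomes `0001 0000` rather than the plain binary value `1010`.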
-
-
diff --git a/BIOL115.md b/BIOL115.md
@@ -1,15 +0,0 @@
-# Biology 115 - Human Biology
-
-Summer 24
-
-## Main Links
-
-**Definition:** Biology is the study of living organisms and the environments they live in. 
-
-Shared characteristics include:
-1. Organization
-2. Acquisition of materials and energy
-3. Homeostatic 
-4. Respond to stimuli
-5. Reproduce and have potential for growth
-6. Have an evolutionary history
diff --git a/Backpropagation.md b/Backpropagation.md
@@ -1,17 +0,0 @@
-# Backpropagation
-
-ML D6
-
-## Notes
-
-**Definition:** Backpropagation is the combination of reverse-mode autodiff and gradient descent to iteratively improve models based on the expected outputs for given inputs by following the gradient for each [[Weight.md]] and [[Bias.md]].
-
-When using backpropagation we use many mini-batches. Generally we go through the entire dataset multiple times during training, and these passes are called epochs. For each mini-batch, we first compute the values of the input layer for each input, then the second layer, and so on until reaching the output layer. This is the forward pass stage. An important note is that all intermediate values must be preserved so that we can do the backward pass.
-
-Once the forward pass is completed, we compute a loss function to find the output error.
-
-Next, we compute how much each bias and connection contributed to this error, working our way backwards from the output layer to the input layer. This is done using the chain rule.
-
-Lastly, using these error gradients, we do a gradient descent step to tweak the connection weights and biases.
-
-When doing backpropagation we should replace the MLP's step function with a function whose derivative is not 0 everywhere (ReLU, Sigmoid, etc.) to ensure gradient descent steps can be made. This is referred to as the activation function.
diff --git a/Bagging.md b/Bagging.md
@@ -1,9 +0,0 @@
-# Bagging
-
-ML D5
-
-## Notes
-
-**Definition:** Bagging is the process of training the same model multiple times, each time with a different random subset of the data. Bagging differs from pasting in that bagging samples with replacement: a sample that has been selected for a training subset stays available to be selected again. This means one model (predictor) can be trained with multiple instances of the same sample.
-
-One reason bagging and pasting are good is that they both allow for parallel processing because multiple models do predictions concurrently. The same is also true for model training.
diff --git a/Baking.md b/Baking.md
@@ -1,16 +0,0 @@
-# Baking
-
-CS 331 W11 Lecture 2
-
-## Notes
-
-**Definition:** The process of precomputing. Another term for this is statically computed (as opposed to dynamically computed, i.e., in real time).
-
-There are two different types of precomputing we can implement:
-
-1. Lossless 
-
-2. Lossy
-
-With lossless precomputing the quality is equal but performance is better while with lossy precomputing we have better performance at the cost of worse quality. An example of improved performance without loss in quality is precomputing bounding planes instead of calculating them based on points each time collision detection is called. An example of lossy precomputing would be to bake in shadows so they don't need to be recomputed, but this could cause issues with angles and changes over time in light. 
-
diff --git a/Bandits.md b/Bandits.md
@@ -1,11 +0,0 @@
-# Bandits
-
-L1
-
-## Notes
-
-**Definition:** Bandits are a class of problems in RL where an agent repeatedly chooses from a set of actions which give a reward drawn from an unknown probability distribution.
-
-Basically, there are a set of actions, you do one, you have a reward... that's all
-
-This is an MDP with only one state.
diff --git a/Bandwidth.md b/Bandwidth.md
@@ -1,9 +0,0 @@
-# Bandwidth
-
-Stats D3
-
-## Notes
-
-**Definition:** Bandwidth is a hyperparameter used in smoothing techniques that describes the width of kernels.
-
-With regard to KDEs, a higher value makes the graph smoother, while a lower value makes it less smooth.
diff --git a/BarrierSynchronization.md b/BarrierSynchronization.md
@@ -1,17 +0,0 @@
-
-Computer Architecture L2
-
-## Notes
-
-**Definition:** This is a way to block all execution until all inputs are ready. This can be thought of as thread syncing and is closely related to [[DataFlow.md]] execution.
-
-
-in1  in2  in3
-
-BLOCKERBLOCKER
-
-out1 out2 out3
-
-In this diagram, all inputs need to be ready before the outputs are assigned.
-
-See [[BulkSynchronousProcessing.md]] for more of the same information. Bulk synchronous processing is the idea of processing lots of things in parallel before moving on.  
diff --git a/BasicVariables.md b/BasicVariables.md
@@ -1,7 +0,0 @@
-# Basic Variables
-
-Ch 2.2
-
-## Notes
-
-**Definition:** Basic variables of a set of linear equations (or of a matrix) are the variables that correspond to pivot columns, which can be identified in RREF as the columns whose only nonzero entry is a leading 1.
diff --git a/BasisOfSubspace.md b/BasisOfSubspace.md
@@ -1,7 +0,0 @@
-# Basis of a Subspace
-
-Khan
-
-## Notes
-
-**Definition:** The basis of a subspace is a list of vectors V := (v_1, v_2, ..., v_m) such that V spans the subspace and is linearly independent.
diff --git a/BatchNormalization.md b/BatchNormalization.md
@@ -1,9 +0,0 @@
-# Batch Normalization
-
-ML P569
-
-## Notes
-
-**Definition:** Batch normalization is the process of adding layers to a neural network that perform normalization upon inputs and output the normalized values.
-
-This helps with unstable gradient issues and removes the need to normalize inputs for the network. On the flip side, these computations are bad for TPUs and are generally slow. They also don't work with RNNs.
diff --git a/BayesTheroem.md b/BayesTheroem.md
@@ -1,15 +0,0 @@
-# Bayes' Theorem
-
-L2
-
-## Notes
-
-**Definition:** Bayes' theorem is $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
-
-This can be derived from the definition of conditional probability.
-
-Bayes' theorem allows us to update the probability of some outcome given known information.
-
-## Intuition
-
-The probability that I have cancer (A) given that I have some symptoms of cancer (B) is equal to the probability that I have symptoms of cancer (B) given that I have cancer (A) multiplied by the overall probability that I have cancer (A) divided by the probability that I have symptoms (B).
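As a worked instance of the formula, the sketch below plugs in made-up numbers. The three probabilities are hypothetical values chosen only for illustration, not real medical statistics, and the function name `bayes` is an assumption.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) computed with Bayes' theorem."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs: P(B|A) = 0.9, P(A) = 0.01, P(B) = 0.05
posterior = bayes(p_b_given_a=0.9, p_a=0.01, p_b=0.05)
```

With these inputs the posterior is about 0.18: observing B raises the probability of A from 1% to roughly 18%.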
diff --git a/BayesianInference.md b/BayesianInference.md
@@ -1,9 +0,0 @@
-# Bayesian Inference
-
-Stats D5
-
-## Notes
-
-**Definition:** Bayesian inference is the principle that p(something) can often be described based on prior information that makes p(something) more or less likely, thus factoring that information into the probability.
-
-This is basically using state to update probability values.
diff --git a/BekensteinBound.md b/BekensteinBound.md
@@ -1,9 +0,0 @@
-# Bekenstein Bound
-
-SS
-
-## Notes
-
-**Definition:** The Bekenstein bound is an upper limit on the entropy (information) that can be contained within a finite region of space that has a finite amount of energy.
-
-This has implications for computation, as it implies a theoretical cap on how much information a computing device of a given size and energy can hold; a device exceeding it would collapse into a black hole.
diff --git a/BellmanEquation.md b/BellmanEquation.md
@@ -1,9 +0,0 @@
-# Bellman Equation
-
-L2
-
-## Notes
-
-**Definition:** The Bellman equation is an equation that states that the value of acting optimally right now is the value of the current choice plus the value of acting optimally from the next choice onward.
-
-This is intuitive and simple to understand, but it is the basis for our ability to do dynamic programming because without it there is no optimal substructure.
diff --git a/BernoulliProcess.md b/BernoulliProcess.md
@@ -1,23 +0,0 @@
-# Bernoulli Process
-
-Prob L13
-
-## Notes
-
-**Definition:** A Bernoulli process is a sequence of binary trials (random variables).
-
-As such, the sample space is the set of all possible sequences of outcomes for a given number of trials.
-
-Given that a Bernoulli process must be memoryless, we can then derive that each trial must have equal probability; otherwise, each trial would not be independent.
-
-Ie.
-
-Sample space of 9 coin flips:
-
-000000000
-000000001
-.........
-.........
-.........
-111111110
-111111111
diff --git a/BernoulliRandomVariable.md b/BernoulliRandomVariable.md
@@ -1,9 +0,0 @@
-# Bernoulli Random Variable 
-
-Prob L8
-
-## Notes
-
-**Definition:** A Bernoulli random variable is a random variable that has a Bernoulli distribution, where the outcome is binary.
-
-In a Bernoulli distribution the probability of any given event x is defined as p and the probability of not x is defined as 1-p.
diff --git a/Bias.md b/Bias.md
@@ -1,18 +0,0 @@
-# Bias
-
-ML D5
-
-## Notes
-
-### Stats
-
-**Definition:** Bias is a generalization error caused by incorrect assumptions such as assuming data is linear when it is not.
-
-High bias models are likely to underfit training data.
-
-See also [[Variance.md]]
-
-
-### ANNs
-
-**Definition:** Biases in ANNs are constants used as additional inputs for each perceptron (neuron). This can be thought of like y-intercepts for linear equations.
diff --git a/Biconditional.md b/Biconditional.md
@@ -1,18 +0,0 @@
-# Biconditional (iff)
-
-1.1.2
-
-## Notes
-
-**Definition:** The biconditional is the [[Connectives.md]] that states the antecedent and consequent have the same truth values.
-
-$p \iff q$ can be stated as p iff q, p if and only if q, or in some other way.
-
-Basically, this is only true when both propositions share the same truth value.
-
-| p | q | $p \iff q$ |
-|---|---|------------|
-| T | T | T          |
-| T | F | F          |
-| F | T | F          |
-| F | F | T          |
diff --git a/BigONotation.md b/BigONotation.md
@@ -1,9 +0,0 @@
-# Big O Notation
-
-Ch 2
-
-## Notes
-
-**Definition:** Big O Notation is a system-agnostic way to describe worst-case runtime for an algorithm. With Big O Notation we formally state f(x) = O(g(x)) if there exist some c and N such that f(x) <= c * g(x) for all x >= N.
-
-Basically, there must be some constant multiple and some starting point such that the function f(x) never surpasses c * g(x) from that point on. Note that the equality is a bit contentious as O(g(x)) describes a family of functions with coefficients c.
diff --git a/BigThetaNotation.md b/BigThetaNotation.md
@@ -1,7 +0,0 @@
-# Big Theta Notation
-
-CS 303 Ch 2
-
-## Notes
-
-**Definition:** We use big theta notation to state that an algorithm has exactly the same asymptotic complexity as some other algorithm. Saying f is big theta of g means f is bounded both above and below by g, where each bound will (almost always) have its own value for c (constant multiplier) and its own value for N (where x >= N).
diff --git a/Bijective.md b/Bijective.md
@@ -1,11 +0,0 @@
-# Bijective 
-
-L2
-
-## Notes
-
-**Definition:** For a function to be bijective it must be both [[Surjective.md]] and [[Injective.md]].
-
-This means that each value in the domain maps to a unique value in the codomain (Injective) and each value in the codomain is mapped to at least once (Surjective).
-
-Note: Another term for a bijection is one-to-one correspondence because injection is sometimes called one-to-one and a surjection is sometimes called onto. 
diff --git a/BijectiveProof.md b/BijectiveProof.md
@@ -1,18 +0,0 @@
-# Bijective Proof
-
-Ch 6.3
-
-## Notes
-
-**Definition:** A bijective proof is a proof where we prove the compared sets can be represented as a bijective function and thus have the same cardinality.
-
-
-Example:
-
-Prove $\binom{n}{k} = \binom{n}{n-k}$
-
-Proof:
-
-$\binom{n}{k}$ counts all combinations of length k from a set of length n. Let's now define a function upon these combinations. This function is f : X -> Y where X is the set of all combinations of length k made from the set with cardinality n. Let's also define f(x) as the complement of the input x with respect to the original set of length n, with Y being the set of these complements. We know this function is a bijection because every combination has a unique complement and every complement is mapped to, by how we defined our function. $\blacksquare$
-
-I could improve upon this proof by stating Y is the set of all combinations of n with length n - k.
diff --git a/BinaryCode.md b/BinaryCode.md
@@ -1,35 +0,0 @@
-# Binary Code
-
-Ch 6
-
-## Notes
-
-**Definition:** A binary code for S is a function c from S -> {0,1}\*.
-
-Basically, this is the function to encode elements of S to binary.
-
-### Proper
-
-A proper binary code is a binary code such that there are not any possible combinations of codes that can be confused with each other.
-
-Example of improper:
-
-S = {'A', 'B', 'C'}
-
-c : S -> {0,1}\*
-
-c('A') = 0
-
-c('B') = 01
-
-c('C') = 10
-
-The problem here is that we don't know if 010 is BA or AC. This is because the codeword for A is a prefix of the codeword for B.
-
-It is sufficient to ensure that no codeword is a prefix of another codeword.
-
-#### Prefix Property
-
-**Definition:** Let c : S -> {0,1}\* be a binary code. We say c has the prefix property if no codeword is a prefix of any other codeword.
-
-For all $x,y \in S$ with $x \neq y$, there is no string r such that $c(x) = c(y) + r$ (where + is concatenation).
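The prefix property can be checked mechanically. Below is a minimal sketch that assumes a code is represented as a dict from symbols to bit strings; the function name `has_prefix_property` and the second example code are illustrative assumptions.

```python
def has_prefix_property(code):
    """Return True if no codeword is a prefix of another symbol's codeword."""
    words = list(code.values())
    for i, word in enumerate(words):
        for j, other in enumerate(words):
            if i != j and other.startswith(word):
                return False
    return True


# The improper code from the example: c('A') = 0 is a prefix of c('B') = 01
improper = {"A": "0", "B": "01", "C": "10"}
# A hypothetical code that does satisfy the prefix property
proper = {"A": "0", "B": "10", "C": "11"}
```

`has_prefix_property(improper)` is false because the codeword for A begins the codeword for B, while `has_prefix_property(proper)` is true.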
diff --git a/BinaryOperations.md b/BinaryOperations.md
@@ -1,9 +0,0 @@
-# Binary Operations
-
-SS
-
-## Notes
-
-**Definition:** Binary operations are operations that take two inputs.
-
-Some examples include assignment (left and right side), addition, and subtraction.
diff --git a/BinaryTree.md b/BinaryTree.md
@@ -1,19 +0,0 @@
-# Binary Tree
-
-CS202 L14
-
-## Notes
-
-**Definition:** A binary tree is a tree in which each node has at most two children. In a binary search tree, for any node n, all elements in the left subtree are less than the current node and everything in the right subtree is greater than the current node.
-
-For a generic binary tree, there is no requirement that the left and right subtrees be balanced in any way.
-
-A balanced binary search tree has a search time complexity of O(log n).
-
-## Datastructure Specifics
-
-Depth - Number of edges from the current node to the root of the tree
-
-Height - Number of edges for the longest downward path
-
-Subtree - A tree that is defined as subtree(a) where each node below a is included in the subtree
diff --git a/Binomial.md b/Binomial.md
@@ -1,15 +0,0 @@
-# Binomial
-
-Ch 1.3
-
-## Notes
-
-**Definition:** A binomial is the combination of two values in the form of (x + y).
-
-Examples:
-
-x+y
-
-(x+y)^10
-
-z+n
diff --git a/BinomialCoefficient.md b/BinomialCoefficient.md
@@ -1,33 +0,0 @@
-# Binomial Coefficient
-
-L4
-
-## Notes
-
-**Definition:** A binomial coefficient is represented by two numbers and has a singular evaluation. The evaluation describes the number of unique subsets of the length denoted by the bottom value that can be created given a set of the length denoted by the top value.
-
-The reason it is called the binomial coefficient is that it appears in the expansion of binomials (e.g., (x+y)^5). In the expansion of (x+y)^n, the coefficient of the term x^(n-r) y^r is the number of ways to choose which r of the n factors contribute a y. This idea is described by the binomial theorem.
-
-### Formula
-
-$\binom{n}{r} = \frac{n!}{r!(n-r)!}$
-
-### Example
-
-$\binom{8}{3} = \frac{8!}{3!(8-3)!} = \frac{40320}{6 \times 120} = \frac{40320}{720} = 56$
-
-8 choose 3 is 56
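The arithmetic above can be reproduced directly from the factorial formula. This is a small sketch in which `choose` is an illustrative name; the standard library's `math.comb` (Python 3.8+) is used only as a cross-check.

```python
import math


def choose(n, r):
    """Compute n choose r as n! / (r! * (n - r)!)."""
    if not 0 <= r <= n:
        raise ValueError("expected 0 <= r <= n")
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


result = choose(8, 3)  # 40320 // (6 * 120)
```

Integer division is exact here because the denominator always divides n! evenly.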
-
-### Intuition
-
-The top of the formula, n!, is all permutations of the full set. The problem with this is that it counts rearrangements of the same items as distinct, which we don't care about, and it arranges the entire set rather than just r items.
-
-As such, we divide this by r! to account for the arrangements of the r chosen items. The second part, (n-r)!, accounts for the arrangements of the n - r items that were not chosen.
-
-Altogether, we find the permutations of the set of length n, divide by r! so that we count distinct sets rather than arrangements, and divide by (n-r)! to remove the orderings of the items outside the chosen r.
-
-### Stats
-
-In stats we often denote this using either the vertical denotation or the denotation $_nC_r$ where n is the length of the set and r is the size of each subset.
diff --git a/BinomialDistribution.md b/BinomialDistribution.md
@@ -1,11 +0,0 @@
-# Binomial Distribution
-
-Stats D1
-
-## Notes
-
-**Definition:** A binomial distribution gives the probability of each possible number of successes across a fixed number of independent trials, where each trial is some true or false condition with the same probability.
-
-This can be thought of as a medical experiment. The x-axis would be some marker and the y axis would be the probability of curing some disease... As an example.
-
-Each repetition is called a trial (single example).
diff --git a/Bipartite.md b/Bipartite.md
@@ -1,11 +0,0 @@
-# Bipartite
-
-Ch 4
-
-## Notes
-
-**Definition:** A bipartite graph is a graph whose vertices can be divided into two sets such that every edge connects a vertex in one set to a vertex in the other set, and never two vertices in the same set.
-
-Think about a graph with red and blue vertices where blue can only connect to red and vice versa.
-
-The generalization of this is the multipartite graph, where an integer k defines the number of sets instead of simply 2.
diff --git a/BitSteering.md b/BitSteering.md
@@ -1,11 +0,0 @@
-# Bit Steering
-
-CA L3
-
-## Notes
-
-**Definition:** This is a bit in an instruction that determines how later bits are interpreted. 
-
-A good example of this is an [[Opcode.md]].
-
-There are also other examples, including Alpha's ([[ISA.md]]) ADD instruction, which allows for variants of the ADD instruction based on a bit passed to it as part of the instruction.
diff --git a/Blender.md b/Blender.md
@@ -1,23 +0,0 @@
-# Blender
-
-CS331 W12 L3
-
-## Notes
-
-Blender's native file format is .blend. Models are commonly exported as FBX (Filmbox), which can be imported into [[Unity.md]].
-
-## Links
-
-[[Mesh.md]]
-[[Pole.md]]
-[[UVMaps.md]]
-[[Animation.md]]
-[[KeyframeAnimation.md]]
-[[SkeletalAnimation.md]]
-[[BlenderShortcuts.md]]
-
-## To Do
-
-[[Seam.md]]
-
-
diff --git a/BlenderShortcuts.md b/BlenderShortcuts.md
@@ -1,39 +0,0 @@
-# Blender Shortcuts
-
-Shortcuts from lectures
-
-## Notes
-
-"Z" - Switch between solid and wireframe (useful to select everything from a mesh from all sides)
-
-"1" - Vertex Mode
-
-"2" - Edge Mode
-
-"3" - Face Mode
-
-"a" - Toggle select all when using the select box (top of left pane overlay)
-
-Transform operations:
-
-"g" - move selected items (translate) - pair this with x,y, and z to just move on the specified axis
-
-"r" - rotate selected parts - this also can be used with an axis using x,y, and z keys as done with translations.
-
-"s" - scale selected parts - see above for specifying axis movements
-
-Edit Mode:
-
-"e" - extrude the selected parts - you can still specify the axis to extrude by.
-
-"ctrl+R" - loop cut - create new vertices and edges around the circumference of the selected area. We can then subdivide further using the mouse wheel.
-
-Forcing Bilateral Symmetry:
-
-In edit mode, select the object. In the right control panel select modifiers (blue wrench) and then add mirror over the expected axis. 
-
-If you select "clipping" then it won't create the interior face. To do this drag them apart, click the button, and drag them back together.
-
-This does not work for armatures. Instead, you need to select the bones to mirror, right click, select autoname, right click again, and select symmetrize. This will mirror over the x-axis. If you don't autoname it won't work, and it also won't work if the mirroring is not over the x-axis. As such, to resolve the rotation issue, press "r" and rotate so that the armature should be mirrored over the x-axis.
-
-
diff --git a/Boosting.md b/Boosting.md
@@ -1,15 +0,0 @@
-# Boosting
-
-ML D5
-
-## Notes
-
-**Definition:** Boosting is the process of combining several weak learners into one strong learner.
-
-The idea of this is to sequentially train predictors to correct the output of prior models.
-
-AdaBoost, short for adaptive boosting, is a popular boosting algorithm.
-
-Gradient boosting is popular as well.
-
-The main difference between boosting and most voting classification implementations is that it is purely sequential. It also uses weaker learners like shallow decision trees to make predictions. Additionally, and this is where the name comes from, each model boosts the importance of misclassified training examples so that later models focus on improving them.
diff --git a/Boxplots.md b/Boxplots.md
@@ -1,9 +0,0 @@
-# Boxplots
-
-Stats D4
-
-## Notes
-
-**Definition:** A boxplot is a plot that summarizes a distribution using its quartiles.
-
-These plots show the IQR (interquartile range, from Q1 to Q3) as a filled-in box with a line at the median (Q2), and then lines (whiskers) extending out toward the minimum and maximum. Points that fall beyond the whiskers are drawn as dots for outliers. This is also known as a box and whisker plot.
diff --git a/BreadthFirstSearch.md b/BreadthFirstSearch.md
@@ -1,9 +0,0 @@
-# BFS
-
-CS 202 L14
-
-## Notes
-
-**Definition:** Search algorithm that works its way outward from the root node level by level, visiting every node at one depth before moving deeper. This differs from [[DepthFirstSearch.md]], which follows one path all the way down before backtracking.
-
-This uses a [[Queue.md]] to search.
diff --git a/BucketAddressing.md b/BucketAddressing.md
@@ -1,7 +0,0 @@
-# Bucket Addressing
-
-Ch 5
-
-## Notes
-
-**Definition:** Bucket addressing is the process of using a finitely sized collection to store objects that collide.
diff --git a/BulkSynchronousProcessing.md b/BulkSynchronousProcessing.md
@@ -1,11 +0,0 @@
-# Bulk Synchronous Processing
-
-CA L2
-
-## Notes
-
-**Definition:** Completing parallel processing and then using [[BarrierSynchronization.md]] to join together threads of execution. 
-
-This is called bulk because it can be done all concurrently while also having synchronization in the form of a thread join. 
-
-Introduced by Leslie Valiant
diff --git a/CART.md b/CART.md
@@ -1,13 +0,0 @@
-# CART - Classification and Regression Tree Algorithm
-
-ML D4
-
-## Notes
-
-**Definition:** The CART algorithm is used to train decision trees and works by splitting a training set into two parts using a single feature k and a threshold on that feature, chosen as the pair that produces the purest subsets weighted by size. This is then repeated at each step (greedy) until reaching either a max depth or a point where it cannot find a split that will reduce impurity.
-
-Note that this algorithm is greedy so there may be better lines that could be drawn if it took a suboptimal line at a given point in time, but that would increase the computing cost drastically.
-
-There are two common impurity measures used with CART: Gini impurity and entropy. Gini impurity is the default (trying to minimize this), while entropy, whose reduction is known as information gain, can also be used, but it is slower as it uses logarithms.
-
-This can also be used with MSE instead of Gini or entropy to do regression. We basically just want to minimize MSE at each step.
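-
-As a hedged sketch of the cost being minimized (the symbols below follow a common presentation of CART and are assumptions, not notation from these notes): $t_k$ is the threshold used to split feature k, m is the number of samples in the node, $m_{left}$ and $m_{right}$ are the subset sizes, and G is Gini impurity with $p_c$ the fraction of class c in a node.
-
-$J(k, t_k) = \frac{m_{left}}{m} G_{left} + \frac{m_{right}}{m} G_{right}$
-
-$G = 1 - \sum_{c} p_c^2$
-
-For example, a node holding 3 samples of one class and 1 of another has $G = 1 - (0.75^2 + 0.25^2) = 0.375$, while a pure node has $G = 0$.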
diff --git a/CNN.md b/CNN.md
@@ -1,53 +0,0 @@
-# Convolutional Neural Network (CNN)
-
-ML SS
-
-## Notes
-
-**Definition:** A convolutional neural network is a neural network that has convolutional layers that perform filtering functions upon the input data.
-
-A convolution is the process of sliding a filter across some data. At each position, the values under the filter are multiplied by the corresponding values in the filter and then summed to give the output value for that position.
-
-CNNs are good for image detection because they retain information about pixels and what surrounds them. This allows them to pick up edges, curves, and higher level concepts. 
-
-When using CNNs it is a good idea to start at around 32 filters (or higher) and increase (often double) the number of filters as the layers progress. 
-
-Additionally, don't forget to add [[MaxPooling.md]] to ensure features are compressed and complexity of the model is minimized.
-
-### Typical Form
-
-Early on have few filters and later more. This is to capture general stuff early and more complex stuff later.
-
-Use 'same' padding instead of 'valid' when trying to maintain dimensions (small images), and normally use it early on.
-
-Double filters after each pooling layer (generally)
-
-General Form:
-
-Conv
-Relu
-Conv
-Relu 
-Pooling
-Conv
-Relu
-Conv
-Relu
-Pooling
-...
-...
-...
-Flatten
-Dense
-(DROPOUT???)
-Dense
-(DROPOUT???)
-Dense (output)
-
-You can have a few more convs stacked together before pooling, but the idea is a few convs with relus right after then pooling. At the end there should then be a few dense layers.
-
-It is a good idea to have a larger kernel early on to decrease dimensionality. Later, use smaller kernels, which require fewer computations and have better fine-grained accuracy.
-
-NOTE:
-
-When using Keras, you should specify ReLU inline as the activation of the conv layer.
diff --git a/CPP.md b/CPP.md
@@ -1,14 +0,0 @@
-# C++
-
-This index tracks C++ related concepts.
-
-## Links
-
-### Memory Management
-
-- [UniquePointers](UniquePointers.md)
-- [SharedPointers](SharedPointers.md)
-
-### STL
-
-- [Vector](Vector.md)
diff --git a/CS202.md b/CS202.md
@@ -1,21 +0,0 @@
-# CS 202
-
-This is the index for my CS 202 notes.
-
-## Main Links
-
-- [LinkedLists](LinkedLists.md) 
-- [MemoryManagement](MemoryManagement.md) 
-- [AbstractDataType](AbstractDataType.md) 
-- [Stack](Stack.md) 
-- [Queue](Queue.md) 
-- [Tree](Tree.md) 
-- [BinaryTree](BinaryTree.md) 
-- [DepthFirstSearch](DepthFirstSearch.md) 
-- [BreadthFirstSearch](BreadthFirstSearch.md) 
-- [Rvalue](Rvalue.md) 
-- [Lvalue](Lvalue.md) 
-- [SentinelValue](SentinelValue.md) 
-- [CanaryValue](CanaryValue.md) 
-- [TwosComplement](TwosComplement.md) 
-- [OnesComplement](OnesComplement.md) 
diff --git a/CS331.md b/CS331.md
@@ -1,23 +0,0 @@
-# CS 331
-
-This is the index for my CS 331 notes. 
-
-## Main Links
-
-- [Unity](Unity.md)
-- [Blender](Blender.md)
-- [MathConceptsCS331](MathConceptsCS331.md)
-- [IPD](IPD.md)
-
-FINAL EXAM:
-
-Learn about seams and how to mark good seams
-	- The number of edges in the resulting layout should be equal to the number of seams divided by 2. 
-
-In what order should armatures and meshes be made (armature first vs. mesh first)?
-
-Mesh first then armature
-
-Unwrapping is marking seams and going from 3d to 2d. This gives you the layout of the 2d mesh.
-
-UV Mapping takes in a 3d mesh and returns an image describing how to color it in. Unwrapping is part of the process of UV mapping. 
diff --git a/Cache.md b/Cache.md
@@ -1,5 +0,0 @@
-# Cache
-
-## Notes
-
-
diff --git a/CaesarCipher.md b/CaesarCipher.md
@@ -1,7 +0,0 @@
-# Caesar Cipher
-
-U 2.4
-
-## Notes
-
-**Definition:** A Caesar Cipher is a monoalphabetic substitution cipher whereby we encode characters as numbers, shift the numbers by a constant amount (wrapping around at the end of the alphabet), and then decode them back into characters.
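-
-As a sketch, assuming a 26-letter alphabet numbered A = 0 through Z = 25 and a shift k (these symbols are illustrative, not from the notes):
-
-$E(x) = (x + k) \bmod 26$
-
-$D(y) = (y - k) \bmod 26$
-
-With k = 3, H (7) encodes to K (10), and Y (24) wraps around to B (1).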
diff --git a/Calculus.md b/Calculus.md
@@ -1,53 +0,0 @@
-# Calculus (Links)
-
-
-## Main Links
-
-Calc 2 (Leonard):
-
-L1:
-	- [[NaturalLog.md]]
-	- [[ProductRule.md]]
-	- [[ChainRule.md]]
-	- [[LogarithmicDifferentiation.md]]
-
-L2:
-	- [[InverseFunction.md]]
-	- [[Injective.md]]
-	- [[Surjective.md]]
-	- [[Bijective.md]]
-
-Khan Calc 2:
-
-Unit 1:
-	- [[FundamentalTheroemofCalculus.md]]
-
-Unit 2:
-	- [usubstitution](usubstitution.md)
-	- ExponentialRule
-
-Calculus Early Transcendentals JS:
-
-Section 2.8:
-	- [[Jerk.md]]
-
-## Known Integrals
-
-
-Trig Integrations:
-
----
-
-sin(x) -> -cos(x) + c
-
-cos(x) -> sin(x) + c
-
-sec^2(x) -> tan(x) + c
-
-sec(x)tan(x) -> sec(x) + c 
-
-csc(x)cot(x) -> -csc(x) + c
-
-csc^2(x) -> -cot(x) + c
-
----
diff --git a/CanaryValue.md b/CanaryValue.md
@@ -1,9 +0,0 @@
-# Canary Value
-
-CS202 SelfStudy
-
-## Notes
-
-**Definition:** A canary value is a known piece of dummy data placed next to a buffer and validated at some future time to detect buffer overflows.
-
-When doing this, we place the dummy data in the memory directly adjacent to the buffer and later check that it is unchanged. An overflow would overwrite this data, so a changed canary means an overflow is occurring.
diff --git a/CartesianProduct.md b/CartesianProduct.md
@@ -1,11 +0,0 @@
-# Cartesian Product
-
-Throughout textbook
-
-## Notes
-
-**Definition:** The Cartesian Product of two sets A and B is the set of all ordered pairs (a, b) where a is contained in A and b is contained in B.
-
-This set has a size of |A| * |B|.
-
-Cartesian products, or cartesian sets, are denoted using x as in A x B. This is also how we describe the coordinate plane R^2, as it is the Cartesian product R x R of the real numbers with themselves.
diff --git a/Cases.md b/Cases.md
@@ -1,7 +0,0 @@
-# Proof by Cases
-
-U 1.8.1
-
-## Notes
-
-**Definition:** Proof by cases is a form of proof whereby we split the statement into a set of cases that covers every possibility and show that it is true in each case.
diff --git a/CategoricalCrossEntropy.md b/CategoricalCrossEntropy.md
@@ -1,13 +0,0 @@
-# Categorical Cross Entropy
-
-ML D6
-
-## Notes
-
-**Definition:** Categorical cross entropy is a loss calculation used for classification algorithms.
-
-Categorical cross entropy is calculated by summing y_i log(p_i) over all classes and multiplying by -1, where y_i is the expected classification (1 is true, 0 is false) and p_i is the probability output of the model.
-
-In essence, this is the negative sum of the logs of all probability outputs where the input should be a part of the class. All other classes are ignored, so if another class has a .8 probability output it is multiplied by 0 and has no direct effect on the categorical cross entropy of the model.
-
-Cross entropy measures the difference between the true probability distribution and the estimated probability distribution. This can be stated more complexly, but in the end it always uses logs.
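-
-As a worked sketch (the three-class example and the use of the natural log are assumptions for illustration), the loss for one sample is:
-
-$L = -\sum_{i} y_i \log(p_i)$
-
-With $y = (0, 1, 0)$ and $p = (0.1, 0.8, 0.1)$, only the true class contributes: $L = -\log(0.8) \approx 0.223$.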
diff --git a/Ceiling.md b/Ceiling.md
@@ -1,11 +0,0 @@
-# Ceiling 
-
-U2.3.4
-
-## Notes
-
-**Definition:** The ceiling function rounds its input up to the smallest integer that is greater than or equal to it.
-
-Remember to still round to the higher number for negatives.
-
-$\lceil 10.1 \rceil = 11$
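-
-For a negative input, rounding up still means moving toward the higher number, which is closer to zero:
-
-$\lceil -10.1 \rceil = -10$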
diff --git a/CentralLimitTheroem.md b/CentralLimitTheroem.md
@@ -1,7 +0,0 @@
-# Central Limit Theorem (CLT)
-
-L20
-
-## Notes
-
-**Definition:** The CLT states that as the number of trials (the sample size) increases, the distribution of the sample mean tends towards a normal distribution, regardless of the shape of the underlying distribution.
diff --git a/ChainRule.md b/ChainRule.md
@@ -1,7 +0,0 @@
-# Chain Rule
-
-Leonard
-
-## Notes
-
-**Definition:** The chain rule is a differentiation rule used when we have a function within another function. The rule states $\frac{d}{dx} (g(f(x))) = g'(f(x)) \cdot f'(x)$.
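-
-As a worked example (the functions chosen here are only for illustration), take $g(u) = \sin(u)$ and $f(x) = x^2$:
-
-$\frac{d}{dx} \sin(x^2) = \cos(x^2) \cdot 2x$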
diff --git a/Chaining.md b/Chaining.md
@@ -1,7 +0,0 @@
-# Chaining 
-
-Ch 5
-
-## Notes
-
-**Definition:** Chaining is the process of using a linked list to resolve collisions that result from duplicate hashcodes.
diff --git a/ChangeOfBasis.md b/ChangeOfBasis.md
@@ -1,29 +0,0 @@
-# Change of Basis
-
-Khan U3
-
-## Notes
-
-**Definition:** Change of basis in linear algebra is the process of expressing vectors in terms of a different set of basis vectors, which can be any linearly independent vectors that span the space.
-
-Example:
-
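-$B = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}$
-
-$a = 3B_1 + 2B_2$
-
-$[a]_B = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$
-
-While we have stated a to be [3 2], those are its coordinates with respect to the basis B, so in the standard coordinate system a = 3[1 2] + 2[2 1] = [7 8].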
-
-## Matrix Representation
-
-The matrix representation of a change of basis is simply the matrix whose columns are the new basis vectors. Multiplying it by a vector's coordinates under that basis gives the vector's coordinates in the standard basis.
-
-The matrix representation of a change of basis is always invertible.
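-
-As a sketch using the example above (the name C for the change of basis matrix is an assumption, not notation from these notes), the columns of C are the basis vectors of B:
-
-$C[a]_B = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 7 \\ 8 \end{bmatrix}$
-
-Here $\det C = 1 \cdot 1 - 2 \cdot 2 = -3 \neq 0$, which is consistent with C being invertible.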
-
-## L.T.s
-
-Linear transformations are specified under the basis they are being applied to and do not apply under different bases.
diff --git a/CharacteristicEquation.md b/CharacteristicEquation.md
@@ -1,13 +0,0 @@
-# Characteristic Equation
-
-Ch 8.2
-
-## Notes
-
-**Definition:** A characteristic equation is the equation obtained from a linear homogeneous recurrence relation by substituting a_n = r^n and dividing through by r^(n-k).
-
-Original:
-$a_n = c_1a_{n-1}+c_2a_{n-2}+...+c_ka_{n-k}$
-
-Characteristic Equation:
-$r^k-c_1r^{k-1}-c_2r^{k-2}-...-c_k=0$
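-
-As a worked example (the Fibonacci recurrence is chosen here for illustration and is not from these notes), take $a_n = a_{n-1} + a_{n-2}$, so k = 2 and $c_1 = c_2 = 1$:
-
-$r^2 - r - 1 = 0$
-
-$r = \frac{1 \pm \sqrt{5}}{2}$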
diff --git a/CharacteristicRoots.md b/CharacteristicRoots.md
@@ -1,7 +0,0 @@
-# Characteristic Roots
-
-Ch 8.2
-
-## Notes
-
-**Definition:** Characteristic roots, in discrete math, are the values of r that satisfy a [CharacteristicEquation](CharacteristicEquation.md).
diff --git a/CircuitTechnology.md b/CircuitTechnology.md
@@ -1,7 +0,0 @@
-# Circuit Technology
-
-Discussion of materials, gates, and things of that sort. 
-
-## Notes
-
-
diff --git a/CircularDoublyLinkedList.md b/CircularDoublyLinkedList.md
@@ -1,8 +0,0 @@
-
-CS202 L14
-
-## Notes
-
-**Definition:** This is a doubly linked list where the next pointer of the last node points to the first node and the previous pointer of the first node points to the last node.
-
-Can be used wherever a [[CircularLinkedList.md]] is used and is better when bi-directional movement is required. I am having trouble thinking of when this would ever be useful.
diff --git a/CircularLinkedList.md b/CircularLinkedList.md
@@ -1,8 +0,0 @@
-
-CS202 L14
-
-## Notes
-
-**Definition:** This is a singly linked list where the last node points back to the first node. 
-
-This could be useful when implementing OS based threads, as you would need to cycle through threads of execution when one thread gets blocked. There are not many other uses for this data structure beyond this.
diff --git a/ClassificationProblem.md b/ClassificationProblem.md
@@ -1,11 +0,0 @@
-# Classification Problem
-
-ML 1
-
-## Notes
-
-**Definition:** A classification problem is one where there is a discrete number of possible outcomes.
-
-In other words, if there is a finite set of possible outcomes, it is a classification problem. Oftentimes this manifests as yes/no, but also could include much larger sets of possible values. 
-
-The alternative to this would be a [[RegressionProblem.md]] where the output is a continuous set of values. 
diff --git a/Clip.md b/Clip.md
@@ -1,11 +0,0 @@
-# Clip
-
-CG W13 L3
-
-## Notes
-
-**Definition:** Prerecorded set of frames representing an object in motion.
-
-This can be thought of in a similar way to tiling where the start and end should be the same and then repeated over and over again.
-
-See [[Animation.md]] for more.
diff --git a/Closure.md b/Closure.md
@@ -1,17 +0,0 @@
-# Closure
-
-Khan
-
-## Notes
-
-**Definition:** A set is closed under an operation if performing that operation on members of the set always results in another member of the same set. Closure is stated for one chosen operation at a time; a set need not be closed under every operation.
-
-In the context of subspaces, we have closure under scalar multiplication and vector addition because these operations on any element of the [[LinearSubspace.md]] set results in another element of the set (by definition).
-
-## Discrete Math
-
-**Definition:** The closure of a relation with respect to a property is the smallest relation that contains the original relation and has the property, so it adds the minimum number of extra elements needed.
-
-Closure under addition means we have the minimum number of elements in a relation such that the domain and codomain are included in the first and second components of ordered pairs, and all other elements necessary for addition to result in another element of the relation are included.
-
-An important note is that there are often many relations for which the relation has the property and contains the codomain and domain, but we are only interested in the smallest one, the one that has the fewest other elements in it.
diff --git a/ClusteringAlgorithms.md b/ClusteringAlgorithms.md
@@ -1,15 +0,0 @@
-# Clustering Algorithms
-
-ML L1
-
-## Notes
-
-**Definition:** An algorithm that groups data together with other like items. 
-
-Think of Google News grouping related stories, amongst other things.
-
-This is often done via [[UnsupervisedLearning.md]]
-
-An important distinction between this and a [[ClassificationProblem.md]] is that clustering algorithms don't know the groupings beforehand and are unsupervised. This means they know certain samples are similar, but do not have a term to describe said membership.
-
-Clustering algorithms can also be hierarchical where they have groupings and then subgroupings as well. 
diff --git a/Codeword.md b/Codeword.md
@@ -1,13 +0,0 @@
-# Codeword
-
-Ch 6
-
-## Notes
-
-**Definition:** A codeword is an element c(x) where c is a binary code and x is a message.
-
-Remember, a binary code is defined as c : S -> {0,1}\*.
-
-To find the average (expected) codeword length we simply compute the probability-weighted sum of the lengths:
-
-$\sum_{x\in S} P(x) \, len(c(x))$
diff --git a/Codomain.md b/Codomain.md
@@ -1,15 +0,0 @@
-# Codomain
-
-Khan
-
-## Notes
-
-**Definition:** The codomain of a function is a set that contains all possible mappings from the domain of inputs to outputs. This set can also contain values that are not mapped to from the domain by the function.
-
-See [[Range.md]] for only the subset of the codomain that is mapped to.
-
-Defined formally, we can have any codomain C(f) that fulfills the following where D is the domain of the function f:
-
-$C(f) \supseteq \{y \space | \space \exists x \in D \text{ such that } f(x) = y\}$
-
-Despite the openness of this, we often use a predefined set as the set considered the codomain, but it can be any set we choose to define that contains the range of the function.
diff --git a/Collection.md b/Collection.md
@@ -1,7 +0,0 @@
-# Collection
-
-Ch 0
-
-## Notes
-
-**Definition:** Collection datatypes are datatypes that can, theoretically, store an arbitrarily large number of elements.
diff --git a/Collision.md b/Collision.md
@@ -1,7 +0,0 @@
-# Collision
-
-Ch 5
-
-## Notes
-
-**Definition:** A collision, with respect to hash tables, is when we try to place an element into a position in the array that is already taken. 
diff --git a/ColumnSpace.md b/ColumnSpace.md
@@ -1,9 +0,0 @@
-# Column Space 
-
-Khan
-
-## Notes
-
-**Definition:** The column space of a matrix is the space that contains all linear combinations of its columns.
-
-In the case of a 3x2 (3 rows, 2 columns) matrix this, generally, is a plane. I say generally because if the two column vectors lie on the same line, the column space is simply a line and not a plane.
diff --git a/Combination.md b/Combination.md
@@ -1,9 +0,0 @@
-# Combination
-
-TB 6.3
-
-## Notes
-
-**Definition:** A combination is a selection of elements from a given set where the order of the selected elements does not matter.
-
-The difference between a combination and a permutation is that rearrangements of a combination are still considered the same combination, whereas rearrangements of a permutation are considered different permutations.
diff --git a/CombinatorialProof.md b/CombinatorialProof.md
@@ -1,15 +0,0 @@
-# Combinatorial (Counting) Proofs
-
-Ch 6.3
-
-## Notes
-
-**Definition:** A combinatorial proof is a proof that shows two expressions are equal by showing that both count the same set.
-
-Example:
-
-Prove That $\binom{n}{k} = \binom{n}{n-k}$
-
-Proof:
-
-Consider n choose k. This counts all combinations of size k from a set N of size n (|N| = n). The same collection can be counted from the other side. Each combination A has a complement $\bar A$ containing all elements of N not in A, and $\bar A$ has size n-k. Every A determines exactly one $\bar A$ and vice versa, so there are exactly as many combinations of size k as combinations of size n-k: we are describing the **same thing from two sides**. $\blacksquare$
diff --git a/Combinatorics.md b/Combinatorics.md
@@ -1,9 +0,0 @@
-# Combinatorics
-
-Ch 6.1
-
-## Notes
-
-**Definition:** Combinatorics is the study of counting.
-
-Combinatorics is commonly used for enumeration in probability theory and sometimes computer science.
diff --git a/Commutative.md b/Commutative.md
@@ -1,13 +0,0 @@
-# Commutative
-
-1.3.2
-
-## Notes
-
-**Definition:** The commutative property states that the order in which the operands are placed does not affect the outcome of the operation.
-
-a x b = b x a
-
-a + b = b + a
-
-p v q = q v p
diff --git a/Complement.md b/Complement.md
@@ -1,9 +0,0 @@
-# Complement
-
-L1
-
-## Notes
-
-**Definition:** The complement of a set is the set of all elements not in the original set, but in the consideration space (often sample space).
-
-There are technically two types of complements: the absolute and the relative complement. Generally we are talking about the relative complement, which is the set defined as the difference between the superset and the subset. The absolute complement uses the U set ([[UniversalSet.md]]) as the superset.
diff --git a/ComplexVectorSpace.md b/ComplexVectorSpace.md
@@ -1,7 +0,0 @@
-# Complex Vector Space
-
-Ch 1
-
-## Notes
-
-**Definition:** A complex vector space is a vector space over the complex numbers (C).
diff --git a/CompositeNumber.md b/CompositeNumber.md
@@ -1,7 +0,0 @@
-# Composite Number
-
-U 2.4
-
-## Notes
-
-**Definition:** A composite number is a positive integer greater than 1 that is not prime and thus is the product of two or more prime numbers (not necessarily distinct).
diff --git a/ComputerArchitecture.md b/ComputerArchitecture.md
@@ -1,44 +0,0 @@
-# Computer Architecture
-
-Links to information learned from computer architecture course
-
-## Questions I would like to answer
-
-1. Generally, how does a CPU work and how does it interface with other computer components?
-2. Why do ARM chips with a reduced instruction set appear to be more efficient?
-3. What are the steps to create a computer in a video game?
-
-## Main Links
-
-- [HistoricalDesigns](HistoricalDesigns.md)
-- [Abstraction](Abstraction.md)
-- [Memory](Memory.md)
-- [Scheduling](Scheduling.md)
-- [Algorithm](Algorithm.md)
-- [ISA](ISA.md)
-- [OutOfOrderExecution](OutOfOrderExecution.md)
-- [SuperScalar](SuperScalar.md)
-- [MooresLaw](MooresLaw.md)
-- [ForwardThoughts](ForwardThoughts.md)
-- [DesignPoint](DesignPoint.md)
-- [DRAMRowHammer](DRAMRowHammer.md)
-- [VonNeumannModel](VonNeumannModel.md)
-- [BarrierSynchronization](BarrierSynchronization.md)
-- [MicroArchitecture](MicroArchitecture.md)
-- [Instruction](Instruction.md)
-- [Opcode](Opcode.md)
-- [BitSteering](BitSteering.md)
-- [BCD](BCD.md)
-- [ProgrammerVisibleState](ProgrammerVisibleState.md)
-- [MUX](MUX.md)
-
-To do:
-
-- [Hamming](Hamming.md)
-- [PipelineControl](PipelineControl.md)
-- [CircuitTechnology](CircuitTechnology.md)
-- [VLIW](VLIW.md)
-- [SRAM](SRAM.md)
-- [Adder](Adder.md)
-- [Cache](Cache.md)
-- [CriticalPath](CriticalPath.md)
diff --git a/ComputerSecurity.md b/ComputerSecurity.md
@@ -1,11 +0,0 @@
-# Computer Security
-
-Main index for notes related to CSCI 370, Computer Security
-
-## Links
-
-### 1.6 - Cryptography
-
-- [ ] Keyless.md
-- [ ] SingleKey.md
-- [ ] TwoKey.md
diff --git a/ConditionalDisjunction.md b/ConditionalDisjunction.md
@@ -1,7 +0,0 @@
-# Conditional Disjunction
-
-1.3.2
-
-## Notes
-
-**Definition:** The conditional disjunction rule states $p \to q \equiv \neg p \vee q$.
diff --git a/ConditionalProbabilities.md b/ConditionalProbabilities.md
@@ -1,9 +0,0 @@
-# Conditional Probabilities
-
-Stats D2 - Prob L2
-
-## Notes
-
-**Definition:** Conditional probabilities are probabilities of some outcome given some assumed condition. 
-
-An example of this is that there is an 80% chance that a Republican will be in favor of something. This is a conditional probability where the condition is being Republican and the probability is 80%. This is in contrast with the joint probability of being Republican and being in favor, whose sample space also includes Republicans who aren't in favor, liberals who are, liberals who aren't, independents who are, and independents who aren't. Given this, the joint probability is lower than the conditional probability; it can never be higher.
diff --git a/ConditionalProbability.md b/ConditionalProbability.md
@@ -1,9 +0,0 @@
-# Conditional Probability
-
-Ch 1.4
-
-## Notes
-
-**Definition:** Conditional probability is the probability of a given event assuming another event has already occurred.
-
-P(A|B) = Probability of the event A given the event B occurred.
diff --git a/ConditionalProbabilityTheroem.md b/ConditionalProbabilityTheroem.md
@@ -1,13 +0,0 @@
-# Conditional Probability Theorem
-
-L2
-
-## Notes
-
-**Definition:** The conditional probability theorem is $P(A|B) = \frac{P(A \cap B)}{P(B)}$.
-
-This theorem is used to find the probability of some outcome given another piece of information. See also [[ConditionalProbabilities.md]].
-
-## Intuition
-
-The probability that A is true given B is the same as the probability of A and B divided by the overall probability of B.
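-
-As a worked example (a fair six-sided die, chosen for illustration), let A be rolling a 2 and B be rolling an even number:
-
-$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$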
diff --git a/ConfusionMatrix.md b/ConfusionMatrix.md
@@ -1,7 +0,0 @@
-# Confusion Matrix
-
-ML CH3
-
-## Notes
-
-**Definition:** A confusion matrix is a matrix that counts a model's predictions broken down by both the actual and predicted values, so it shows which classes the model confuses with which.
diff --git a/Congruence.md b/Congruence.md
@@ -1,7 +0,0 @@
-# Congruence (over mod)
-
-U 2.4
-
-## Notes
-
-**Definition:** Congruence describes the relationship between two integers a and b that have the same remainder when divided by c, written $a \equiv b \pmod{c}$. Equivalently, c divides a - b.
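-
-For example (numbers chosen for illustration), 17 and 5 both leave a remainder of 5 when divided by 12, and 12 divides 17 - 5:
-
-$17 \equiv 5 \pmod{12}$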
diff --git a/CongruenceClass.md b/CongruenceClass.md
@@ -1,7 +0,0 @@
-# Congruence Class
-
-U 2.4
-
-## Notes
-
-**Definition:** A congruence class is the set of all integers a such that $a \equiv b \pmod{c}$ for a fixed b and modulus c.
diff --git a/Connected.md b/Connected.md
@@ -1,7 +0,0 @@
-# Connected
-
-Ch 4
-
-## Notes
-
-**Definition:** Connected, in graph theory, means that there is a way to get from any node to any other node in the graph.
diff --git a/ConnectedComponent.md b/ConnectedComponent.md
@@ -1,7 +0,0 @@
-# Connected Component
-
-Ch 4
-
-## Notes
-
-**Definition:** A connected component is a maximal subgraph in which every pair of vertices is connected.
diff --git a/Connectives.md b/Connectives.md
@@ -1,28 +0,0 @@
-# Connectives (Logical Operators)
-
-1.1.1
-
-## Notes
-
-**Definition:** Connectives are necessary for the creation of compound propositions and they are the following:
-
-- Negation (not | $\neg$)
-- Conjunction (and | $\wedge$)
-- Disjunction (or | $\vee$)
-- Implication (If, then | $\to$)
-- Biconditional (If and only if | $\iff$)
-- Exclusive or ($\oplus$)
-
-Simple Proposition:
-
-The sun is red.
-
-Compound Proposition:
-
-If it is Tuesday, then the sun is red.
-
-p = it is Tuesday
-
-q = the sun is red
-
-$p \to q$
diff --git a/Contingency.md b/Contingency.md
@@ -1,9 +0,0 @@
-# Contingency
-
-1.3.1
-
-## Notes
-
-**Definition:** A contingency is a proposition that is neither always true nor always false. 
-
-An example of a contingency is simply $p$.
diff --git a/ContinuousProbability.md b/ContinuousProbability.md
@@ -1,11 +0,0 @@
-# Continuous Probability
-
-Stats Ch1
-
-## Notes
-
-**Definition:** A continuous probability is one where there are an uncountable number of outcomes. 
-
-This is often defined by intervals either finite or infinite.
-
-To graph continuous probabilities we often use density (kde) graphs, which show how likely values near any given input are. The underlying curves are referred to as [[ProbabilityDensityFunctions.md]] or pdfs. While histograms fill a similar role, they are not considered a pdf because they use bins instead of a continuous curve.
diff --git a/Contradiction.md b/Contradiction.md
@@ -1,9 +0,0 @@
-# Contradiction
-
-Throughout textbook
-
-## Notes
-
-**Definition:** Proof by contradiction can be used to prove if-then statements. This is done by assuming the if part is true and the then part is false, which is exactly the situation in which the statement would be false. From here, you show that this assumption causes a contradiction, so whenever the if is true the then must also be true.
-
-A contradiction is a proposition that is always false such as $p \wedge \neg p$.
diff --git a/Contrapositive.md b/Contrapositive.md
@@ -1,11 +0,0 @@
-# Contrapositive
-
-Throughout TB - U1.7.2 Discrete TB
-
-## Notes
-
-**Definition:** To prove an if-then statement by contrapositive, we assume the then part is false. From there we prove the if part must also be false. It follows that if the first is true then the second is also true, because the first is never true when the second is false.
-
-This is of the form $\neg q \to \neg p$, where we switch the statements and negate both. Negating both without switching gives the [[Inverse.md]].
-
-This always has the same truth value as the original.
diff --git a/Converse.md b/Converse.md
@@ -1,9 +0,0 @@
-# Converse 
-
-1.1.2
-
-## Notes
-
-**Definition:** The converse of a statement is formed by switching the two sides of an implication.
-
-The converse of $p \to q$ is $q \to p$.
diff --git a/Coordinate.md b/Coordinate.md
@@ -1,11 +0,0 @@
-# Coordinate
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 1
-
-## Notes
-
-**Definition:** A coordinate is a single component of a vector or list.
-
-Consider v = (1, 4, 5). The third component of v is 5, the second component of v is 4, and the first component of v is 1.
diff --git a/Correlation.md b/Correlation.md
@@ -1,9 +0,0 @@
-# Correlation
-
-Stats D2
-
-## Notes
-
-**Definition:** Correlation is the strength and direction of the linear relationship between two variables. This value is bounded between -1 and 1, where 0 is no linear correlation, 1 is a pure positive linear relationship, and -1 is a pure negative linear relationship.
-
-See [[CorrelationCoefficient.md]] for an applied example.
diff --git a/CorrelationCoefficient.md b/CorrelationCoefficient.md
@@ -1,9 +0,0 @@
-# Correlation Coefficient
-
-ML CH2
-
-## Notes
-
-**Definition:** The correlation coefficient is a floating point number that represents the strength and direction of the linear relationship between two variables x and y.
-
-The highest value is 1 and the lowest is -1. A value of 1 means the points lie exactly on a line with positive slope, and -1 means they lie exactly on a line with negative slope.
diff --git a/CountSort.md b/CountSort.md
@@ -1,7 +0,0 @@
-# Count Sort
-
-L5
-
-## Notes
-
-**Definition:** Count sort is a non-comparative sorting algorithm: we count the number of occurrences of each value, then build the sorted output by emitting each value as many times as its count specifies.
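The counting and reassembly steps described in this note can be sketched as follows. This is an illustrative version that assumes the inputs are non-negative integers (the note does not say which value types are in scope), and the function name is made up.

```python
def counting_sort(values: list[int]) -> list[int]:
    """Sort non-negative integers by counting occurrences of each value."""
    if not values:
        return []
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")

    # counts[v] holds how many times v appears in the input.
    counts = [0] * (max(values) + 1)
    for v in values:
        counts[v] += 1

    # Rebuild the output by emitting each value `count` times, in order.
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result


print(counting_sort([3, 1, 2, 3, 0]))
```

No two elements are ever compared with each other, which is what makes the sort non-comparative. The cost is a count array sized by the largest value.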
diff --git a/CounterExample.md b/CounterExample.md
@@ -1,7 +0,0 @@
-# Proof By Counter Example 
-
-Abstract Math Proof Technique
-
-## Notes
-
-**Definition:** A proof by counter example disproves a general claim by exhibiting one case where it fails. Unlike a [[DirectProof.md]], which has to establish the claim for every case, a single example for which the claim is false is enough to show that the claim does not hold.
diff --git a/CountingPrinciple.md b/CountingPrinciple.md
@@ -1,7 +0,0 @@
-# Counting Principle
-
-Ch 0
-
-## Notes
-
-**Definition:** The counting principle is an enumeration technique where you determine the branching factor at each step and multiply all branching factors to find the total number of possible paths. 
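A small sketch of the idea in this note: multiply the branching factor at each step and compare against a brute-force enumeration. The outfit options are invented purely for illustration.

```python
from itertools import product
from math import prod

# Hypothetical choices at each step (3, 2, and 2 options respectively).
shirts = ["red", "blue", "green"]
pants = ["jeans", "khakis"]
shoes = ["boots", "sneakers"]
choices = [shirts, pants, shoes]

# Counting principle: multiply the branching factor at each step.
predicted = prod(len(options) for options in choices)

# Brute force: enumerate every path and count them.
enumerated = len(list(product(*choices)))

print(predicted, enumerated)
```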
diff --git a/Covariance.md b/Covariance.md
@@ -1,13 +0,0 @@
-# Covariance 
-
-Stats D2
-
-## Notes
-
-**Definition:** Covariance measures how two different variables vary together linearly. A positive value indicates that higher values of one variable are associated with higher values of the other. The inverse is also true: a negative value indicates that higher values of one are associated with lower values of the other.
-
-There are also no bounds for the range of covariance unlike correlation.
-
-Cov(X,X) = Var(X) // keep in mind this is the squared unit like variance.
-
-See [[CorrelationCoefficient.md]] for normalized version of this value.
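To make the "no bounds" and "normalized version" remarks concrete, here is a rough sketch. It assumes the sample covariance (dividing by n - 1), which the note does not specify, and the height and weight numbers are made up.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance (assumes the n - 1 denominator)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance normalized by both standard deviations; Cov(X, X) is Var(X)."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [50.0, 58.0, 69.0, 77.0]
heights_m = [h / 100 for h in heights_cm]

# Changing units rescales the covariance but leaves the correlation alone.
print(covariance(heights_cm, weights_kg), covariance(heights_m, weights_kg))
print(correlation(heights_cm, weights_kg), correlation(heights_m, weights_kg))
```

Because covariance carries the units of both variables, its magnitude on its own says little about how strong the relationship is. The correlation is the unit-free quantity.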
diff --git a/CramersRule.md b/CramersRule.md
@@ -1,11 +0,0 @@
-# Cramer's Rule
-
-3B1B
-
-## Notes
-
-**Definition:** Cramer's rule is an alternative to [[GaussianElimination.md]] for solving systems of equations.
-
-While slower and generally worse, it is novel.
-
-I don't really understand how this works.
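For what it is worth, the mechanics of Cramer's rule for a 2x2 system can be sketched in a few lines: each unknown is the determinant of the coefficient matrix with one column replaced by the right-hand side, divided by the determinant of the original matrix. The function names and the example system are illustrative only.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as two rows."""
    (a, b), (c, d) = m
    return a * d - b * c


def cramer_2x2(A, rhs):
    """Solve A [x, y]^T = rhs for a 2x2 system using Cramer's rule."""
    d = det2(A)
    if d == 0:
        raise ValueError("determinant is zero; no unique solution")
    # Replace one column of A with the right-hand side at a time.
    A_x = [[rhs[0], A[0][1]], [rhs[1], A[1][1]]]
    A_y = [[A[0][0], rhs[0]], [A[1][0], rhs[1]]]
    return det2(A_x) / d, det2(A_y) / d


# 2x + y = 5 and x + 3y = 10
x, y = cramer_2x2([[2, 1], [1, 3]], [5, 10])
print(x, y)
```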
diff --git a/CreditAssignmentProblem.md b/CreditAssignmentProblem.md
@@ -1,7 +0,0 @@
-# Credit Assignment Problem
-
-L1
-
-## Notes
-
-**Definition:** The credit assignment problem is an RL problem where we need to determine how to rate choices in the near term given their long term consequences.
diff --git a/CriticalPath.md b/CriticalPath.md
diff --git a/CrossProduct.md b/CrossProduct.md
@@ -1,24 +0,0 @@
-# Cross Product
-
-Khan
-
-## Notes
-
-**Definition:** The cross product of two vectors is a vector orthogonal to both of them.
-
-The cross product is only defined in R^3. 
-
-### Calculation
-
-To calculate the cross product we simply do the following:
-
-a = [a_1, a_2, a_3]
-b = [b_1, b_2, b_3]
-
-o = [a_2b_3 - a_3b_2 , a_3b_1 - a_1b_3 , a_1b_2 - a_2b_1]
-
-Conceptually, writing a and b as the two columns of a 3x2 matrix, we take the determinant of the bottom two rows, then the negated determinant of the top and bottom rows, then the determinant of the top two rows.
-
-The length of the cross product is the length of the component of the second vector perpendicular to the first vector, multiplied by the length of the first vector (that is, $|a||b|\sin\theta$). This is pretty much the counterpart of the dot product's value, where we take the projected length of the second vector onto the first multiplied by the length of the first.
-
-The cross product's length is 0 if the vectors are collinear. Conversely, for fixed lengths it is maximized when the vectors are orthogonal.
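The component formula in this note can be checked numerically with a short sketch. The helper names `cross`, `dot`, and `norm` are made up for illustration and the vectors are arbitrary.

```python
from math import sqrt


def cross(a, b):
    """Cross product of two 3d vectors, using the component formula from the note."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return sqrt(dot(a, a))


a, b = (1, 2, 3), (4, 5, 6)
o = cross(a, b)

print(o)                               # the resulting vector
print(dot(o, a), dot(o, b))            # both zero: o is orthogonal to a and b
print(cross((1, 2, 3), (2, 4, 6)))     # collinear inputs give the zero vector
print(norm(cross((1, 0, 0), (0, 1, 0))))  # orthogonal unit vectors give length 1
```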
diff --git a/CrossValidation.md b/CrossValidation.md
@@ -1,9 +0,0 @@
-# Cross-Validation
-
-ML CH3
-
-## Notes
-
-**Definition:** Cross-validation is the process of holding out a subset of your data for validation and training the model on the remaining data.
-
-A common form of this is k-fold cross-validation. This splits the data into k folds (subsets). For each fold in turn, it trains the model on the other k - 1 folds and then measures accuracy on the one fold that was held out, using it as the validation set.
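A minimal sketch of how k-fold splitting might be implemented by hand. It assumes contiguous, unshuffled folds, and the function name is hypothetical. Library implementations usually add shuffling and stratification options.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_indices(10, 3))
for train, validation in splits:
    print("train:", train, "validate:", validation)
```

Every sample lands in the validation set exactly once, so averaging the k validation scores uses all of the data for evaluation without ever scoring a model on data it trained on.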
diff --git a/Crosstabulation.md b/Crosstabulation.md
@@ -1,17 +0,0 @@
-# Crosstabulation
-
-Stats D4
-
-## Notes
-
-**Definition:** Crosstabulation in stats is a way to display counts for combinations of two categorical variables. Across the top and down the side you have the classes of each variable, and each cell of the table holds the information for the group at that intersection.
-
-ex:
-
-| Admittance | Male | Female |
-| - | - | - |
-| Admitted | 1198 | 557 |
-| Rejected | 1493 | 1278 |
-
-
-This data can be shown using a [[MosaicPlot.md]] for graphical viewing with sized boxes.
diff --git a/CumulativeDensityFunction.md b/CumulativeDensityFunction.md
@@ -1,19 +0,0 @@
-# Cumulative Density Function (CDF)
-
-Prob L8
-
-## Notes
-
-**Definition:** A cumulative distribution function (often loosely called a cumulative density function) is a function of a random variable whose value at x is the probability of getting an outcome less than or equal to x.
-
-This is defined mathematically as $F(x) = P(X \leq x)$.
-
-The value of a CDF is that it describes discrete distributions (PMFs) and continuous distributions (PDFs) with one kind of function, removing the need for separate treatment of each.
-
-Note: The values of a CDF do not sum to any particular total, but F(x) should always approach 1 as x goes to infinity.
-
-## Values
-
-With a CDF, the value F(x) is the probability of getting x or anything less than x.
-
-When finding the probability of a range we subtract the CDF at the start of the range from the CDF at the end: $P(a < X \leq b) = F(b) - F(a)$.
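As an illustration of reading probabilities off a CDF, here is a small sketch using a fair six-sided die. The die is an assumed example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die (an assumed example distribution).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def cdf(x):
    """F(x) = P(X <= x) for the die."""
    return sum(p for value, p in pmf.items() if value <= x)


# P(2 < X <= 5) is the CDF at the end of the range minus the CDF at the start.
p_range = cdf(5) - cdf(2)

print(cdf(2), cdf(5), p_range, cdf(6))
```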
diff --git a/CumulativeRelativeFrequency.md b/CumulativeRelativeFrequency.md
@@ -1,7 +0,0 @@
-# Cumulative Relative Frequency
-
-Khan
-
-## Notes
-
-**Definition:** The cumulative relative frequency of some value is the sum of the relative frequencies of all prior values along with the current value's relative frequency.
diff --git a/Cycle.md b/Cycle.md
@@ -1,7 +0,0 @@
-# Cycle
-
-Ch 4
-
-## Notes
-
-**Definition:** A cycle is a path that starts and ends at the same node, repeats no other node (the nodes are distinct once the last node is removed), and whose sequence is at least 3 long.
diff --git a/DBSCAN.md b/DBSCAN.md
@@ -1,13 +0,0 @@
-# DBSCAN (Density based spatial clustering of applications with noise)
-
-ML D5
-
-## Notes
-
-**Definition:** DBSCAN is a clustering algorithm that groups clusters by continuous regions of high density.
-
-Steps to perform:
-1. For each instance, count how many instances are within a small distance of it (its neighborhood).
-2. If it has at least min_samples instances in its neighborhood, it is a core instance (located in a dense area).
-3. All instances in the neighborhood of a core instance belong to the same cluster.
-4. All other instances, which are not core instances and do not have one in their neighborhood, are anomalies.
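The four steps above can be sketched in plain Python as follows. This is a simplified illustration, not the scikit-learn implementation. The parameter name `eps` for the neighborhood radius and the choice to count a point as part of its own neighborhood are assumptions.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks an anomaly."""
    n = len(points)
    # Step 1: the eps-neighborhood of each instance (including itself).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Step 2: core instances have at least min_samples neighbors.
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Step 3: grow a cluster outward from this core instance.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if is_core[j]:
                        frontier.append(j)
        cluster += 1
    # Step 4: anything still labeled -1 is an anomaly.
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

Only core instances are expanded. A border point joins the cluster of a nearby core instance but does not pull in its own neighbors.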
diff --git a/DRAM.md b/DRAM.md
@@ -1,12 +0,0 @@
-# DRAM
-
-DRAM is what we think of as RAM. See [[Memory.md]] for other links.
-
-## Notes
-
-
-[[DRAMBanks.md]] are 2d matrices of [[DRAMCell.md]]s that are accessed by row. When the processor wants data from a row, the bank activates the row, copies it into the [[RowBuffer.md]], and then sends the requested data out. Subsequent accesses to a different column of the same row are very fast because the row is already in the buffer. This can be thought of as caching the row.
-
-One optimization is to prioritize memory requests for rows that are already buffered, which avoids repeatedly switching rows. This causes issues with multiple applications because it favors applications whose accesses are more localized. You can also write programs that take advantage of this to starve other applications of memory access. On the flip side, if you simply schedule the oldest request first, random access requests take more time, so an application that issues more of them gets more memory time than the others.
-
-[[DRAMChips.md]] are the larger DRAM unit that includes both the Banks and associated circuitry.  
diff --git a/DRAMBanks.md b/DRAMBanks.md
@@ -1,5 +0,0 @@
-# DRAM Banks
-
-## Notes
-
-**Definition:** 2d bank of [[DRAMCell.md]]s that is accessed one row at a time. Rows may be around 8kb in size.
diff --git a/DRAMCell.md b/DRAMCell.md
@@ -1,7 +0,0 @@
-# DRAM Cell
-
-## Notes
-
-A DRAM Cell is the cell used to store one bit of information. It is made of a capacitor and an access transistor. The data is stored in the charge of the capacitor. 
-
-The access transistor is how you are able to query the cell. Since neither the access transistor nor the capacitor is perfect, cells leak charge over time. As such, they need to be refreshed periodically using [[DRAMRefresh.md]].
diff --git a/DRAMChips.md b/DRAMChips.md
@@ -1,5 +0,0 @@
-# DRAM Chips
-
-## Notes
-
-DRAM Chips are the chips that contain the [[DRAMBanks.md]] along with associated circuitry. There are many chips (I think normally 8) that make up a RAM module. 
diff --git a/DRAMRefresh.md b/DRAMRefresh.md
@@ -1,11 +0,0 @@
-# DRAM Refresh
-
-## Notes
-
-This is the process of refreshing the energy stored in a [[DRAMCell.md]]'s capacitor so that losses in energy over time do not cause loss of data (bitrot). 
-
-Currently, as of 2015, refreshes are required every 64ms. This costs electricity and can block memory accesses, and as capacity scales refreshes take more time and more power. As an example, with 64gb DRAM refreshes can take up to 46% of time, while with 4gb it is about 8%.
-
-There is little coordination between the OS and the memory controller and the memory controller can't store information about what memory is allocated. This means that instead of just refreshing allocated memory, all memory is refreshed at the given frequency.
-
-This process is run every 64ms, and while most [[DRAMCell.md]]s can go much longer than 64ms, the lowest common denominator is 64ms. This is a bar for manufacturing that causes bad RAM to be thrown away. The memory controller could probably be smarter about this, but this is not done. One proposal for optimizing this is RAIDR (Retention-Aware Intelligent DRAM Refresh), which in particular uses [[BloomFilter.md]]s to track which cells need more frequent refreshes. This can reduce refreshes by 74.6% at the cost of 1.25kb of memory with 8gb chips.
diff --git a/DRAMRowHammer.md b/DRAMRowHammer.md
@@ -1,6 +0,0 @@
-
-Computer Architecture L1
-
-## Notes
-
-See [[DisturbanceErrors.md]] for more information as it describes this vulnerability. 
diff --git a/DataAugmentation.md b/DataAugmentation.md
@@ -1,9 +0,0 @@
-# Data Augmentation
-
-ML P773
-
-## Notes
-
-**Definition:** Data augmentation is the process of changing training data in such a way to make the training data set larger and more robust.
-
-In CNNs this often involves rotation, lighting, flipping, and other augmentations.
diff --git a/DataFlow.md b/DataFlow.md
@@ -1,14 +0,0 @@
-
-Computer Architecture L2
-
-## Notes
-
-**Definition:** This is a model of computation that stipulates instructions should execute based on their dependencies instead of in program order. If an instruction depends on another that has not yet executed, it cannot execute; once all of its dependencies have executed, it can be executed, if chosen to.
-
-This model is in contrast with [[VonNeumannModel.md]] where everything is sequential. 
-
-Data flow can be easily visualized as a graph.
-
-This paradigm requires a differently designed processor. 
-
-This paradigm is also, sort of, implemented via [[OutOfOrderExecution.md]]
diff --git a/DataStructureAugmentation.md b/DataStructureAugmentation.md
@@ -1,9 +0,0 @@
-# Data Structure Augmentation
-
-L2
-
-## Notes
-
-**Definition:** Data structure augmentation is adding something to a data structure to improve it in some way. 
-
-An example of this is to improve a singly linked list with a tail pointer so polling the tail can be done in O(1) instead of O(n) time. By doing this we could also have constant time additions onto the end of the list (ensure pointer is updated).
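The tail pointer example can be sketched as below. The class and method names are illustrative. The important line is the one that keeps `tail` in sync on every append.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """Singly linked list augmented with a tail pointer."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        """O(1) append thanks to the tail pointer."""
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node  # keep the augmentation up to date

    def peek_tail(self):
        """O(1) access to the last value instead of walking the list."""
        if self.tail is None:
            raise IndexError("list is empty")
        return self.tail.value

    def to_list(self):
        values, node = [], self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values


numbers = LinkedList()
for v in (1, 2, 3):
    numbers.append(v)
print(numbers.peek_tail(), numbers.to_list())
```

The trade-off is typical of augmentation: a little extra state and bookkeeping on every mutation in exchange for a cheaper query.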
diff --git a/DecisionThreshold.md b/DecisionThreshold.md
@@ -1,9 +0,0 @@
-# Decision Threshold
-
-ML CH3
-
-## Notes
-
-**Definition:** In binary classification, a decision threshold is the cutoff on the model's score: instances scoring above it are assigned to one class and instances scoring below it to the other.
-
-A higher threshold generally increases precision, because instances the model is less sure about are no longer classified as positive, but it also decreases recall, because it is more likely to produce false negatives.
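To see the precision and recall trade-off with numbers, here is a small sketch. The scores and labels are invented, and the helper name is hypothetical. With other data, precision is not guaranteed to rise at every step as the threshold increases.

```python
def precision_recall(scores, labels, threshold):
    """Classify score >= threshold as positive and report (precision, recall)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up classifier scores and true labels.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [False, False, True, False, True, True]

low = precision_recall(scores, labels, threshold=0.35)
high = precision_recall(scores, labels, threshold=0.65)
print("low threshold:", low)
print("high threshold:", high)
```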
diff --git a/DecisionTrees.md b/DecisionTrees.md
@@ -1,64 +0,0 @@
-# Decision Trees
-
-ML D4
-
-## Notes
-
-**Definition:** Decision trees are a machine learning algorithm that makes a true/false comparison at each node to go left or right until reaching a leaf node. This leaf node then gives the output.
-
-### Associated Links
-
-Classification and Regression Trees by Leo Breiman
-
-
-
-### Visualizing
-
-You can use graphviz to visualize the tree. First, train the model using sklearn.tree, then import export_graphviz from the same module. export_graphviz takes the model, output file, feature names, class names, and some other options, and creates a dot file.
-
-Then, you can import graphviz and use Source.from_file() to load the dot file and view it.
-
-Ex:
-
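-```python3
-from sklearn.datasets import load_iris
-from sklearn.tree import DecisionTreeClassifier, export_graphviz
-from graphviz import Source
-
-# Assumed setup (not in the original note): a small tree trained on two iris features.
-iris = load_iris(as_frame=True)
-X = iris.data[["petal length (cm)", "petal width (cm)"]].values
-tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, iris.target)
-
-# Write the trained tree to a dot file; with out_file set, export_graphviz returns None.
-export_graphviz(
-    tree_clf,
-    out_file='../graphs/iris_tree.dot',
-    feature_names=["petal length (cm)", "petal width (cm)"],
-    class_names=iris.target_names,
-    rounded=True,
-    filled=True
-)
-# Load the dot file for display (e.g. in a notebook); the ../graphs directory must exist.
-Source.from_file('../graphs/iris_tree.dot')
-```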
-
-### Other Info
-
-There is a root node and what are called 'split nodes', which are where the tree splits into two more nodes based on a True/False comparison.
-
-An interesting thing about decision trees is that no feature scaling is required, as features aren't compared to other features unless you engineer another feature as some combination of them.
-
-In the context of decision trees, 'samples' for a split node refers to the number of training samples that made it to that node. The same applies to leaf nodes, where it describes the number of samples that made it to that leaf.
-
-The 'gini' attribute measures the impurity of a node. A value of 0 means all samples that made it to the node belong to the same class. Larger values mean the classes are more mixed; the value is $1 - \sum_k p_k^2$, where $p_k$ is the fraction of the node's samples in class k, so it is not simply the fraction of samples from other classes.
-
-Scikit learn creates binary trees by using the CART algorithm but there are other decision tree implementations where it is not expressly yes/no such as ID3 where nodes can have more than two children.
-
-Decision trees can output class probabilities based on the per-class sample counts that are also used to compute the gini value. These counts are a list such as [50, 2, 5]; dividing each by the total gives the probabilities, so the class with 50 samples is the most probable and the others are less probable.
-
-The max_depth hyperparameter is a primary way to regularize decision trees and reduce overfitting risk. There are also max_features (features considered per split), max_leaf_nodes, min_samples_split, and min_samples_leaf, which impose similar restrictions.
-
-
-### Uhh Ohh
-
-These things really like axis-aligned (orthogonal) boundaries but not so much angled ones. If you have a dataset that is easily separable at an angle but not vertically or horizontally, you will have a bad time with decision trees.
-
-One mitigation for this is to use PCA, which rotates the data to reduce correlation between features.
-
-
-### Hmmm....
-
-Scikit-learn uses stochastic sampling when training decision trees, meaning results aren't consistent from training run to training run unless random_state is fixed. This is why random forests can be cool.
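As a companion to the gini and probability remarks in the Decision Trees note, here is a small sketch that computes both from a node's per-class sample counts. The function names are made up, and the counts [50, 2, 5] are the example list from the note.

```python
def gini(class_counts):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def class_probabilities(class_counts):
    """Fraction of the node's samples in each class."""
    total = sum(class_counts)
    return [count / total for count in class_counts]


print(gini([10, 0, 0]))                 # pure node
print(gini([5, 5]))                     # evenly mixed two-class node
print(gini([50, 2, 5]))                 # mostly one class
print(class_probabilities([50, 2, 5]))
```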
diff --git a/DeepLearning.md b/DeepLearning.md
@@ -1,21 +0,0 @@
-# Deep Learning
-
-This index tracks deep learning-related content. While much of my deep learning (DL) material is interspersed throughout my machine learning (ML) notes, separating them seems appropriate given the volume of ML notes I've accumulated.
-
-## Resources Studied
-
-1. **Deep Learning** - Goodfellow, Bengio, Courville
-
-## Links by Resource
-
-### **Deep Learning** - Goodfellow, Bengio, Courville
-
-Chapter 1
-
-- KnowledgeBaseApproach
-- PolarCoordinates
-- RepresentationLearning
-- FactorsOfVariation
-- ComputationalGraph
-- ProbabilisticGraph
-- DistributedRepresentation
diff --git a/Degree.md b/Degree.md
@@ -1,7 +0,0 @@
-# Degree
-
-CG W13 L2
-
-## Notes
-
-**Definition:** Degree is a term used to describe the number of edges meeting a [[Vertex.md]].
diff --git a/DemorgansLaw.md b/DemorgansLaw.md
@@ -1,43 +0,0 @@
-# Demorgan's Laws
-
-1.3.2
-
-## Notes
-
-**Definition:** These are two fundamental laws of boolean algebra that can be simply derived.
-
-$\neg (p \wedge q) \equiv \neg p \vee \neg q$
-
-$\neg (p \vee q) \equiv \neg p \wedge \neg q$
-
-####  Showing these laws are true with truth tables
-
-First law (columns 4 and 7 are being shown as equivalent):
-
-| p | q | $p \wedge q$ | $\neg(p \wedge q)$ | $\neg p$ | $\neg q$ | $\neg p \vee \neg q$ |
-| - | - | - | - | - | - | - |
-| T | T | T | F | F | F | F
-| T | F | F | T | F | T | T
-| F | T | F | T | T | F | T
-| F | F | F | T | T | T | T
-
-Second law (columns 4 and 7 are being shown as equivalent):
-
-| p | q | $p \vee q$ | $\neg(p \vee q)$ | $\neg p$ | $\neg q$ | $\neg p \wedge \neg q$ |
-| - | - | - | - | - | - | - |
-| T | T | T | F | F | F | F
-| T | F | T | F | F | T | F
-| F | T | T | F | T | F | F
-| F | F | F | T | T | T | T
-
-#### What these are saying
-
-The first law states that not (p and q) is the same as (not p) or (not q).
-
-The second law states that not (p or q) is the same as (not p) and (not q).
-
-This is basically the distributive property of boolean logic whereby we flip the and/or connective and distribute the negation.
-
-#### For Quantifiers
-
-See [[Quantifiers.md]] section on negation which describes the distribution of a negation when quantifiers are involved.
diff --git a/DensityEstimation.md b/DensityEstimation.md
@@ -1,11 +0,0 @@
-# Density Estimation
-
-Stats D3
-
-## Notes
-
-**Definition:** Density estimation is the process of modeling the probability of given values for a dataset.
-
-This can be thought of as similar to a histogram without the bins. A common form of this is a kernel density estimate (KDE). The reason these can be better is that there is no binning, which can make data appear inaccurate depending on the cut points and bin widths.
-
-In a general sense, KDEs work by placing a Gaussian distribution on each data point, summing these distributions at each x value, and graphing the result. This smooths out the data to give a general picture of it. The width of these Gaussian distributions is dictated by the bandwidth hyperparameter.
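The sum-of-Gaussians idea can be sketched directly. This is a bare-bones illustration with made-up data and a hand-picked bandwidth. Library versions add bandwidth selection and vectorization.

```python
from math import exp, pi, sqrt


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x."""
    n = len(data)

    def density(x):
        total = 0.0
        for point in data:
            z = (x - point) / bandwidth
            total += exp(-0.5 * z * z) / sqrt(2 * pi)  # standard normal kernel
        # Dividing by n * bandwidth makes the estimate integrate to 1.
        return total / (n * bandwidth)

    return density


data = [1.0, 1.2, 1.9, 4.0, 4.1]
density = gaussian_kde(data, bandwidth=0.5)

for x in (1.1, 3.0, 4.0):
    print(x, round(density(x), 3))
```

A smaller bandwidth gives a spikier curve that follows individual points, and a larger one gives a smoother curve that can blur separate clusters together.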
diff --git a/DepthFirstSearch.md b/DepthFirstSearch.md
@@ -1,11 +0,0 @@
-# DFS
-
-CS202 L14
-
-## Notes
-
-**Definition:** Searching algorithm that traverses downward until reaching a leaf node, then goes back up one level and does the same on the next unvisited subtree.
-
-This normally uses the call [[Stack.md]] to search.
-
-Also see [[BreadthFirstSearch.md]]
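A minimal recursive sketch of DFS that relies on the call stack, as the note describes. The adjacency-list dictionary and node names are invented for illustration.

```python
def dfs(graph, start, visited=None):
    """Return nodes in the order depth-first search first reaches them."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            # The recursive call goes deeper before trying the next sibling.
            dfs(graph, neighbor, visited)
    return visited


# A small tree: A has children B and C, B has D and E, C has F.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
order = dfs(tree, "A")
print(order)
```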
diff --git a/DerivedDistribution.md b/DerivedDistribution.md
@@ -1,9 +0,0 @@
-# Derived Distribution
-
-L10
-
-## Notes
-
-**Definition:** Derived distributions are distributions where we take a function of a random variable. 
-
-This is generally defined as Y = g(X) where X is a random variable, Y is a random variable, and g is a function.
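For a discrete X, the distribution of Y = g(X) can be found by pushing each probability through g and adding up collisions. A rough sketch, with an assumed uniform X and g(x) = x squared:

```python
from collections import defaultdict
from fractions import Fraction


def derived_pmf(pmf_x, g):
    """PMF of Y = g(X): add up P(X = x) over all x that map to the same y."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)


# X is uniform on {-2, -1, 0, 1, 2}; Y = X^2.
pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}
pmf_y = derived_pmf(pmf_x, lambda x: x * x)
print(pmf_y)
```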
diff --git a/DesignPoint.md b/DesignPoint.md
@@ -1,16 +0,0 @@
-# Design Point
-
-CA L3
-
-## Notes
-
-**Definition:** The design point is the set of design considerations and constraints, and their relative importance, that a computer's design targets.
-
-Here are some of the design constraints:
-
-1. Cost
-2. Energy Consumption
-3. Performance
-4. Availability (how long can it run)
-5. Reliability and Correctness
-6. Time to Market
diff --git a/Determinant.md b/Determinant.md
@@ -1,148 +0,0 @@
-# Determinant
-
-CS331 - Linear Algebra - Khan U2
-
-## Notes
-
-**Definition:** The determinant is the scaling factor of some area (or volume in 3d space) from before to after a linear transformation. Note that this is easiest to picture in 2d and 3d, as the notion of volume in higher dimensions ([[Hypervolume.md]]) is a bit abstract.
-
-This value can be negative if the space has been flipped. In 3d space, this means the volume after the transformation is in left-hand space if it was in right-hand space before.
-
-Note: If the determinant of the matrix is 0 then the matrix is not invertible.
-
-## Calculation
-
-#### 2x2
-
-For a 2x2 matrix we have Det(A) = ad-bc where the matrix A is defined as <a,b> <c,d> where each vector is a row.
-
-Ex:
-
-A = [1 2]
-    [3 4]
-
-Det(A) = 4 - 6 = -2
-
-#### Generalized
-
-For each element in the first row of the matrix remove all values in the same row and column and then multiply the current element by the determinant of the remaining elements. 
-
-We then add all of the determinant element pairs where every other is negative (first is positive second negative and so on).
-
-This is called Laplace expansion or cofactor expansion.
-
-Note: You don't need to use the first row as the row you expand along. The only thing that matters is the sign pattern: the term for the entry in row i and column j gets sign $(-1)^{i+j}$, so the first row starts with a + and alternates, while another row may start with a - depending on its position. It often makes sense to pick a different row because it allows for selection of a row with many zeroes.
-
-With this, you are not stuck with the same row from layer to layer either, so you can select the last row for the largest matrix, the second for a smaller one, and so on.
-
-Also, this can be done with either **any row or column**.
-
-#### Scalar
-
-When multiplying a single **row** by a scalar, the resulting determinant is equal to the original determinant multiplied by the scalar.
-
-This extends to multiplying the whole matrix by a scalar: every row is scaled, so for an n x n matrix the determinant is multiplied by the scalar n times, i.e. $\det(cA) = c^n \det(A)$.
-
-#### Row Addition + Subtraction
-
-If you have two matrices that are identical apart from a single row, then the sum of their determinants is equal to the determinant of the matrix in which the differing rows are added together.
-
-Example:
-
-A = [1 2]
-	[3 4]
-
-B = [1 2]
-	[4 5]
-
-C = [1     2]
-	[3+4 4+5]
-
-|A| = 4 + -6 
-= -2
-
-|B| = 5 + -8 
-= -3
-
-|C| = 9 + -14 
-= -5 
-= |A| + |B|
-
-In a similar vein, when subtracting a row, the resulting determinant is equal to the determinant of the original minus the determinant of the subtracted row's matrix.
-
-A = [1 2]
-	[3 4]
-
-B = [1 2]
-	[4 5]
-
-C = [1     2]
-	[3-4 4-5]
-
-|A| = 4 + -6 
-= -2
-
-|B| = 5 + -8 
-= -3
-
-|C| = -1 + 2
-= 1
-= |A| - |B|
-
-#### Row Swap
-
-If we are to swap two arbitrary rows in a matrix the det(A) = -det(S) where S is the swapped matrix.
-
-#### Upper Triangle Determinant
-
-The determinant of a matrix where the bottom left triangle is all zeroes is equal to the product of the values on the diagonal.
-
-Ex.
-
-	[2 9 8]
-A = [0 8 7]
-	[0 0 7]
-
-
-det(A) = 2x8x7 = 112
-
-Written out taking the determinant using the first column:
-
-det(A) = 2 | 8 7 | + 0 + 0
-		   | 0 7 |
-= 2x56
-= 112
-
-Given that:
-
-| 8 7 | = 8x7 = 56
-| 0 7 |
-
-#### Lower Triangle Determinant
-
-As was shown above, the same is true for a lower triangle matrix where the top right corner is all zeroes.
-
-[x 0 0 ... 0]
-[x x 0 ... 0]
-[...........]
-[...........]
-[...........]
-[x x ... x x]
-
-#### Simplification
-
-When simplifying, the most important rule is that you can subtract any scalar multiple of one row from any other row in the matrix and still have the same determinant.
-
-Another rule that is important is the row swapping rule described above.
-
-More formally, with a_x being a row of A other than a_1:
-
-A = [a_1]
-	[a_2]
-	[a_3]
-
-B = [a_1 - ca_x]
-	[a_2	   ]
-	[a_3	   ]
-
-|A| = |B| for any scalar c and any row a_x of A other than a_1.
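The cofactor expansion and several of the row rules from the Determinant notes can be checked with a short recursive sketch. It always expands along the first row for simplicity, and the function name is illustrative. It reuses the matrices from the worked examples in the notes.

```python
def det(m):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Remove row 0 and column j to get the minor.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        sign = 1 if j % 2 == 0 else -1  # alternating +, -, +, ...
        total += sign * m[0][j] * det(minor)
    return total


A = [[1, 2], [3, 4]]
print(det(A))                                    # 2x2 example
print(det([[2, 9, 8], [0, 8, 7], [0, 0, 7]]))    # upper triangular example
print(det([[3, 4], [1, 2]]))                     # rows of A swapped
print(det([[2, 4], [6, 8]]))                     # 2 * A
print(det([[1, 2], [3 - 5 * 1, 4 - 5 * 2]]))     # row 2 minus 5 * row 1
```

Cofactor expansion takes factorial time, so it is mainly useful for small matrices and for checking rules like these. Row reduction to a triangular matrix is the usual approach for larger ones.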
diff --git a/DeterministicFiniteAutomata.md b/DeterministicFiniteAutomata.md
@@ -1,21 +0,0 @@
-# Deterministic Finite Automaton (DFA) 
-
-**Source:** Theory of Computation
-
-**Lecture:** 2
-
-## Notes
-
-**Definition:** A deterministic finite automaton is a 5-tuple (Q, Sigma, delta, q_0, F) where each coordinate represents the following:
-
-1. Q - Finite **set** of states.
-2. Sigma - Finite [Alphabet](Alphabet.md).
-3. delta - This is a function from Q x Sigma -> Q. As such, this represents state transitions, referred to as the transition function.
-4. q_0 - Initial state (q_0 \in Q)
-5. F - Set of final states (F is a subset of Q)
-
-### Representation
-
-Often these are represented as a directed labeled graph (transition diagram).
-
-When drawing these, accepting states (final states) are marked with a concentric circle in their node (double circles, as the prof. referred to it), and the start state (only 1) has an arrow leading in from nowhere.
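The 5-tuple maps fairly directly onto code. Below is a sketch with a hypothetical DFA over the alphabet {0, 1} that accepts strings containing an even number of 1s. The state names and the `dfa_accepts` helper are made up for illustration.

```python
def dfa_accepts(delta, start, finals, word):
    """Run the DFA on `word` and report whether it ends in a final state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # transition function: Q x Sigma -> Q
    return state in finals


# (Q, Sigma, delta, q_0, F) for "even number of 1s".
Q = {"even", "odd"}
Sigma = {"0", "1"}
delta = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
q_0 = "even"
F = {"even"}

for word in ("", "11", "1011", "1001"):
    print(repr(word), dfa_accepts(delta, q_0, F, word))
```

Because delta is a total function with exactly one next state per (state, symbol) pair, the run is fully determined by the input, which is the "deterministic" part of the name.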
diff --git a/DiagonalMatrices.md b/DiagonalMatrices.md
@@ -1,9 +0,0 @@
-# Diagonal Matrices
-
-Khan U2
-
-## Notes
-
-**Definition:** Diagonal matrices are matrices that have zeroes in all positions except for the diagonal from 1,1 to n,n.
-
-Diagonal matrices are the matrices that represent linear transformations where we multiply each axis component by some value and do not combine different components together.
diff --git a/Digraph.md b/Digraph.md
@@ -1,9 +0,0 @@
-# Digraph
-
-Ch 4
-
-## Notes
-
-**Definition:** A digraph is a directed graph meaning each edge has only one direction in which traversal is possible.
-
-When discussing digraphs, the start of an edge is called the initial vertex and the end is called the terminal vertex.
diff --git a/DimensionalityReduction.md b/DimensionalityReduction.md
@@ -1,9 +0,0 @@
-# Dimensionality Reduction
-
-ML CH1
-
-## Notes
-
-**Definition:** This is where you have the goal of reducing the required data without losing too much information. This is like lossy compression. 
-
-This can be done by merging multiple correlated features into one. This is referred to as feature extraction where you extract a new feature from existing features to replace them. 
diff --git a/Dimensions.md b/Dimensions.md
@@ -1,11 +0,0 @@
-# Dimensions
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 2
-
-## Notes
-
-**Definition:** The dimension of a vector space is defined as the length of any basis of the vector space.
-
-Recall that any two bases of a vector space have the same length, so this definition does not depend on which basis we pick.
diff --git a/DirectProof.md b/DirectProof.md
@@ -1,7 +0,0 @@
-# Direct Proof
-
-Abstract Math + Discrete Math U1.7.1
-
-## Notes
-
-**Definition:** A direct proof of an if-then statement assumes the "if" part is true and then shows, through a chain of logical steps, that the "then" part must follow. These proofs start with, "Let's assume x is true" and then continue on to prove what it is that x implies.
diff --git a/DirectSum.md b/DirectSum.md
@@ -1,11 +0,0 @@
-# Direct Sum
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 1
-
-## Notes
-
-**Definition:** A direct sum is a sum of two subspaces whose intersection contains only the zero vector.
-
-This can also be stated as each element of the sum being writable in a **unique** way as a sum of vectors, one from each subspace. This notion leads to a good way to test this.
diff --git a/DiscountFactor.md b/DiscountFactor.md
@@ -1,9 +0,0 @@
-# Discount Factor
-
-L2
-
-## Notes
-
-**Definition:** The discount factor in RL is the value gamma we use to describe how much or little we care about long term rewards with respect to the value function.
-
-The discount factor is to the power of the steps away you are from that reward so if gamma = .5 then we see we only care .5x as much about the next step as the current and then .25x as much about the one after that and so on.
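The weighting described above can be written out as a short sketch. The reward sequence is made up, and `discounted_return` is an illustrative name for the discounted sum of rewards.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its distance in steps."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


rewards = [1, 1, 1]

# With gamma = 0.5 the weights are 1, 0.5, 0.25.
print(discounted_return(rewards, gamma=0.5))
# With gamma = 1 every step counts equally.
print(discounted_return(rewards, gamma=1.0))
# With gamma = 0 only the immediate reward counts.
print(discounted_return(rewards, gamma=0.0))
```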
diff --git a/DiscreteMath.md b/DiscreteMath.md
@@ -1,288 +0,0 @@
-# Discrete Math
-
-Discrete math related links.
-
-## Links
-
-#### Discrete Mathematics and Its Applications
-
-Unit 1.1 (logic):
-	- [[Proposition.md]]
-	- [[Negation.md]]
-	- [[Connectives.md]]
-	- [[Converse.md]] - Switch both
-	- [[Inverse.md]] - Negate both
-	- [[Contrapositive.md]] - Swap then inverse
-	- [[Biconditional.md]] - IFF
-
-Unit 1.2 (logic):
-	- [[Proposition.md]]
-	- [[Connectives.md]]
-
-Unit 1.3 (logic):
-	- [[Tautology.md]] - Always true - use the symbol t
-	- [[Contradiction.md]] - Always false
-	- [[Contingency.md]] - Must be evaluated
-	- [[DemorgansLaw.md]] - Two laws describing negation of and/or compound propositions
-	- [[ConditionalDisjunction.md]]
-	- [[DistributiveLaw.md]]
-	- [[WellDefined.md]]
-	- [[Commutative.md]]
-	- [[Satisfiable.md]]
-
-Unit 1.4 (proof):
-	- [[Predicate.md]]
-	- [[PropositionalFunction.md]]
-	- [[Quantifiers.md]]
-	- [[Universe.md]]
-	- [[Preconditions.md]]
-	- [[Postcondition.md]]
-
-Unit 1.5 (proof):
-	- [[NestedQuantifier.md]]
-
-Unit 1.6 (proof):
-	- [[LawOfDetachment.md]]
-
-Unit 1.7 (proof):
-	- [[DirectProof.md]]
-	- [[Contrapositive.md]] - Also known as contraposition
-	- [[Contradiction.md]]
-	- [[Cases.md]]
-	- [[VacuousProof.md]]
-	- [[ExhaustiveProof.md]]
-
-Unit 2.1 (sets):
-	- [[Set.md]]
-	- [[Subset.md]]
-	- [[PowerSet.md]]
-	- [[CartesianProduct.md]]
-	- [[TruthSet.md]]
-	- [[Complement.md]]
-	- [[Multiset.md]]
-
-Unit 2.3 (functions):
-	- [[Range.md]]
-	- [[Image.md]]
-	- [[Preimage.md]]
-	- [[Codomain.md]] 
-	- [[Injective.md]] - one-to-one
-	- [[Surjective.md]] - onto
-	- [[InverseFunction.md]]
-	- [[Floor.md]]
-	- [[Ceiling.md]]
-	- [[Bijective.md]]
-
-Unit 2.4 (sequence + other stuff):
-	- [[Sequence.md]]
-	- [[RecurrenceRelation.md]]
-
-Unit 2.3 (computation 5th edition):
-	- [Tractable](Tractable.md)
-	- [Intractable](Intractable.md)
-	- [Unsolvable](Unsolvable.md)
-	- [NPProblem](NPProblem.md)
-	- [PProblem](PProblem.md)
-	- [NPComplete](NPComplete.md)
-
-Unit 2.4 (integers and division):
-	- [NumberTheory](NumberTheory.md)
-	- [CompositeNumber](CompositeNumber.md)
-	- [PrimeNumber](PrimeNumber.md)
-	- [MersennePrime](MersennePrime.md) 
-	- [Div](Div.md)
-	- [Mod](Mod.md)
-	- [RelativelyPrime](RelativelyPrime.md)
-	- [PairwiseRelativelyPrime](PairwiseRelativelyPrime.md)
-	- [PrimeFactorization](PrimeFactorization.md) 
-	- [GCD](GCD.md)
-	- [LCM](LCM.md) 
-	- [Congruence](Congruence.md) 
-	- [CongruenceClass](CongruenceClass.md)
-	- [CaesarCipher](CaesarCipher.md)
-	- [VigenereCipher](VigenereCipher.md)
-	- [EuclideanAlgorithm](EuclideanAlgorithm.md) 
-	- [LinearCombination](LinearCombination.md)
-	- [LinearCongruence](LinearCongruence.md)
-
-Unit 2.4 (Sequences and Summations 8th edition)
-	- [RecurrenceRelation](RecurrenceRelation.md)
-
-Unit 6.1 (The Basics of Counting 8th edition)
-	- [Combinatorics](Combinatorics.md)
-	- [SumRule](SumRule.md)
-	- [TreeDiagram](TreeDiagram.md)
-	- [SubtractionRule](SubtractionRule.md)
-	- [DivisionRule](DivisionRule.md)
-	- [SumOfGeometricSeries](SumOfGeometricSeries.md)
-	- [CountingPrinciple](CountingPrinciple.md) - Also referred to as product rule.
-
-Unit 6.2 (Pigeonhole principle)
-	- [PigeonholePrinciple](PigeonholePrinciple.md)
-	- [GeneralizedPigeonholePrinciple](GeneralizedPigeonholePrinciple.md)
-	- [Subsequence](Subsequence.md)
-	- [RamseyNumbers](RamseyNumbers.md)
-
-Unit 6.3 (Permutations and Combinations)
-	- [Permutation](Permutation.md)
-	- [RPermutation](RPermutation.md)
-	- [Combination](Combination.md)
-	- [RCombination](RCombination.md)
-	- [CombinatorialProof](CombinatorialProof.md)
-	- [BijectiveProof](BijectiveProof.md)
-
-Unit 6.4 (Binomial Coefficient & Identities)
-	- [BinomialCoefficient](BinomialCoefficient.md) 
-	- [PascalsIdentity](PascalsIdentity.md)
-	- [VandermondesIdentity](VandermondesIdentity.md)
-	- [Binomial](Binomial.md)
-
-Unit 6.5 (Generalized Permutations & Combinations)
-	- [Distinguishable](Distinguishable.md)
-	- [Indistinguishable](Indistinguishable.md)
-
-Unit 8.2 (Solving Linear Recurrence Relations)
-	- [RecurrenceRelation](RecurrenceRelation.md)
-	- [LinearCombination](LinearCombination.md)
-	- [LinearHomogeneousRecurrenceRelation](LinearHomogeneousRecurrenceRelation.md)
-	- [CharacteristicEquation](CharacteristicEquation.md)
-	- [CharacteristicRoots](CharacteristicRoots.md)
-
-Unit 8.3 (Divide and Conquer)
-	- [DivideAndConquer](DivideAndConquer.md)
-
-Unit 8.5 (Inclusion Exclusion)
-	- [PrincipleOfInclusionExclusion](PrincipleOfInclusionExclusion.md)
-
-Unit 9.1 (Relations)
-	- [Relation](Relation.md) (define like function)
-	- [RelationOnASet](RelationOnASet.md) 
-	- [Reflexive](Reflexive.md) 
-	- [Symmetric](Symmetric.md)
-	- [Antisymmetric](Antisymmetric.md)
-	- [Transitive](Transitive.md)
-
-Unit 9.3 (Representing Relations)
-	- [ZeroOneMatrix](ZeroOneMatrix.md)
-	- [Digraph](Digraph.md) (initial and terminal for vertex names w/ respect to edges)
-	- [Loop](Loop.md)
-
-Unit 9.4 (Closures of Relations)
-	- [TransitiveClosure](TransitiveClosure.md)
-	- [ReflexiveClosure](ReflexiveClosure.md)
-	- [SymmetricClosure](SymmetricClosure.md)
-	- [Closure](Closure.md)
-
-Unit 9.5 (Equivalence Relations)
-	- [EquivalenceRelation](EquivalenceRelation.md)
-	- [EquivalenceClass](EquivalenceClass.md) ([a] notation)
-	- [Representative](Representative.md)
-	- [Partition](Partition.md)
-
-Unit 9.6 (Partial Orderings)
-	- [PartiallyOrderedSet](PartiallyOrderedSet.md)
-	- [HasseDiagram](HasseDiagram.md)
-	- [LexicographicOrdering](LexicographicOrdering.md)
-
-Unit 10.1 (Graphs)
-	- [Graphs](Graphs.md)
-	- SimpleGraph
-	- [Multigraph](Multigraph.md)
-	- [Loop](Loop.md)
-	- [PseudoGraphs](PseudoGraphs.md) (multi edges + multi loop + undirect)
-	- [MixedGraph](MixedGraph.md) (undirected + directed)
-
-Unit 10.2 (Graph Terms)
-	- Neighborhood
-	- DegreeOfVertex
-	- Isolated (deg(a) = 0)
-	- Pendant (deg(a) = 1)
-	- HandshakingTheorem
-	- InDegree
-	- OutDegree
-	- UnderlyingUndirectedGraph (directed -> undirected)
-	- CompleteGraph (fully connected)
-	- Wheel (add vertex connected to all elements of a cycle)
-	- [Bipartite](Bipartite.md)
-	- Matching
-	- MaximumMatching
-	- CompleteMatching
-
-Unit 10.3 (Representing Graphs and Isomorphisms)
-	- IncidenceMatrix - edges as columns
-	- AdjacencyList - List of all adjacent vertices (for sparse)
-	- [AdjacencyMatrix](AdjacencyMatrix.md) (for dense)
-	- Isomorphic
-	- GraphInvariant
-
-Unit 10.4 (Connectivity)
-    - Path
-    - Cycle 
-    - Closed Walk
-    - Trail
-    - SimplePath
-    - CutVertex (produces subgraph that is not connected)
-    - CutEdge - Bridge
-    - NonseparableGraph
-    - VertexCut - SeparationSet
-    - VertexConnectivity - Minimum verts in vertex cut or produce 1 vertex (fully connected graph)
-    - k-connected (vertex connectivity of graph >= k)
-    - EdgeCut
-    - EdgeConnectivity
-    - StronglyConnected (a->b and b->a for all b,a in digraph)
-    - WeaklyConnected (if path between the two assuming undirected graph)
-    - StronglyConnectedComponents / StrongComponents (maximal strongly connected subgraph)
-    - GiantStronglyConnectedComponent (GSCC - Connected component with significant amount of the graph's total vertices)
-
-Unit 10.5 (Euler and Hamilton Paths)
-    - Euler Circuit - Can traverse all edges (exactly once) back to self
-    - Euler Path - Same as above except a path with all edges instead of a circuit
-    - Hamilton Circuit - Can traverse all vertices (exactly once) back to self
-
-Unit 10.7 (Planar Graphs)
-    - Planar (can be drawn on a plane without edges crossing)
-    - Planar Representation (visual representation of planar graph without crossing edges)
-    - Regions
-    - Euler's Formula (r = e-v+2 where r is the number of regions in a planar representation)
-    - Homeomorphic
-    - Elementary Subdivision
-
-Unit 10.8 (Graph Coloring)
-    - Coloring
-    - DualGraph
-    - ChromaticNumber (minimum number of unique colors required to achieve a coloring denoted as \chi)
-    - FourColorTheorem - chromatic number of a planar graph is no greater than four.
-
-Unit 11.1 (Introduction to Trees)
-    - Tree (connected, undirected, with no simple circuits)
-    - Forest (disconnected, undirected, no simple circuits)
-    - Root
-    - RootedTree
-    - InternalVertices (vertices that have children in a tree (~opposite of leaf))
-    - mAryTree (m-ary trees are trees where each vertex has no more than m children)
-    - FullmAryTree (m-ary tree if every internal vertex has m children )
-    - OrderedRootedTree (rbt where children are ordered. this terminology allows us to say left and right children, but generally we leave out ordered part)
-    - Balanced (all leaves at either h or h-1 level (remember height starts at 0 for root))
-
-Unit 11.3 (Tree Traversals)
-    - UniversalAddressSystem (x_1.x_2.x_3...x_n for the current node where x_1... is the path from root to current. Notice we don't include x_0 := 0)
-    - TraversalAlgorithms (Way to traverse every vertex in ordered rooted tree)
-    - PreorderTraversal (Visit the current vertex first, then traverse its subtrees from left to right, thus the first element is the root)
-    - InorderTraversal (Leftmost subtree, then parent, then the remaining subtrees from left to right, applied recursively; when every internal vertex has at least two children the final element is the rightmost leaf)
-    - PostorderTraversal (Traverse all subtrees from left to right, then visit the parent, thus the final element is the root)
-    - InfixForm
-    - PrefixForm
-    - PolishNotation
-    - PostfixForm
-    - ReversePolishNotation
-
-Unit 11.4 (Spanning Trees)
-    - SpanningTree - Subgraph of simple graph G that is a tree and contains every vertex in G.
-    - DepthFirstSearch - Go deep then wide.
-    - BreadthFirstSearch - Start somewhere, go out.
-
-Unit 11.5 (Minimum Spanning Trees)
-    - MinimumSpanningTree - Spanning tree with weighted graph that minimizes sum of weights.
-    - PrimsAlgorithm - Select the minimum weighted edge, then select the minimum weighted edge incident to the tree built so far that does not create a simple circuit, and repeat until n-1 edges have been selected
-    - KruskalsAlgorithm - Choose edge with minimum weight, choose next with min weight, continue until selecting n-1 edges, ensure not creating simple circuits
-    - 
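The three traversal orders listed under Unit 11.3 can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Node` class for an ordered rooted tree; the names `Node`, `preorder`, `inorder`, and `postorder` are illustrative and do not come from the notes. Inorder here follows the convention of traversing the first subtree, then the vertex, then the remaining subtrees.

```python
class Node:
    """Hypothetical ordered rooted tree node: children are kept left to right."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def preorder(node):
    # Visit the vertex first, then each subtree from left to right.
    out = [node.label]
    for child in node.children:
        out.extend(preorder(child))
    return out

def inorder(node):
    # Leftmost subtree, then the vertex, then the remaining subtrees.
    if not node.children:
        return [node.label]
    out = inorder(node.children[0])
    out.append(node.label)
    for child in node.children[1:]:
        out.extend(inorder(child))
    return out

def postorder(node):
    # Every subtree from left to right, then the vertex itself.
    out = []
    for child in node.children:
        out.extend(postorder(child))
    out.append(node.label)
    return out

# Root a has children b and c; b has children d and e.
tree = Node("a", [Node("b", [Node("d"), Node("e")]), Node("c")])
print(preorder(tree), inorder(tree), postorder(tree))
```

On this small tree the root comes first in preorder and last in postorder, which matches the descriptions in the list.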
diff --git a/DiscreteProbability.md b/DiscreteProbability.md
@@ -1,7 +0,0 @@
-# Discrete Probability
-
-Stats ch1
-
-## Notes
-
-**Definition:** A discrete probability is one where there is a finite set of outcomes or a countably infinite set of outcomes.
diff --git a/DiscreteRandomVariable.md b/DiscreteRandomVariable.md
@@ -1,7 +0,0 @@
-# Discrete Random Variable
-
-Ch 2.1
-
-## Notes
-
-**Definition:** A discrete random variable is a random variable with an outcome space of finite or countably infinite size. 
diff --git a/DiscreteUniformLaw.md b/DiscreteUniformLaw.md
@@ -1,7 +0,0 @@
-# Discrete Uniform Law
-
-L1
-
-## Notes
-
-**Definition:** The discrete uniform law states that if all outcomes in a [[SampleSpace.md]] are equally probable then P(A) where A is a set is the same as |A| / |Omega| where Omega is the entire sample space.
diff --git a/DisjointSet.md b/DisjointSet.md
@@ -1,7 +0,0 @@
-# Disjoint Set
-
-L1
-
-## Notes
-
-**Definition:** Disjoint sets are sets that have no elements in common.
diff --git a/DistanceCalculation.md b/DistanceCalculation.md
@@ -1,20 +0,0 @@
-# Distance Calculation
-
-Khan
-
-## Notes
-
-**Definition:** Distance calculation in any dimension is defined as sqrt((x_1 - y_1)^2 + (x_2 - y_2)^2 ...)
-
-In the above definition x_1 is the first component of the first point (or vector), and y_1 is the first component of the second point (or vector). We then repeat this by subtracting them, squaring them and then summing all of them. Finally, we take the square root. 
-
-In 1d this reduces to the absolute value of the difference between the two components, since squaring the difference and then taking the square root only removes the sign.
-
-In 2d we have for the example (1,2) and (10, 12):
-
-	= sqrt((1 - 10)^2 + (2 - 12)^2)
-	= sqrt(81 + 100)
-	= sqrt(181)
-	= 13.453...
-
-We can expand this to higher dimensions, but you get the idea.
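The formula above extends to any number of dimensions. A minimal sketch, assuming points are given as equal-length sequences of numbers (the function name `distance` is illustrative):

```python
import math

def distance(p, q):
    """Euclidean distance between two points with the same number of components."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    # Subtract component-wise, square, sum, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

print(distance((1, 2), (10, 12)))   # the 2d example from the notes, sqrt(181)
print(distance((3,), (7,)))         # 1d case: just the absolute difference
```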
diff --git a/DistanceToPlane.md b/DistanceToPlane.md
@@ -1,15 +0,0 @@
-# Distance to Plane
-
-Distance from arbitrary point to plane
-
-## Notes
-
-If we take any representative point on the plane and connect it to the point in question, that connecting vector is the hypotenuse of a right triangle whose side along the normal direction has a length equal to the distance from the plane to the point.
-
-As such this is as simple as taking the dot product of the normal vector and the vector that connects the representative point and the other point. We then divide the absolute value of this by the length of the normal vector, which gives the length of that side and therefore the distance between the plane and the point.
-
-To find distance to plane from point do the following:
-
-1. Find the vector from a representative point to the point
-2. Calculate the dot product between this vector connecting these two points and the normal vector
-3. Divide this value by the length of the normal vector
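The three steps above can be sketched as follows. The names are illustrative, and the absolute value is an added assumption so that the result is a non-negative distance regardless of which side of the plane the point is on.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def distance_to_plane(normal, plane_point, point):
    # Step 1: vector from the representative point on the plane to the point.
    connecting = [p - q for p, q in zip(point, plane_point)]
    # Step 2: dot product with the normal vector.
    projection = dot(normal, connecting)
    # Step 3: divide by the length of the normal vector.
    return abs(projection) / math.sqrt(dot(normal, normal))

# Plane z = 0 with a non-unit normal; the point sits 5 units above it.
print(distance_to_plane((0, 0, 2), (1, 1, 0), (3, 4, 5)))
```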
diff --git a/Distinguishable.md b/Distinguishable.md
@@ -1,7 +0,0 @@
-# Distinguishable
-
-Ch 6.5
-
-## Notes
-
-**Definition:** Distinguishable means items are different in some way such that switching them results in a new permutation.
diff --git a/DistinguishablePermutation.md b/DistinguishablePermutation.md
@@ -1,7 +0,0 @@
-# Distinguishable Permutation
-
-Ch 1.3
-
-## Notes
-
-**Definition:** A distinguishable permutation is a permutation that can be distinguished from all other permutations.
diff --git a/Distributive.md b/Distributive.md
@@ -1,7 +0,0 @@
-# Distributive 
-
-Ch 2.2
-
-## Notes
-
-**Definition:** Distributivity is a property of operators such that a(b+c) = ab + ac. 
diff --git a/DistributiveLaw.md b/DistributiveLaw.md
@@ -1,11 +0,0 @@
-# Distributive Law
-
-1.3.2
-
-## Notes
-
-**Definition:** The distributive law of disjunction states $p \vee (q \wedge r) \equiv (p\vee q) \wedge (p \vee r)$.
-
-This can be thought of as being something or two other things. By this logic, we can then state it as the thing or one of the others and the thing or the second other.
-
-Either q and r or p must be true.
diff --git a/DisturbanceErrors.md b/DisturbanceErrors.md
@@ -1,7 +0,0 @@
-# Disturbance Errors
-
-## Notes
-
-Also referred to as [[DRAMRowHammer.md]]
-
-These are caused by frequent accesses of a given row. When a row is moved to the [[RowBuffer.md]] there is a (precharge) high charge applied to it and a low charge applied to the one being moved out of the buffer. Activating the same row over and over can cause errors in adjacent rows because of how close together DRAM rows are: it increases the rate of charge leakage in those adjacent rows. This issue has been resolved in flash by a controller that stores error correcting codes and checks them over and over. There are still issues with this memory, but ECC resolves this issue when needed; it is just more expensive. 
diff --git a/Div.md b/Div.md
@@ -1,13 +0,0 @@
-# Div
-
-U 2.4
-
-## Notes
-
-**Definition:** Div is a mathematical function that returns the largest integer q such that q times the divisor (the second number) is less than or equal to the dividend (the first number). 
-
-
-ex:
-
-
-15 div 2 = 7
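A minimal sketch of div, assuming a positive divisor. Python's floor division `//` already rounds toward negative infinity, which matches the "largest integer q with q * d <= a" definition; the wrapper name `div` is illustrative.

```python
def div(a, d):
    """Largest integer q such that q * d <= a, for a positive divisor d."""
    if d <= 0:
        raise ValueError("this sketch assumes a positive divisor")
    return a // d

print(div(15, 2))    # the example from the notes
print(div(-11, 3))   # negative dividends round down, not toward zero
```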
diff --git a/DivideAndConquer.md b/DivideAndConquer.md
@@ -1,15 +0,0 @@
-# Divide And Conquer
-
-CLRS 2.3.1
-
-## Notes
-
-**Definition:** Divide and conquer algorithms are algorithms that break a problem down into smaller subproblems, solve each subproblem, and then combine the results.
-
-These algorithms are often, but not always, recursive.
-
-Steps:
-
-1. (Divide) Divide problem into subproblems
-2. (Conquer) Solve subproblems
-3. (Combine) Aggregate final result
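Merge sort is a standard illustration of the three steps above. The sketch below is one possible way to write it, not a transcription of any particular textbook listing.

```python
def merge_sort(items):
    # Base case: zero or one element is already sorted.
    if len(items) <= 1:
        return list(items)
    # Divide: split the list in half.
    mid = len(items) // 2
    # Conquer: sort each half recursively.
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Combine: merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))
```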
diff --git a/DivisionRule.md b/DivisionRule.md
@@ -1,9 +0,0 @@
-# Division Rule
-
-Ch 6.1
-
-## Notes
-
-**Definition:** The division rule is a rule that describes the total size of the outcome space of some function.
-
-A good way to think of this is as a function. Consider the function f: A -> B where every b in B has exactly d values a in A such that f(a) = b. Knowing this, there are |A|/d total possible outcomes, that is |B| = |A|/d.
diff --git a/DotProduct.md b/DotProduct.md
@@ -1,30 +0,0 @@
-# Dot Product
-
-CS331 + Khan 
-
-## Notes
-
-**Definition:** The dot product of two vectors is the sum of the products of their corresponding components. 
-
-This can be visualized as the length of one vector, v, projected onto another vector, y, multiplied by the length of the vector y. Additionally, if two vectors point in generally opposite directions (the angle between them is more than 90 degrees), their dot product is negative. This is why the on same side of plane algorithm works (see cs331 code): if two vectors are on the same side of a plane, then their dot products with the plane's normal vector are both negative or both positive.  
-
-This value is zero if the vectors are orthogonal. 
-
-### Additional Thoughts
-
-The dot product in a geometric sense is u dot v = ||u|| ||v|| cos(theta) where theta is the angle between the two vectors. As such, when the angle between them is greater than 90 degrees and less than 270 degrees we find the dot product is negative.
-
-
-### Intuition Of DP
-
-Equivalently, the dot product can be stated as follows where || || denotes the lengths of vectors and theta is the angle between the two vectors:
-
-a dot b = ||a|| ||b|| cos(theta)
-
-The dot product is the signed length of the projection of the second vector onto the first vector multiplied by the length of the first vector. The result is the same whichever of the two vectors is projected onto the other.
-
-Based on this information, it is intuitive that when two vectors are almost orthogonal the dot product is very small. Conversely, when two vectors are almost parallel (close to linearly dependent), the dot product will be large in magnitude, close to the product of their lengths.
-
-This explains why orthogonal vectors have a dot product of zero.
-
-**Product of the lengths of the vectors that are moving in the same direction.**
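A minimal sketch connecting the component-wise definition to the geometric one. The helper names are illustrative, and the angle is recovered from u dot v = ||u|| ||v|| cos(theta).

```python
import math

def dot(u, v):
    # Sum of the products of corresponding components.
    return sum(a * b for a, b in zip(u, v))

def angle_between(u, v):
    # Solve the geometric formula for theta; clamp to guard against rounding.
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

print(dot((1, 2, 3), (4, 5, 6)))                 # 1*4 + 2*5 + 3*6
print(dot((1, 0), (0, 1)))                       # orthogonal vectors
print(dot((1, 2), (-3, -1)))                     # generally opposite directions
print(math.degrees(angle_between((1, 0), (0, 1))))
```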
diff --git a/DoublyLinkedList.md b/DoublyLinkedList.md
@@ -1,8 +0,0 @@
-# Doubly Linked List
-
-CS 221 W 11 Lecture 13. 
-
-## Notes
-
-**Definition:** This is a linked list that has a pointer to the tail and head that are accessible, and every element in the list has a pointer to the previous and next nodes. 
-
diff --git a/Dropout.md b/Dropout.md
@@ -1,13 +0,0 @@
-# Dropout
-
-ML P604
-
-## Notes
-
-**Definition:** Dropout is a regularization technique for deep neural networks where upon each pass every neuron has a constant probability of being 'dropped out' meaning the output is 0.
-
-This works very well with a rate somewhere between 10%-50%. With RNNs we often do 20%-30% and with CNNs we use 40%-50%.
-
-This form of regularization ensures the model has multiple neurons that perform similar functions instead of being dependent upon one or only a few neurons to do important things.
-
-When using dropout we never drop output neurons. Additionally, it is common to only dropout neurons from the first few layers.
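A minimal sketch of dropout applied to one layer's activations during training. It assumes the common "inverted dropout" variant, where surviving activations are divided by the keep probability so the expected output is unchanged; that scaling is an assumption and is not stated in the notes. The function name and `rng` parameter are illustrative.

```python
import random

def dropout(activations, rate, rng=random):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # rng.random() is in [0, 1), so an activation is dropped when the draw is below rate.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)
print(dropout([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
```

At inference time dropout is switched off, which corresponds to calling this with `rate=0.0`.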
diff --git a/Duality.md b/Duality.md
@@ -1,9 +0,0 @@
-# Duality
-
-3B1B
-
-## Notes
-
-**Definition:** Duality is a natural but surprising correspondence between two types of things.
-
-This is like finding out that the dot product can be used to find the projection of a vector onto another vector.
diff --git a/DynamicProgramming.md b/DynamicProgramming.md
@@ -1,12 +0,0 @@
-# Dynamic Programming
-
-L3
-
-## Notes
-
-**Definition:** Dynamic programming is the idea that we can break down a problem into subproblems, solve those subproblems, and then use the results to find the problem's overall solution.
-
-There are two necessary conditions for a problem to be solvable via DP:
-
-1. [OptimalSubstructure](OptimalSubstructure.md)
-2. [OverlappingSubproblems](OverlappingSubproblems.md)
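Fibonacci numbers are a small example where both conditions hold: F_n is built from the answers to F_{n-1} and F_{n-2}, and those subproblems repeat heavily. A minimal sketch using memoization, where the choice of `functools.lru_cache` is just one convenient way to store subproblem results:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """n-th Fibonacci number with fib(1) = fib(2) = 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n <= 2:
        return 1
    # Each subproblem is solved once and then reused from the cache.
    return fib(n - 1) + fib(n - 2)

print(fib(10), fib(50))
```

Without the cache the same recursion recomputes subproblems exponentially many times; with it, each value is computed once.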
diff --git a/EarlyStopping.md b/EarlyStopping.md
@@ -1,11 +0,0 @@
-# Early Stopping
-
-ML D3
-
-## Notes
-
-**Definition:** Early stopping is the process of stopping a model early in training (assuming it uses GD or something akin to that) as a form of regularization.
-
-Early stopping decreases overfitting by stopping training once the prediction error on held-out validation data meets a chosen threshold or stops improving. This also reduces the time to train.
-
-Using sklearn, we can use partial_fit along with an epoch (pass) counter and loss calculation to determine if we are close enough to some goal to stop.
diff --git a/EigenVector.md b/EigenVector.md
@@ -1,81 +0,0 @@
-# Eigen Vector
-
-Self Study
-
-## Notes
-
-**Definition:** An Eigen Vector is a non-zero vector such that when a linear transformation is performed upon it, the resulting vector is a scalar multiple of the original (it remains on the same line). 
-
-Associated with this, we also have an Eigen value which is the amount that a point on the Eigen Vector is distorted by (multiplied by this scalar)
-
-This can be thought of as the axis of rotation when in R^3. Additionally, we know the eigen value is 1 in a rotation as there is no stretching.
-
-Formula:
-
-Where T(x) is a L.T., A is L.T.'s matrix, v is a vector, lambda is a scalar
-
-T(v) = Av = lambda v
-
-lambda is an eigen value of A (so there are eigen vectors for it) iff det(lambda I_n - A) = 0
-
-## Calculation
-
-A = [1 2]
-    [4 3]
-
-Eigen Value Calculation:
-
-det(lambda [1 0] - [1 2]) = 0
-           [0 1]   [4 3]
-
-det([lambda 0] - [1 2]) = 0
-    [0 lambda]   [4 3]
-
-det([lambda-1       -2]) = 0
-    [-4       lambda-3]
-
-(lambda-1) (lambda-3) - 8 = 0
-
-lambda^2 - 4lambda - 5 = 0
-
-(lambda-5)(lambda + 1) = 0
-
-Solutions:
-
-lambda = 5 or lambda = -1
-
-(lambda = eigen value)
-
-Eigen Vector Calculation (calculating for 5):
-
-0 = (lambda I_n - A) v
- 
-= ([5 0] - [1 2] ) v
-   [0 5]   [4 3]
-
-= [4 -2] v
-  [-4 2]
-
-Null space calculation:
-
-[4 -2]
-[-4 2]
-
-[4 -2]
-[0  0]
-
-[1 -1/2][v_1] = [0]
-[0    0][v_2]	[0]
-
-v_1 - 1/2v_2 = 0
-
-v_1 = 1/2 v_2
-
-E_5 = {[v_1] = t[1/2], t \in \R}
-       [v_2]    [1  ]
-
-Answer:
-span([1/2])
-     [  1]
-
-The other calculation would be the same but for -1 thus it will be left as an exercise for the reader.
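The result of the calculation above can be checked numerically by confirming that A v = lambda v. A minimal sketch in plain Python: the vector for lambda = 5 is the t = 2 multiple of [1/2, 1], and the vector shown for lambda = -1 is one candidate obtained the same way, so skip it if you want to do the exercise first.

```python
def mat_vec(A, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenpair(A, eigenvalue, v):
    # v must be non-zero and A v must equal eigenvalue * v.
    return any(v) and mat_vec(A, v) == [eigenvalue * x for x in v]

A = [[1, 2],
     [4, 3]]

print(mat_vec(A, [1, 2]))    # 5 * [1, 2]
print(mat_vec(A, [1, -1]))   # -1 * [1, -1]
```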
diff --git a/ElasticNetRegression.md b/ElasticNetRegression.md
@@ -1,9 +0,0 @@
-# Elastic Net Regression
-
-ML D3
-
-## Notes
-
-**Definition:** Elastic net regression is another form of linear regression that adds a regularization term to the loss function which is a middle ground between ridge and lasso regression.
-
-As it relates to linear regression, it is good to add some regularization, and when we know some coefficients should be 0 we should rely upon elastic net regression. Otherwise ridge regression is a good option when we don't think there are useless features.
diff --git a/ElementaryTransformations.md b/ElementaryTransformations.md
@@ -1,9 +0,0 @@
-# Elementary Transformations
-
-Ch 2.2
-
-## Notes
-
-**Definition:** Elementary transformations are transformations done to matrices that do not change the solution set of the system of equations.
-
-These elementary transformations are what we use to solve systems of equations via Gaussian elimination.
diff --git a/EligibilityTraces.md b/EligibilityTraces.md
@@ -1,9 +0,0 @@
-# Eligibility Traces
-
-L4
-
-## Notes
-
-**Definition:** Eligibility traces combine both the frequency and recency heuristics to solve the credit assignment problem.
-
-Basically, every time we visit a state we increase the eligibility trace for the given state and over time this decays off. Higher values mean the state is more associated with the outcome and lower values mean less. This allows us to both care about frequency because each visit adds to the trace, and care about recency because of decay.
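A minimal sketch of the update described above, assuming accumulating traces: every trace is multiplied by a decay factor each step, and the visited state's trace is increased by 1. The function name, the dictionary representation, and the single `decay` parameter (often written as gamma times lambda) are illustrative assumptions.

```python
def update_traces(traces, visited_state, decay):
    """Decay every trace, then bump the trace of the state just visited."""
    for state in traces:
        traces[state] *= decay          # recency: old visits fade
    traces[visited_state] = traces.get(visited_state, 0.0) + 1.0  # frequency
    return traces

traces = {}
for state in ["A", "A", "B"]:
    update_traces(traces, state, decay=0.5)
print(traces)
```

State A was visited twice but less recently, and state B once but most recently, so both heuristics show up in the final values.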
diff --git a/Embedding.md b/Embedding.md
@@ -1,11 +0,0 @@
-# Embedding
-
-ML P722
-
-## Notes
-
-**Definition:** Embeddings are a high dimensional dense representation of data.
-
-When using one hot encoding we get a sparse output with only one 1 and the rest 0s. However, when using embeddings all representations are high dimensional and don't have sparsity. 
-
-Embeddings are generally trainable so while they are initialized, over time they will become more representative of the underlying data and how it relates to other embeddings.
diff --git a/EmptyGraph.md b/EmptyGraph.md
@@ -1,7 +0,0 @@
-# Empty Graph 
-
-Ch 4
-
-## Notes
-
-**Definition:** The empty graph is a graph that does not have any nodes and consequently does not have any edges. 
diff --git a/Ensembles.md b/Ensembles.md
@@ -1,7 +0,0 @@
-# Ensembles
-
-CH2
-
-## Notes
-
-**Definition:** Ensembles are models composed of multiple models. These models can be the same like with random forests or different models put together.
diff --git a/Entropy.md b/Entropy.md
@@ -1,15 +0,0 @@
-# Entropy
-
-Ch 6
-
-## Notes
-
-**Definition:** Entropy is the average number of bits communicated by one message if message hoarding is allowed.
-
-Entropy of a finite set of messages is denoted as H(X).
-
-Formula:
-
-$H(X) = -\sum_{x\in S} P(x)\log_2(P(x))$
-
-X is an ordered pair (S,P) where S is a finite set of messages and P is a function from S -> [0,1] S.T. P(x) = Probability of x for all x in S.
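A minimal sketch of the formula, taking the probabilities P(x) as a list. Terms with probability 0 are skipped, following the usual convention that 0 log 0 is treated as 0, which is an assumption not stated in the notes.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a finite probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))      # a fair coin
print(entropy([0.25] * 4))      # four equally likely messages
print(entropy([0.9, 0.1]))      # a biased source carries less information
```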
diff --git a/Episode.md b/Episode.md
@@ -1,7 +0,0 @@
-# Episode
-
-L4
-
-## Notes
-
-**Definition:** An episode in RL is a single run of a policy from start to finish.
diff --git a/Episodic.md b/Episodic.md
@@ -1,7 +0,0 @@
-# Episodic
-
-L4
-
-## Notes
-
-**Definition:** Episodic, with respect to RL, means that there are episodes as opposed to non-episodic which means something continues on forever.
diff --git a/EquationOfAPlane.md b/EquationOfAPlane.md
@@ -1,43 +0,0 @@
-# Equation of a Plane
-
-Khan
-
-## Notes
-
-**Definition:** The equation of a plane is the equation that defines all points on the plane as a combination of n variables where n is the number of dimensions we are in. This is the definition of plane when in 3d space and a hyperplane in higher dimensions. 
-
-### Plane Formula
-
-The general formula for a 3d plane is as follows:
-
-ax + by + cz = d
-
-where a, b, c, and d are coefficients and x, y, and z are variables.
-
-### Hyperplane formula
-
-A hyperplane in n dimensional space is defined as follows:
-
-a_1 x_1 + a_2 x_2 + ... + a_n x_n = d
-
-Where, again, all a sub values are coefficients along with d and all x values are variables.
-
-### Calculation
-
-When we have a normal vector and a point this is very simple.
-
-Steps:
-
-1. Plug in normal vector for equation of the plane:
-	- ax + by + cz = d where a,b,c are the x,y, and z axis components of the vector
-2. Plug in representative point as x,y, and z and then solve for d 
-
-This can be extrapolated into higher dimensions for hyper-planes assuming we still have a representative point and the normal vector.
-
-### No Normal Vector
-
-When we just have three points, or two vectors and one point, we can find the normal vector of the plane as follows:
-
-1. If we have three points then find two vectors on the plane by taking the difference between a reference point and the two other points on the plane. If these vectors are collinear (dependent) then we don't have enough information to get the formula for the plane.
-2. Take the cross product of both vectors to find the normal vector
-3. Complete calculation steps above
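The steps for the three-point case can be sketched as follows. The helper names are illustrative, and the function returns the coefficients (a, b, c, d) of ax + by + cz = d.

```python
def cross(u, v):
    # Cross product of two 3d vectors, giving a vector normal to both.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_from_points(p0, p1, p2):
    # Step 1: two vectors on the plane, measured from the reference point p0.
    u = tuple(a - b for a, b in zip(p1, p0))
    v = tuple(a - b for a, b in zip(p2, p0))
    # Step 2: the cross product is the normal vector (a, b, c).
    normal = cross(u, v)
    if normal == (0, 0, 0):
        raise ValueError("points are collinear, so the plane is not determined")
    # Step 3: plug the representative point into ax + by + cz to solve for d.
    d = sum(n * x for n, x in zip(normal, p0))
    return (*normal, d)

print(plane_from_points((1, 0, 0), (0, 1, 0), (0, 0, 1)))   # x + y + z = 1
```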
diff --git a/EquivalenceClass.md b/EquivalenceClass.md
@@ -1,9 +0,0 @@
-# Equivalence Class
-
-AM W14 Video
-
-## Notes
-
-**Definition:** An equivalence class is the subset of a set containing all elements related to a given element x under an equivalence relation, denoted by [x].
-
-A simple example of this is the mod 5 relation. There are only five values that can result from mod 5 for the set $\Z$ being 0,1,2,3, and 4. Each of these can be defined as its own equivalence class denoted by [0], [1], [2], [3], and [4]. For a class to be an equivalence class, the relation must be an equivalence relation. Additionally we know [0] = [5] = [10] since these share the same remainder, while no two of [0], [1], [2], [3], and [4] are equal to each other. The final interesting thing is that the intersection between any two of these distinct classes is always the empty set. 
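The mod 5 example can be sketched on a finite slice of the integers. The function name is illustrative, and the sketch relies on Python's `%` returning a non-negative remainder for a positive modulus.

```python
def equivalence_classes(elements, n):
    """Group integers by remainder mod n; each group is one equivalence class."""
    classes = {}
    for x in elements:
        classes.setdefault(x % n, []).append(x)
    return classes

classes = equivalence_classes(range(-5, 11), 5)
for representative, members in sorted(classes.items()):
    print(f"[{representative}] = {members}")
```

Every integer in the range lands in exactly one class, which is the partition property mentioned above.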
diff --git a/EquivalenceRelation.md b/EquivalenceRelation.md
@@ -1,7 +0,0 @@
-# Equivalence Relation
-
-Ch 9.5
-
-## Notes
-
-**Definition:** An equivalence relation is a relation that is reflexive (xRx), symmetric (xRy -> yRx), and transitive (xRy and yRz -> xRz).
diff --git a/EuclideanAlgorithm.md b/EuclideanAlgorithm.md
@@ -1,9 +0,0 @@
-# Euclidean Algorithm
-
-Ch 2.4
-
-## Notes
-
-**Definition:** The Euclidean algorithm is an algorithm used to determine the greatest common factor of two numbers.
-
-This is done as an alternative to the prime factorization method, which is too slow for large numbers.
diff --git a/Evaluation.md b/Evaluation.md
@@ -1,7 +0,0 @@
-# Evaluation
-
-L1
-
-## Notes
-
-**Definition:** Evaluation in RL is the process of seeing how good a policy is.
diff --git a/Event.md b/Event.md
@@ -1,9 +0,0 @@
-# Event
-
-CH 1.2
-
-## Notes
-
-**Definition:** An event is a subset of the sample space.
-
-The frequency of the event A is denoted $\mathbf{N}(A)$.
diff --git a/EvolutionaryMethods.md b/EvolutionaryMethods.md
@@ -1,7 +0,0 @@
-# Evolutionary Methods
-
-RL Ch 1
-
-## Notes
-
-**Definition:** Evolutionary methods are a class of RL strategies where learning is not done by interacting with the environment but rather by updating policies using a strategy akin to evolution where the best models continue on.
diff --git a/ExhaustiveProof.md b/ExhaustiveProof.md
@@ -1,9 +0,0 @@
-# Exhaustive Proof
-
-U 1.8.2
-
-## Notes
-
-**Definition:** An exhaustive proof is similar to proof by cases except we evaluate it for all specific examples which needs to be a relatively small number.
-
-An exhaustive proof that "a < a+1 for 1 < a < 5 where a $\in \Z$" would show the statement is true for 2, 3, and 4.
diff --git a/Expectation.md b/Expectation.md
@@ -1,17 +0,0 @@
-# Expectation (Expected Value of Random Variable)
-
-L6
-
-## Notes
-
-**Definition:** The expected value of a random variable is the average of its outputs weighted by the probabilities given by its PMF.
-
-This is calculated by summing the probabilities of each output multiplied by the output value. This will be the 'middle' of the sample space (weighted average).
-
-This is denoted by the function E[] where the inside is the random variable that is being predicted upon. 
-
-Interesting note, the expected signed difference from the expectation will always be zero, which is why we square the differences to find the variance and subsequently the standard deviation.
-
-## Conditional (L6)
-
-Conditional expectations are just expectations, but they are in reference to the conditional PMF instead of the original one.
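A minimal sketch of these ideas for a fair six-sided die, using exact fractions. The PMF is represented as a dictionary from value to probability, and all names are illustrative.

```python
from fractions import Fraction

def expectation(pmf):
    # Sum of each output value multiplied by its probability.
    return sum(x * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = expectation(die)

# The expected signed difference from the mean cancels out to zero...
mean_deviation = sum((x - mean) * p for x, p in die.items())
# ...so the differences are squared to get the variance.
variance = sum((x - mean) ** 2 * p for x, p in die.items())

# Conditional expectation: same calculation on the conditional PMF (die is even).
even = {x: Fraction(1, 3) for x in (2, 4, 6)}

print(mean, mean_deviation, variance, expectation(even))
```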
diff --git a/ExplodingGradients.md b/ExplodingGradients.md
@@ -1,17 +0,0 @@
-# Exploding Gradients
-
-ML 550
-
-## Notes
-
-**Definition:** Exploding gradients is a problem with training neural networks where lower levels have very high gradients and thus the gradient steps diverge from a proper solution.
-
-This is the opposite of [[VanishingGradients.md]]
-
-This often occurs for recurrent neural networks. 
-
-### Solutions
-
-Use ReLU and better weight initialization (not gaussian distribution with std deviation of 1).
-
-See [[UnstableGradients.md]] for more.
diff --git a/Exploit.md b/Exploit.md
@@ -1,9 +0,0 @@
-# Exploit
-
-RL Ch 1
-
-## Notes
-
-**Definition:** To exploit in RL means to take the known best move in the current state.
-
-This is the opposite of explore which is to take a random move and see how that plays out in the future in case it may be better than the current best known option.
diff --git a/ExploratoryDataAnalysis.md b/ExploratoryDataAnalysis.md
@@ -1,7 +0,0 @@
-# Exploratory Data Analysis (EDA)
-
-Stats D3
-
-## Notes
-
-**Definition:** Exploratory data analysis is the process of exploring a dataset to find patterns and to create models/statistics/visualizations.
diff --git a/Explore.md b/Explore.md
@@ -1,7 +0,0 @@
-# Explore
-
-RL Ch 1
-
-## Notes
-
-**Definition:** To explore in RL means to select an option that is either unknown or suboptimal and then continue to evaluate that path with the hope it may lead to a better outcome than the known best option.
diff --git a/ExponentialDistribution.md b/ExponentialDistribution.md
@@ -1,9 +0,0 @@
-# Exponential Distribution
-
-Stats D1
-
-## Notes
-
-**Definition:** An exponential distribution is one whose density is decreasing at a decreasing pace. Specifically, the density is lambda e^(-lambda x) for x >= 0, so as x increases, y decreases at a decreasing rate. 
-
-This is often used to model the time between random events, which relates it to [[PoissonDistribution.md]]: the Poisson distribution counts events in a fixed interval, while the exponential distribution describes the waiting time between them.
diff --git a/ExtraTrees.md b/ExtraTrees.md
@@ -1,9 +0,0 @@
-# Extra Trees (Extremely randomized trees)
-
-ML D5
-
-## Notes
-
-**Definition:** Extra trees are decision trees that incorporate extra randomness by randomizing splitting thresholds instead of using gini impurity or information gain to determine splitting thresholds.
-
-Basically, each node selects a random feature and then selects a random value that is in the set of valid inputs for the node and splits upon that. This adds lots of randomness and greatly reduces training time because the optimal split at each point in time does not need to be calculated. 
diff --git a/Feature.md b/Feature.md
@@ -1,11 +0,0 @@
-# Feature
-
-ML CH1
-
-## Notes
-
-**Definition:** A feature is an ML term used to describe either an individual feature of a sample or a given feature of all samples. 
-
-Example of one sample: The fuel economy feature of the Toyota Corolla is very high.
-
-Example of all samples: The fuel economy feature seems to be related to the rate of breakdown for cars.
diff --git a/FeatureScaling.md b/FeatureScaling.md
@@ -1,11 +0,0 @@
-# Feature Scaling
-
-ML CH2
-
-## Notes
-
-**Definition:** Feature scaling is the process of changing input features to be scaled in a similar way. 
-
-Feature scaling is important because many machine learning algorithms don't do well when the input features use vastly different scales of values.
-
-There are two types of feature scaling namely [[MinMaxScaling.md]] and [[Standardization.md]]
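A minimal sketch of the two approaches for a single feature column, in plain Python. The function names are illustrative, and standardization here uses the population standard deviation, which is an assumption; libraries may offer either convention.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("a constant feature cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mean) / std for v in values]

print(min_max_scale([1, 2, 3]))
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Min-max scaling bounds the output to [0, 1] but is sensitive to outliers, while standardization is unbounded and less affected by them.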
diff --git a/FibonacciNumbers.md b/FibonacciNumbers.md
@@ -1,9 +0,0 @@
-# Fibonacci Numbers
-
-Abstract Math 10.5. 
-
-## Notes
-
-**Definition:** The sequence of numbers defined by $F_n = F_{n-1} + F_{n-2}$ starting from 1 as the first value. 
-
-
diff --git a/FiniteDimensional.md b/FiniteDimensional.md
@@ -1,11 +0,0 @@
-# Finite Dimensional
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 2
-
-## Notes
-
-**Definition:** A vector space is finite dimensional if it contains a list of vectors that span the space.
-
-Finite dimensional is antithetical to infinite dimensional which is a vector space that does not contain a list of vectors that span the entire space. This can occur when we have a vector space that has an infinite number of coordinates, but since lists must be finite, we can't define a list of vectors that spans the entire space.
diff --git a/FiniteField.md b/FiniteField.md
@@ -1,7 +0,0 @@
-# Finite Field
-
-Ch 5
-
-## Notes
-
-**Definition:** A finite field in abstract algebra is a set where addition, subtraction, multiplication, and division are defined and behave in a way similar to real numbers (field) that also contains a finite number of elements.
diff --git a/FiniteStateAutomata.md b/FiniteStateAutomata.md
@@ -1,9 +0,0 @@
-# Finite State Automata (FA) 
-
-**Source:** Theory Of Computation
-
-**Lecture:** 2
-
-## Notes
-
-**Definition:**
diff --git a/FisherYatesShuffle.md b/FisherYatesShuffle.md
@@ -1,27 +0,0 @@
-# Fisher Yates Shuffle
-
-## Notes
-
-**Definition:** The Fisher-Yates shuffle is the most common shuffling algorithm whereby you iterate backwards through the list swapping the current index with an arbitrary index that is less than or equal to the current until reaching the 0th index.
-
-Implementation:
-
-```python
-import random
-
-def swap(ls, pos1, pos2):
-    # Exchange two elements in place.
-    ls[pos1], ls[pos2] = ls[pos2], ls[pos1]
-    return ls
-
-def shuffle(ls):
-    i = len(ls) - 1
-    # Index 0 has nothing left to swap with.
-    while i > 0:
-        # randint is inclusive; rnd == i must be allowed for a uniform shuffle.
-        rnd = random.randint(0, i)
-        swap(ls, pos1=rnd, pos2=i)
-        i -= 1
-
-    return ls
-```
diff --git a/FlashCrash.md b/FlashCrash.md
@@ -1,7 +0,0 @@
-# Flash Crash (2010)
-
-Superintelligence - Bostrom
-
-## Notes
-
-**Definition:** The flash crash occurred in 2010 where two oppositional algorithmic traders began trading back and forth very quickly because their utility function specified they do so, but logically they should not have.
diff --git a/Floor.md b/Floor.md
@@ -1,11 +0,0 @@
-# Floor
-
-U2.3.4
-
-## Notes
-
-**Definition:** The floor function rounds its input down to the nearest integer.
-
-Remember that for negative numbers, rounding down means moving to the lower (more negative) integer.
-
-$\lfloor 10.9 \rfloor = 10$ and $\lfloor -10.1 \rfloor = -11$
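The negative-number rule in Floor.md can be checked with a small Python sketch. It uses the standard library's `math.floor`; the sample values are only illustrative.

```python
import math

# floor always moves toward negative infinity, never toward zero.
positive = math.floor(10.9)    # 10
negative = math.floor(-10.1)   # -11

# int() truncates toward zero instead, so the two differ for negatives.
truncated = int(-10.1)         # -10
```

The contrast with `int()` is the usual source of mistakes when flooring negative values.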
diff --git a/Folding.md b/Folding.md
@@ -1,7 +0,0 @@
-# Folding
-
-Ch 5
-
-## Notes
-
-**Definition:** Folding is a process used in a hashing function where we split the key into discrete parts and then operate upon each of them separately.
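Folding.md does not say how the parts are combined. One common choice, assumed here, is to add the parts and take the remainder by the table size. The names `fold_hash`, `part_size`, and `table_size` are hypothetical, and the sketch assumes a non-negative integer key.

```python
def fold_hash(key: int, part_size: int = 3, table_size: int = 101) -> int:
    """Hash a non-negative integer key by folding its decimal digits."""
    digits = str(key)
    # Split the digit string into consecutive chunks of part_size digits.
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    # Combine the parts (here by addition) and reduce to a table index.
    return sum(parts) % table_size


index = fold_hash(123456789)   # parts are 123, 456, 789
```

Other ways of combining the parts, such as XOR, would fit the same description.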
diff --git a/ForwardThoughts.md b/ForwardThoughts.md
@@ -1,23 +0,0 @@
-# Forward Thoughts
-
-Things that could be possible and necessary for future development
-
-## Notes
-
-There will need to be architecture capable of allowing higher levels of computation. We need to consider future scaling. 
-
-In the future architecture may need to enable:
-
-1. Lifelike 3D visualization
-2. Virtual Reality
-3. Personalized Genomics/Medicine
-
-Solutions to [[DRAMRowHammer.md]].
-
-Alternatives to Von Neumann Model:
-
-1. Multi Processors
-	- Each processor is Von Neumann, but the processors working in parallel, taken as a whole, are not
-2. [[DataFlow.md]]
-3. [[BulkSynchronousProcessing.md]]
-	- This is the common way of doing synchronous processing while using Von Neumann constrained processors
diff --git a/FreeVariables.md b/FreeVariables.md
@@ -1,9 +0,0 @@
-# Free Variables
-
-Ch 2.2
-
-## Notes
-
-**Definition:** Free variables are the variables whose columns in RREF do not contain a pivot (leading 1).
-
-The existence of free variables means that a consistent system of equations has infinitely many solutions.
diff --git a/Frequency.md b/Frequency.md
@@ -1,9 +0,0 @@
-# Frequency
-
-Ch 1.1
-
-## Notes
-
-**Definition:** Frequency describes the number of occurrences of a given outcome from the trials of a random experiment.
-
-Frequency is often confused with [[RelativeFrequency.md]] and [[Probability.md]], but they are different terms, as the others describe the relative likelihood of an event.
diff --git a/FrequencyHeuristic.md b/FrequencyHeuristic.md
@@ -1,11 +0,0 @@
-# Frequency Heuristic
-
-L4
-
-## Notes
-
-**Definition:** The frequency heuristic is the idea that we assign credit based on how frequently things happen.
-
-In RL, if we see 4 bells and 1 light and then get a negative reward, then by the frequency heuristic we could state that the bells caused the negative reward.
-
-This is a solution to the credit assignment problem.
diff --git a/FunctionNotation.md b/FunctionNotation.md
@@ -1,7 +0,0 @@
-# Function Notation
-
-Ch 0
-
-## Notes
-
-**Definition:** Function notation is the use of formal math notation such as f : X -> Y, with f(x) denoting the output for an input x, to define tasks.
diff --git a/FundamentalOperations.md b/FundamentalOperations.md
@@ -1,15 +0,0 @@
-# Fundamental Operations
-
-L1
-
-## Notes
-
-**Definition:** Fundamental operations are operations that take constant time.
-
-#### List of fundamental operations
-
-Word-RAM - Accessing an arbitrary memory address (smallest access size is a word, 64-bit)
-
-Memory Allocation/Deallocation - Allocate one word (linear time to make an array of length n)
-
-Simple Arithmetic - Add, subtract, multiply, divide, remainder, floor, ceiling
diff --git a/FundamentalTheoremOfArithmetic.md b/FundamentalTheoremOfArithmetic.md
@@ -1,13 +0,0 @@
-# The Fundamental Theorem of Arithmetic
-
-Abstract Math 10.4. Can be proven through [[StrongInduction.md]]
-
-## Notes
-
-**Definition:** Any integer greater than 1 has a unique prime factorization. 
-
-This means any integer greater than 1 can be given in the form a * b * ... * z where all numbers multiplied together are prime, and this list of primes is unique up to ordering.
-
-The existence half of the proof is quite interesting. Take any integer n > 1. If n is prime, it is its own factorization. Otherwise n = ab where both factors are smaller than n; by the strong induction hypothesis a and b each factor into primes, and multiplying those factorizations together gives one for n. Uniqueness needs a separate argument.
-
-
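The existence claim in FundamentalTheoremOfArithmetic.md can be made concrete with trial division. This is a minimal sketch, not an efficient factorizer, and `prime_factors` is a hypothetical name.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factorization of n > 1 in non-decreasing order."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    divisor = 2
    # Any composite n has a prime factor no larger than sqrt(n).
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    # Whatever remains above 1 is itself prime.
    if n > 1:
        factors.append(n)
    return factors


factors_360 = prime_factors(360)   # 2 * 2 * 2 * 3 * 3 * 5
factors_97 = prime_factors(97)     # 97 is prime, so it factorizes itself
```

Because the divisors are tried in increasing order, every divisor that divides `n` at that point is prime, which mirrors the inductive argument.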
diff --git a/FundamentalTheroemofCalculus.md b/FundamentalTheroemofCalculus.md
@@ -1,9 +0,0 @@
-# (Second) Fundamental Theorem of Calculus
-
-Khan U1
-
-## Notes
-
-**Definition:** The (second) fundamental theorem of calculus states that if f is continuous, the derivative with respect to x of the integral of f from a constant a to x is f(x): $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$.
-
-This implies that every function that is continuous on an interval has an antiderivative there.
diff --git a/GCD.md b/GCD.md
@@ -1,9 +0,0 @@
-# GCD
-
-U 2.4
-
-## Notes
-
-**Definition:** The GCD of two integers b and c (not both zero) is the largest integer a such that a | b and a | c.
-
-To find the GCD of two numbers, find the prime factorization of each and then, for every prime they share, take the smaller of its two exponents. Multiply these prime powers together to get the GCD.
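The min-exponent recipe in GCD.md can be sketched as follows, assuming positive integer inputs. `prime_factor_counts` and `gcd_by_factoring` are hypothetical helper names, and the standard library's `math.gcd` is used only as a cross-check.

```python
import math
from collections import Counter


def prime_factor_counts(n: int) -> Counter:
    """Map each prime factor of n to its exponent (empty for n == 1)."""
    counts = Counter()
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            counts[divisor] += 1
            n //= divisor
        divisor += 1
    if n > 1:
        counts[n] += 1
    return counts


def gcd_by_factoring(b: int, c: int) -> int:
    """GCD via the smaller exponent of every shared prime."""
    fb, fc = prime_factor_counts(b), prime_factor_counts(c)
    result = 1
    for prime in fb.keys() & fc.keys():
        result *= prime ** min(fb[prime], fc[prime])
    return result


# 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7 share 2^2 * 3.
shared = gcd_by_factoring(360, 84)
```

In practice the Euclidean algorithm behind `math.gcd` is far faster, since factoring large numbers is expensive.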
diff --git a/GameLoop.md b/GameLoop.md
@@ -1,13 +0,0 @@
-# Unity Game Loop
-
-CS 331 W12 L3
-
-## Notes
-
-**Definition:** Each frame, the Update method of each active script is called.
-
-This is the same idea as animation which is giving motion to still images. 
-
-When using the game loop you can read Time.deltaTime to get the seconds between the previous and current frame as a float. This can be used to achieve frame rate independence.
-
-See [[MonoBehaviour.md]] for more about the Update method.
diff --git a/GameObject.md b/GameObject.md
@@ -1,17 +0,0 @@
-# Game Object
-
-CS 331 W12 L3
-
-## Notes
-
-**Definition:** This is the data type of objects in the game. This is a broad class that has some built-in functionality.
-
-A common way to move an object forward using its [[Vector3.md]] position is as follows:
-```csharp
-	float speed = 2f; // default forward speed, in units per second
-	bool moveForward = Input.GetKey("up");
-	if(moveForward){
-		// Time.deltaTime is a property, not a method; it keeps the motion frame-rate independent.
-		transform.position += transform.forward * speed * Time.deltaTime;
-	}
-```
diff --git a/GaussianElimination.md b/GaussianElimination.md
@@ -1,11 +0,0 @@
-# Gaussian Elimination (row reduction)
-
-Khan U1
-
-## Notes
-
-**Definition:** Gaussian elimination is the process of simplifying a system of linear equations to [[ReducedRowEchelonForm.md]] to solve the system (strictly, reducing all the way to RREF is called Gauss-Jordan elimination).
-
-Basically, we perform row operations on an augmented matrix to get RREF. We then read off the value of each variable (for example x, y, and z), and that is our solution.
-
-See also [[CramersRule.md]]
diff --git a/GaussianIntegers.md b/GaussianIntegers.md
@@ -1,7 +0,0 @@
-# Gaussian Integers
-
-AM W13 L1
-
-## Notes
-
-**Definition:** This is the set of all numbers of the form a + bi such that a and b are integers and i^2 is -1. 
diff --git a/GaussianMixtureModels.md b/GaussianMixtureModels.md
@@ -1,7 +0,0 @@
-# Gaussian Mixture Models
-
-ML D5
-
-## Notes
-
-**Definition:** Gaussian mixture models (GMMs) are probabilistic models that assume instances were generated using several Gaussian distributions where each distribution forms its own cluster.
diff --git a/GeneralSolution.md b/GeneralSolution.md
@@ -1,7 +0,0 @@
-# General Solution
-
-Ch 2.2
-
-## Notes
-
-**Definition:** A general solution to a system of linear equations is a single expression that describes every solution, typically a particular solution plus a combination of vectors scaled by the free variables.
diff --git a/GeneralizationError.md b/GeneralizationError.md
@@ -1,11 +0,0 @@
-# Generalization Error
-
-ML CH1
-
-## Notes
-
-**Definition:** Generalization error, or out-of-sample error, is the error rate of a model on data that is not in the training set.
-
-When testing a model it is important to split the samples into a training set and a test set. You then train the model on the training set and measure its error rate on the test set. This error rate is an estimate of the generalization error.
-
-It is common practice to use 80% of the data for training and 20% for testing. Sometimes a further set is held out, called the validation set, dev set, or development set, to give another layer of verification. This is important because when several models are tuned using different hyperparameters (such as learning rates) and the one that scores best on the 20% of testing data is kept, the model has effectively been tuned to both the training and testing sets, so the measured error is too optimistic. With a dev set you would first train each candidate on the training data, test them all against the dev set, select the best one, and then evaluate it once on the test set to estimate the generalization error.
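The train / dev / test workflow in GeneralizationError.md can be sketched with the standard library alone. The 60/20/20 proportions, the fixed seed, and the name `three_way_split` are assumptions made for illustration.

```python
import random


def three_way_split(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the samples and cut them into train, validation, and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = three_way_split(range(100))
# Candidates are fitted on `train`, compared on `val`, and only the chosen
# model is scored on `test`, once, to estimate generalization error.
```

Keeping the test set untouched until the very end is what makes its error rate an honest estimate.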
diff --git a/GeneralizedPigeonholePrinciple.md b/GeneralizedPigeonholePrinciple.md
@@ -1,7 +0,0 @@
-# Generalized Pigeonhole Principle
-
-Ch 6.2
-
-## Notes
-
-**Definition:** The generalized pigeonhole principle states that if N elements are placed into k groups, at least one group contains at least $\lceil N/k \rceil$ elements. This is the size of the fullest group when the elements are distributed as evenly as possible.
diff --git a/GradientBoosting.md b/GradientBoosting.md
@@ -1,10 +0,0 @@
-# Gradient Boosting
-ML D5
-
-## Notes
-
-**Definition:** Gradient boosting sequentially adds predictors to an ensemble and fits each subsequent model not to reweighted instances like AdaBoost but to the residual errors of the previous model.
-
-Residual errors are simply the differences between the actual and predicted values. As such, gradient boosting does not use instance weighting in the same way as AdaBoost, which distinguishes the two. Each new model tries to predict the errors made by the prior model.
-
-Gradient boosting generally uses stronger learners than AdaBoost as this works better with the architecture.
diff --git a/GradientClipping.md b/GradientClipping.md
@@ -1,13 +0,0 @@
-# Gradient Clipping
-
-ML P569
-
-## Notes
-
-**Definition:** Gradient clipping is the process of clipping gradients during backpropagation so they never exceed some threshold.
-
-This is another technique used to resolve issues relating to [[ExplodingGradients.md]], particularly for RNNs where batch normalization is hard to apply.
-
-There are two ways to do gradient clipping: with a threshold cutoff or with vector scaling. With vector scaling we retain the direction of the vector: if the largest value is greater than 1 we scale it down to 1 and scale all other features proportionally. More commonly, we simply truncate each value, so if we have [100, .1] with a threshold range of (-1, 1) we would clip the vector to [1, .1], which changes its direction.
-
-Scaling the entire vector is called normalization.
diff --git a/GradientDescent.md b/GradientDescent.md
@@ -1,21 +0,0 @@
-# Gradient Descent 
-
-ML L2
-
-## Notes
-
-**Definition:** Gradient descent is an algorithm used to find a 'near' optimal solution to the given problem. It is used with [[LinearRegression.md]] to optimize the function by selecting a set of parameters $\theta$ and then repeatedly stepping in the direction that decreases the cost function the fastest. In general this finds a local optimum. With linear regression, however, the cost function is convex, so the only optimum is the global one.
-
-The general idea is to start with some $\theta$ (parameters) and keep changing it to reduce J($\theta$). (Find J in [[LinearRegression.md]])
-
-More specifically, you pick a starting point and see which direction lowers the cost the fastest. You take a step in that direction and then repeat. It's not perfect, but it's fast.
-
-This is a common algorithm used for [[LinearRegression.md]] when there are lots of features or lots of samples (too big for memory), which would cause the closed-form formula for linear regression to be too slow.
-
-For a simple implementation of gradient descent using a [[LearningRate.md]] for third degree polynomials see [[GradientDescentCode.md]].
-
-When using gradient descent for linear regression one must calculate the partial derivative of the cost with respect to each parameter and then move each parameter in the direction opposite the sign of its derivative.
-
-Batch gradient descent calculates each step from all of the samples given. An alternative to this is stochastic gradient descent, which is much lighter and faster per step because each step only tries to reduce the error on a single random point in the dataset. This allows for out-of-core learning where the entire dataset does not need to be loaded in memory at any given time.
-
-Another type of gradient descent is mini-batch gradient descent, which sits between batch and stochastic gradient descent. This form of GD performs each descent step on a small random batch of samples. This is basically stochastic, but with a few more samples each time instead of a single random sample.
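The batch, stochastic, and mini-batch variants in GradientDescent.md differ only in how many samples feed each step, which a single sketch can show. It fits a one-feature linear regression by mean squared error; the function name, learning rate, step count, and toy data are all assumptions.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size, learning_rate=0.02,
                               steps=5000, seed=0):
    """Fit y ~ w * x + b by minimizing mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = range(len(xs))
    for _ in range(steps):
        # batch_size == len(xs) is batch GD; batch_size == 1 is stochastic GD.
        batch = rng.sample(indices, batch_size)
        errors = [(w * xs[i] + b) - ys[i] for i in batch]
        # Partial derivatives of the mean squared error over this batch.
        grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / batch_size
        grad_b = 2 * sum(errors) / batch_size
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]          # generated from y = 2x + 1
w_full, b_full = minibatch_gradient_descent(xs, ys, batch_size=len(xs))
w_mini, b_mini = minibatch_gradient_descent(xs, ys, batch_size=2)
```

On noisy data the smaller batches would wander around the optimum rather than settle on it, which is the price paid for cheaper steps.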
diff --git a/GradientDescentCode.md b/GradientDescentCode.md
@@ -1,58 +0,0 @@
-# Gradient Descent Implementation 
-
-This approach uses a [[LearningRate.md]] parameter to home in on a local minimum of the given third degree polynomial.
-
-## Code
-```python
-import sys
-import random
-
-RECURSION_LIMIT = 1500
-
-
-sys.setrecursionlimit(RECURSION_LIMIT)
-print("ax^3 + bx^2 + cx + d")
-a = float(input("Enter a: "))
-b = float(input("Enter b: "))
-c = float(input("Enter c: "))
-d = float(input("Enter d: "))
-learningRate = float(input("Learning Rate: "))
-
-
-def calculateValue(x):
-    return x**3 * a + x**2 * b + x * c + d
-
-
-def printResult(x, y):
-    print("x: " + str(x) + "\ty: " + str(y))
-
-
-def limit(x):
-    rightX = x + .00000001
-    leftX = x - .00000001
-
-    rightY = calculateValue(rightX)
-    leftY = calculateValue(leftX)
-
-    return ((rightY - leftY) / .00000002)
-
-
-def descend(x, depth):
-    # Need - 15 because recursion includes other function calls...
-    if depth >= RECURSION_LIMIT - 15:
-        return x
-    lim = limit(x)
-    printResult(x, calculateValue(x))
-    depth += 1
-
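-    # Always step against the slope: a positive slope moves x left and a
-    # negative slope moves x right, so one update rule covers both cases.
-    # (A cubic is unbounded below, so x may diverge instead of settling.)
-    return descend(x - learningRate * lim, depth)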
-
-
-currSearch = random.random() * 10
-xVal = descend(currSearch, 0)
-
-printResult(xVal, calculateValue(xVal))
-```
diff --git a/GramSchmidtProcess.md b/GramSchmidtProcess.md
@@ -1,9 +0,0 @@
-# The Gram-Schmidt Process
-
-Khan U3
-
-## Notes
-
-**Definition:** The Gram-Schmidt process is a process for finding an orthonormal basis of a subspace. 
-
-Basically, if we have a basis we can find an orthonormal basis of the subspace by normalizing the first vector, then for each subsequent vector subtracting its projections onto all of the orthonormal vectors found so far and normalizing what remains.
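The procedure in GramSchmidtProcess.md can be written out directly. This is a sketch of the classical variant on plain Python lists; `dot`, `gram_schmidt`, and the `1e-12` dependence tolerance are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Turn a list of linearly independent vectors into an orthonormal list."""
    orthonormal = []
    for v in basis:
        w = list(v)
        # Subtract the projection of v onto every orthonormal vector so far.
        for q in orthonormal:
            coeff = dot(v, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        # Normalize what remains.
        orthonormal.append([wi / norm for wi in w])
    return orthonormal


q_vectors = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

For ill-conditioned inputs the modified variant, which projects the running remainder `w` instead of the original `v`, is usually more numerically stable.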
diff --git a/Graphs.md b/Graphs.md
@@ -1,11 +0,0 @@
-# Graphs
-
-Abstract Math 10.2. 
-
-## Notes
-
-**Definition:** A graph is a configuration consisting of vertices and edges. 
-
-**Cycle:** A cycle in a graph is a path along edges that starts and ends at the same vertex.
-
-Also see [[Tree.md]]
diff --git a/HHP102.md b/HHP102.md
@@ -1,8 +0,0 @@
-# Health And Wellness 
-
-Summer 24
-
-## Main Links
-
-- [TransTheoreticalModel](TransTheoreticalModel.md)
-- [SMART](SMART.md)
diff --git a/HadamardProduct.md b/HadamardProduct.md
@@ -1,9 +0,0 @@
-# Hadamard Product
-
-Ch 2.2
-
-## Notes
-
-**Definition:** The Hadamard product of two matrices (assuming they are the same size) is the element-wise product: each element is multiplied by the element at the same index in the other matrix.
-
-This product is used with CNNs because the kernel is multiplied element-wise with the portion of the matrix it currently covers, and the resulting products are summed.
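A small sketch of the Hadamard product from HadamardProduct.md, together with one kernel position as used in a convolution layer. Matrices are nested lists, and `hadamard` and `apply_kernel_at` are hypothetical names.

```python
def hadamard(a, b):
    """Element-wise product of two equally sized matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply_kernel_at(image, kernel, top, left):
    """Multiply the kernel with the patch it covers and sum the products."""
    height, width = len(kernel), len(kernel[0])
    patch = [row[left:left + width] for row in image[top:top + height]]
    return sum(sum(row) for row in hadamard(patch, kernel))


product = hadamard([[1, 2], [3, 4]], [[5, 6], [7, 8]])

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
corner = apply_kernel_at(image, kernel, top=0, left=0)   # 1*1 + 5*1
```

Sliding `top` and `left` over every valid position would produce the full feature map.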
diff --git a/HalfWord.md b/HalfWord.md
@@ -1,7 +0,0 @@
-# Half Word
-
-W1
-
-## Notes
-
-**Definition:** This is half the size of a CPU's word.
diff --git a/Hamming.md b/Hamming.md
@@ -1,11 +0,0 @@
-# Hamming
-
-Richard Hamming was a person who was influential to computing
-
-## Notes
-
-**Hamming Distance:** The difference between two strings of equal length. This is defined as the number of positions at which they differ.
-
-Hamming distance led to the inception of error correction (Hamming codes)
-
-**Hamming Codes:** :todo:
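The Hamming distance defined in Hamming.md takes only a few lines to compute. `hamming_distance` is a hypothetical name, and the sketch rejects inputs of unequal length.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


word_distance = hamming_distance("karolin", "kathrin")   # r/t, o/h, l/r differ
bit_distance = hamming_distance("1011101", "1001001")    # two bits differ
```

For bit strings, the distance is the number of single-bit errors needed to turn one codeword into the other.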
diff --git a/HarmonicMean.md b/HarmonicMean.md
@@ -1,19 +0,0 @@
-# Harmonic Mean
-
-ML D2
-
-## Notes
-
-**Definition:** The harmonic mean of precision and recall, called the F1 score, is a metric used to describe the performance of a model. This single value is representative of both the precision and recall of a model.
-
-Basically, this is a combination of [[Precision.md]] and recall
-
-The harmonic mean favors models with similarly good values for both recall and precision which can be good in certain cases. There are however many cases where precision, recall, or accuracy may be more important.
-
-
-Formula:
-
-
-$F_1 = 2 \cdot \frac{p \cdot r}{p + r}$
-
-Where p = [[Precision.md]] and r = recall
diff --git a/HashFunction.md b/HashFunction.md
@@ -1,40 +0,0 @@
-# Hash Function
-
-Ch. 5
-
-## Notes
-
-**Definition:** A hash function is a function f(k) that takes a key value k (x.r = k where x is an object) and outputs a natural number.
-
-f : K -> N where K is the set of all possible keys and N is the natural numbers. 
-
-Oftentimes we also want to state that the set of valid outputs (image of f) is from 0 to m-1 where m is the length of the array we are using to store the objects of the given hash value.
-
-### n-bit hash functions
-
-**Definition:** n-bit hash functions are hash functions such that their image is the natural numbers from 0 to 2^n-1.
-
-These hash functions can then map to every bit string of length n.
-
-### Important aspects of the hashing function
-
-1. Speed
-2. One-way
-3. Deterministic
-4. Uniform
-
-Speed - this means the time to compute the hash for the inputs is generally fast
-
-One-way - this means it is hard to go the opposite direction namely from N -> K
-
-Deterministic - this means for the same input the output should always be the same
-
-Uniform - the distribution, with respect to K, should be roughly uniform across the image of the function - [0, m-1]
-
-### Perfect
-
-A hash function is perfect if for all valid inputs, there are no collisions (one-to-one). 
-
-$f:K \to N$  is a perfect hash function if 
-$\forall x,y\in K,\ x \neq y \implies f(x) \neq f(y)$
-
diff --git a/HashTable.md b/HashTable.md
@@ -1,9 +0,0 @@
-# Hash Table
-
-Ch 5
-
-## Notes
-
-**Definition:** A hash table is a collection data structure that supports inserting elements and checking for elements, using a hashing function to place objects into an array for 'constant' time access.
-
-Note that generally we use open addressing (linear/quadratic probing) to ensure collisions are handled correctly. There is also the case where you might use a linked list (chaining), but in CS 303 we are not doing that, and Python does not do that either.
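Open addressing with linear probing, as mentioned in HashTable.md, can be sketched as a fixed-size set. The class and method names are hypothetical, and the sketch leaves out deletion and resizing and cannot store `None` as a key.

```python
class LinearProbingSet:
    """A fixed-capacity set that resolves collisions by linear probing."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity   # None marks an empty slot

    def _probe(self, key):
        # Start at the hash value and walk forward, wrapping around.
        m = len(self._slots)
        start = hash(key) % m
        for offset in range(m):
            yield (start + offset) % m

    def insert(self, key):
        for index in self._probe(key):
            if self._slots[index] is None:
                self._slots[index] = key
                return
            if self._slots[index] == key:
                return                    # already present
        raise RuntimeError("table is full")

    def contains(self, key):
        for index in self._probe(key):
            if self._slots[index] is None:
                return False              # an empty slot ends the search
            if self._slots[index] == key:
                return True
        return False


table = LinearProbingSet(capacity=7)
for key in (3, 10, 17):                   # all three hash to index 3
    table.insert(key)
```

The three colliding keys land in consecutive slots, which is the clustering behavior that quadratic probing tries to reduce.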
diff --git a/HashValues.md b/HashValues.md
@@ -1,9 +0,0 @@
-# Hash Values (hash code)
-
-Ch 5
-
-## Notes
-
-**Definition:** A hash value is the output of the hash function that describes which index we should try to place an element in.
-
-Hash values can range from 0 to m-1 where m is the length of the array to store the element in.
diff --git a/Hashing.md b/Hashing.md
@@ -1,7 +0,0 @@
-# Hashing
-
-L4 - Ch5 (Rosen)
-
-## Notes
-
-**Definition:** Hashing is a process whereby we use some function f(x) to map one value to another where the output value is generally an index or otherwise addressable place.
diff --git a/HasseDiagram.md b/HasseDiagram.md
@@ -1,9 +0,0 @@
-# Hasse Diagram
-
-Ch 9.6
-
-## Notes
-
-**Definition:** A Hasse diagram is a way to show a (finite) poset in a graphical way.
-
-To create a Hasse diagram we first draw the digraph of the relation. We then remove all loops, remove every edge that is implied by transitivity, and finally drop the arrowheads by arranging the diagram so that every edge points upward from the smaller element to the larger one.
diff --git a/HistogramBasedGradientBoosting.md b/HistogramBasedGradientBoosting.md
@@ -1,11 +0,0 @@
-# Histogram Based Gradient Boosting (HGB)
-
-ML D5
-
-## Notes
-
-**Definition:** Histogram based gradient boosting is an implementation of gradient boosting that uses binning of input features.
-
-This is much faster than normal gradient boosting because the values of each feature are replaced by the integer index of their bin.
-
-On large datasets training can be hundreds of times faster than regular gradient boosting, at the cost of some precision. With that said, binning also acts as a regularizer to help reduce overfitting.
diff --git a/HistoricalDesigns.md b/HistoricalDesigns.md
@@ -1,7 +0,0 @@
-# Historical Designs
-
-Discussion of designs used historically and things we can take away. 
-
-## Notes
-
-There is a trade off taken historically to use many cores instead of a single powerful core. It is much easier to architect simple cores that chain together than to architect one powerful core. This has a trade off in that it requires developers higher in the stack to ensure their code takes advantage of all of the cores using parallelization. 
diff --git a/Homogeneous.md b/Homogeneous.md
@@ -1,15 +0,0 @@
-# Homogeneous
-
-Khan U2
-
-## Notes
-
-**Definition:** In linear algebra a homogeneous system is one where the right side of the system is the zero vector.
-
-See also [[Inhomogeneous.md]]
-
-## CS
-
-Ch. 5 (Rosen)
-
-**Definition:** In computer science homogeneous means that all items in some collection are of the same datatype.
diff --git a/Hyperparameter.md b/Hyperparameter.md
@@ -1,9 +0,0 @@
-# Hyperparameter
-
-ML CH2
-
-## Notes
-
-**Definition:** A hyperparameter in ML is a parameter that is defined prior to training that is not influenced by samples.
-
-Examples of hyperparameters are [[LearningRate.md]] and m in the case of calculating weighted means. More about this can be seen in [[TargetEncoding.md]].
diff --git a/Hyperplane.md b/Hyperplane.md
@@ -1,7 +0,0 @@
-# Hyperplane
-
-Khan U2
-
-## Notes
-
-**Definition:** A hyperplane is a subspace whose dimensionality is one less than that of its [[AmbientSpace.md]]. The term is mostly used when the subspace has 3 or more dimensions.
diff --git a/Hypervolume.md b/Hypervolume.md
@@ -1,7 +0,0 @@
-# Hypervolume
-
-Khan U2
-
-## Notes
-
-**Definition:** Hypervolume much like [[Hyperplane.md]] is volume in dimensions higher than 3.
diff --git a/IPD.md b/IPD.md
@@ -1,8 +0,0 @@
-# IPD (Interpupillary Distance)
-CS 331 W16
-
-## Notes
-
-**Definition:** This is the distance between the pupils. 
-
-This value is important to measure so that the images for both eyes are rendered properly; think about parallax and the issues a wrong value could cause.
diff --git a/IQR.md b/IQR.md
@@ -1,9 +0,0 @@
-# IQR (Inter Quartile Range)
-
-Khan
-
-## Notes
-
-**Definition:** The IQR is the difference between the 75th percentile and 25th percentile as a value.
-
-This is also called the midspread, fourth spread, or H-spread.
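The definition in IQR.md can be tried with the standard library's `statistics.quantiles`. Quartile conventions differ between textbooks and tools; the `inclusive` method is an assumption here, and `iqr` is a hypothetical helper name.

```python
import statistics


def iqr(values):
    """Difference between the 75th and 25th percentiles."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
spread = iqr(sample)   # Q1 = 3 and Q3 = 7 under this convention
```

Switching to the `exclusive` method gives different quartiles for the same data, so the convention should be stated alongside any reported IQR.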
diff --git a/ISA.md b/ISA.md
@@ -1,52 +0,0 @@
-# Instruction Set Architecture
-
-Computer Architecture L(2,3)
-
-## Notes
-
-**Definition:** The design of the interface between hardware and software that makes a functional computing system.
-
-This is the agreed upon interface between the OS/VM/higher level things and the lower level [[MicroArchitecture.md]]. This information is necessary to know for the OS developer.
-
-The ISA also includes the register set and sometimes the CPU frequency/voltage.
-
-[[Pipelining.md]] is generally not part of the ISA on newer systems.
-
-Some ISAs have additional room for un-implemented instructions that would allow for future expansion. 
-
----
-
-0-address machines (stack machines) are machines whose instructions take only an [[Opcode.md]] and no [[Operands.md]]. 0-address code takes up less space because the operands are already on the stack, but it can be very slow and can't express all computations easily (consider order of operations).
-
-2-address machines specify a source and a destination, and the destination doubles as the second source. This does not preserve the original value of the destination, so keeping it requires copying overhead. x86 is 2-address.
-
-3-address machines are source 1, source 2, and destination. Alpha is 3-address as is MIPS and ARM.
-
----
-
-The ISA also defines the supported datatypes. Some common ones include int, float, character. Sometimes they can include linked lists, stacks, queues, and strings. 
-
-With more/high level datatypes in the ISA we have smaller code, more CPU complexity, but simpler compilers. This basically means harder [[MicroArchitecture.md]] development, but an easier job for the compiler developer.
-
-This ties into the semantic gap, which describes the difference between what the ISA provides and what programmers are trying to do with respect to datatypes and opcodes. When the ISA has more high-level datatypes, the semantic gap is small. The inverse is also true.
-
-Virtual memory support is also part of the ISA.
-
-
----
-
-There is also another division in ISAs: load/store vs memory/memory architecture.
-
-Load/store means only load and store instructions access memory; all other instructions operate only on registers. Most RISC ISAs, including MIPS and ARM, are load/store.
-
-Memory/memory means instructions can operate on memory locations as well as registers. Most CISC ISAs, including x86, work this way.
-
----
-
-Orthogonal ISA
-
-Orthogonal ISAs allow for all opcodes to be used regardless of addressing mode. 
-
----
-
-
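The 0-address idea in ISA.md can be illustrated with a toy stack machine. The instruction names and the tuple encoding are invented for this sketch and do not correspond to any real ISA; note that PUSH still carries an operand, while the arithmetic instructions carry none.

```python
def run_stack_program(program):
    """Execute a list of (opcode, ...) tuples on an operand stack."""
    stack = []
    for instruction in program:
        opcode = instruction[0]
        if opcode == "PUSH":
            stack.append(instruction[1])
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# (2 + 3) * 4: the order of operations is encoded by instruction order.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
result = run_stack_program(program)
```

The arithmetic instructions are compact because their operands are implicit, but every value has to pass through the stack first.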
diff --git a/IdentityMatrix.md b/IdentityMatrix.md
@@ -1,20 +0,0 @@
-# Identity Matrix
-
-Khan Unit 2
-
-## Notes
-
-**Definition:** The identity matrix is the n x n matrix such that any n x n matrix (or vector in R^n) multiplied by it is equal to itself.
-
-This matrix can be stated as follows, where each row has a single '1' on the main diagonal:
-
-[1 0 0 ... 0]
-[0 1 0 ... 0]
-[. . . ... .]
-[. . . ... .]
-[. . . ... .]
-[0 0 0 ... 1]
-
-## Interesting Notes
-
-The columns of the identity matrix are called the **standard basis** of R^n because each vector is a unit vector, they are linearly independent, and can construct any vector in R^n.
diff --git a/Image.md b/Image.md
@@ -1,27 +0,0 @@
-# Image
-
-Khan U2
-
-## Notes
-
-**Definition:** The image of a function is the total set of all outputs of a given function (transformation for vectors).
-
-This is the same as [[Range.md]].
-
-Correspondingly, the preimage of a set of outputs is the set of elements of the domain that map to them.
-
-## Lin Alg Specific
-
-The result of the transformation of a subspace is the image of the subspace under T where T is the transformation.
-
-Ex.
-
-T(V) = image of V under T
-
-We call this the image of T, stated as im(T), when we are referring to every vector in R^n rather than a given subspace. The distinction here is that T(V) restricts the inputs to V whereas im(T) takes the inputs to be all possible vectors in R^n.
-
-## Set Notation
-
-The image of the set S under f is defined as follows:
-
-$img(S) = f(S) = \{f(s) | s \in S\}$
diff --git a/ImitationLearning.md b/ImitationLearning.md
@@ -1,9 +0,0 @@
-# Imitation Learning
-
-L1
-
-## Notes
-
-**Definition:** Imitation learning is not RL. It is the process of training a model on expert data making it a form of supervised learning.
-
-Tangentially related is inverse reinforcement learning where a model learns the reward function that the expert is trying to follow.
diff --git a/Imputation.md b/Imputation.md
@@ -1,24 +0,0 @@
-# Imputation
-
-CH2
-
-## Notes
-
-**Definition:** Imputation is the process of filling in null values with some appropriate value.
-
-This is often done in ML to set null values to 0, the mean, the median, or some other appropriate value.
-
-Using pandas, this can be done using df.fillna().
-
-There is also another way to do this using sklearn.impute's SimpleImputer. This can be used as follows:
-
-```python
-from sklearn.impute import SimpleImputer
-
-imputer = SimpleImputer(strategy="median")
-imputer.fit(df) # Ensure the df only has np.number dtypes. 
-
-X = imputer.transform(df) # Set null values to medians (as specified) for the df. 
-```
-
-The imputer above can also be used with most_frequent (mode), mean, or constant where you would then need to specify a fill_value.
diff --git a/Incremental.md b/Incremental.md
@@ -1,9 +0,0 @@
-# Incremental
-
-CLRS 2.3
-
-## Notes
-
-**Definition:** Incremental algorithms are algorithms that solve the task in order (iteratively).
-
-An example of this is [[InsertionSort.md]].
diff --git a/IncrementalMean.md b/IncrementalMean.md
@@ -1,45 +0,0 @@
-# Incremental Mean
-
-L4
-
-## Notes
-
-**Definition:** Incremental mean is a mean calculation where we update the mean according to the next sample without having to calculate the mean by summing all priors.
-
-This is often used with Monte Carlo Learning where we calculate the empirical mean (perceived mean) not by summing all returns and dividing by iterations, but instead by updating it each time it is visited only based on the change the current finding will make.
-
-With incremental mean all we need to know is the prior mean, the current sample, and the number of samples so far. Obviously, with this information we could multiply the prior mean by the number of prior samples, add the current sample, and divide by the total number of samples, but this is more work. Instead we calculate the incremental mean by adding 1/k * (return - prior mean) to the prior mean, where k is the number of samples including the current one.
-
-Here is a simple python implementation:
-
-
-```python
-
-import numpy as np
-arr = np.random.rand(10)
-
-# compute mean normal way
-def stdMean(priors):
-    mean = 0
-    for i in priors:
-        mean += i
-    mean = mean/len(priors)
-    return mean
-
-# compute incremental mean
-def incMeanCalc(priorMean, k, current):
-    return priorMean + (1/k * (current - priorMean))
-
-incMean = 0
-
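-for k in range(len(arr)):
-    # Mean of the first k + 1 samples, computed the slow way for comparison.
-    normMean = stdMean(arr[:k + 1])
-
-    # Update the running mean using only the newest sample;
-    # k + 1 is the number of samples seen so far, including this one.
-    incMean = incMeanCalc(incMean, k + 1, arr[k])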
-    print(incMean)
-    print(normMean)
-
-```
diff --git a/Independence.md b/Independence.md
@@ -1,18 +0,0 @@
-# Independence
-
-L3
-
-## Notes
-
-**Definition:** Independence in probability is the case where some event B occurring does not affect the probability of A occurring.
-
-Two Formal Definitions:
-
-1. P(A|B) = P(A)
-2. P(A intersect B) = P(A)P(B)
-
-The second definition is better (it still applies when P(B) = 0), but the first, to me, is more intuitive.
-
-## Conditional Independence
-
-There can also be independence given conditions, called conditional independence. This occurs when, after specifying a condition, our two definitions of independence hold in the newly refined sample space that has been updated with the further knowledge.
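The product definition in Independence.md can be checked exactly on a small sample space. The two-dice events below are illustrative choices, and `prob` and `independent` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(a, b):
    """Check P(A intersect B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_even(o):
    return o[0] % 2 == 0


def sum_is_seven(o):
    return o[0] + o[1] == 7


def sum_is_eight(o):
    return o[0] + o[1] == 8


seven_case = independent(first_even, sum_is_seven)   # 1/12 == 1/2 * 1/6
eight_case = independent(first_even, sum_is_eight)   # 1/12 != 1/2 * 5/36
```

Using `Fraction` avoids floating-point comparisons, so the equality test is exact.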
diff --git a/IndependentEvents.md b/IndependentEvents.md
@@ -1,9 +0,0 @@
-# Independent Events
-
-Ch 1.4
-
-## Notes
-
-**Definition:** Independent events are events such that the conditional probability is equivalent to the unconditioned probability of the given event.
-
-P(A|B) = P(A) and P(B | A) = P(B).
diff --git a/Indistinguishable.md b/Indistinguishable.md
@@ -1,7 +0,0 @@
-# Indistinguishable
-
-Ch 6.5
-
-## Notes
-
-**Definition:** Indistinguishable means two elements, when swapped, do not result in a new permutation.
diff --git a/Individuals.md b/Individuals.md
@@ -1,7 +0,0 @@
-# Individuals
-
-Khan
-
-## Notes
-
-**Definition:** The individuals of a dataset are the objects being studied.  
diff --git a/Induction.md b/Induction.md
@@ -1,37 +0,0 @@
-# Induction Proof
-
-Proof by induction from W11 abstract algebra. Induction is used to prove a statement relating to infinite sets of elements. This is not to be confused with inductive reasoning, which draws conclusions from past data.
-
-## Notes
-
-**Definition:** This type of proof is done by proving that the first statement is true and that each statement implies the next, which means all the rest are true (think dominoes).
-
-Steps to prove:
-1. Prove the first statement ($S_1$) is true.
-2. Prove that the statement $S_k \implies S_{k+1}$ is true for every $k$ in the domain.
-
-**Basis Step:** The first step above is called the basis step. $S_1$ is generally quite simple to verify.
-
-**Inductive Step:** The second step above is called the inductive step. This is most commonly done via direct proof, i.e. assuming $S_k$ is true and deriving $S_{k+1}$.
-
-**Inductive Hypothesis:** This is the assumption that $S_k$ is true. Based on this assumption, we must then prove that $S_{k+1}$ follows, which establishes $S_k \implies S_{k+1}$.
-
-To prove this, we first prove that the first term of the sequence (generally index 0 or 1) is true; such proofs are generally straightforward. Then we write out the statement for index $k$ and the statement for index $k+1$. Using this technique we should be able to show that the $k+1$ statement follows from the $k$ statement.
-
-The first element does need to be proven: the inductive step only shows that each statement implies the next, so without a true starting statement nothing is actually established.
-
-Write the inductive step in if-then form: if $S_k$, then $S_{k+1}$. Since we assume the "if" part, we need to prove the "then" part follows logically for every $k$ in the domain.
-
-If the set of numbers is bidirectional like $\Z$, then we need to prove that both $S_n \implies S_{n+1}$ and that $S_n \implies S_{n-1}$ for all numbers in the domain. 
-
-When using induction the common form is $S_k \implies S_{k+1}$, but it is equally valid to prove that $S_{k-1} \implies S_k$ if that is easier to show. 
-
-When proving the inductive step it is important to first write out what the $k+1$ case states. We then start from one side of that statement and work toward the other side. We should not set the left and right sides equal from the start, because that assumes what we are trying to prove; instead, do algebra (using the inductive hypothesis) to show the statement is true.
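-
-As a worked sketch of the steps above (the statement is only an illustration, not from the lecture), take $S_n$: $1 + 2 + \dots + n = \frac{n(n+1)}{2}$. The basis step $S_1$ holds since $1 = \frac{1 \cdot 2}{2}$. For the inductive step, assume $S_k$ and start from the left side of $S_{k+1}$:
-
-$$
-\begin{aligned}
-1 + 2 + \dots + k + (k+1) &= \frac{k(k+1)}{2} + (k+1) \\
-&= \frac{k(k+1) + 2(k+1)}{2} \\
-&= \frac{(k+1)(k+2)}{2}
-\end{aligned}
-$$
-
-The last line is the right side of $S_{k+1}$, so $S_k \implies S_{k+1}$.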
-
-[[StrongInduction.md]] is another type of induction. 
-
-See also [[SmallestCounterExample.md]] for something similar to [[CounterExample.md]] of [[Induction.md]]
-
-It is important to note that a set must be [[WellOrdered.md]] for it to be possible to prove by induction. 
-
-Another interesting thing that relates to induction is [[FibonacciNumbers.md]] in the sense that they are entirely reliant upon previous calculations to determine the next value in the set. 
diff --git a/Inertia.md b/Inertia.md
@@ -1,9 +0,0 @@
-# Inertia
-
-ML D5
-
-## Notes
-
-**Definition:** Inertia in machine learning is the sum of the squared distances from instances to their closest centroid. 
-
-This is often used as a gauge for how well a [[KMeans.md]] model fits the data (lower is better for a fixed number of clusters).
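-
-Written as a formula, the definition above would be roughly the following. The symbols are my own labels: $x_i$ are the $m$ instances and $\mu_j$ are the $k$ centroids.
-
-$$
-\text{inertia} = \sum_{i=1}^{m} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2
-$$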
diff --git a/Inference.md b/Inference.md
@@ -1,9 +0,0 @@
-# Inference
-
-Ch2
-
-## Notes
-
-**Definition:** Inference is the statistical process of finding relationships between data.
-
-This is not to be confused with [[Prediction.md]] which is the process of guessing an output.
diff --git a/InformationContent.md b/InformationContent.md
@@ -1,7 +0,0 @@
-# Information Content
-
-Ch 6
-
-## Notes
-
-**Definition:** The information content of a finite set of messages S is $\log_b(n)$ where n is the cardinality of S and b is the base of the counting system (2 for binary).
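-
-For example (numbers picked only for illustration), a set of 8 messages measured in binary:
-
-$$
-\log_2(8) = 3
-$$
-
-so 3 binary digits are enough to give every message its own label.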
diff --git a/Inhomogeneous.md b/Inhomogeneous.md
@@ -1,9 +0,0 @@
-# Inhomogeneous
-
-Khan U2
-
-## Notes
-
-**Definition:** An inhomogeneous solution in linear algebra is a solution where the right side of the system of equations is not the zero vector.
-
-See also [[Homogeneous.md]]
diff --git a/Injective.md b/Injective.md
@@ -1,11 +0,0 @@
-# Injective 
-
-L2
-
-## Notes
-
-**Definition:** For a function to be injective each value in the domain must map to a unique value in the codomain.
-
-Sometimes called one-to-one because no two different x values map to the same y.
-
-For any y in Y there is at most one x such that f(x) = y.
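-
-Written symbolically, that condition says equal outputs force equal inputs. The functions below are just illustrative picks (over the real numbers), not from the lecture:
-
-$$
-f(x_1) = f(x_2) \implies x_1 = x_2
-$$
-
-So $f(x) = 2x$ is injective, while $f(x) = x^2$ is not because $f(-1) = f(1) = 1$.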
diff --git a/Input.md b/Input.md
@@ -1,14 +0,0 @@
-# Input
-
-CS 331 W12 L1
-
-## Notes
-
-**Definition:** Input is the class used to get input from the user. 
-
-### Common Methods
-
-1. GetKey
-	- This returns true while the key is held down
-	- Ex. bool moveForward = Input.GetKey("up");
-2. 
diff --git a/InsertionSort.md b/InsertionSort.md
@@ -1,41 +0,0 @@
-# Insertion Sort
-
-CLRS 2.1
-
-## Notes
-
-**Definition:** Insertion sort is a sorting algorithm with a worst-case time complexity of $O(n^2)$ that takes the next element in the array, moves it left into its correctly sorted position, and repeats this for every item in the list.
-
-
-This can be thought of as sorting cards by hand. You start with all cards in the right hand. You then remove the first card and place it in the left hand. You then do the same thing for the next card in the right hand, placing it in its sorted location. You continue this until all cards are in the left hand. 
-
-The only issue with this analogy is that insertion sort uses one array instead of two. You track the sorted part at the front of the array, and when an element is inserted, the sorted elements to its right are each pushed over one position, up to the spot the inserted element came from. 
-
-Implementation in Python:
-
-```python3
-def insertion_sort(ls):
-    """Sort ls in place and return it."""
-    for i in range(1, len(ls)):
-        # ls[0:i] is already sorted; scan left to find where ls[i] belongs.
-        x = i - 1
-        while x >= 0 and ls[x] > ls[i]:
-            x -= 1
-        # x is now the last index holding a value <= ls[i] (or -1 if none).
-        move_and_offset(i, x + 1, ls)
-    return ls
-
-
-def move_and_offset(pos_from, pos_to, ls):
-    """Move ls[pos_from] to pos_to, shifting the items in between right by one."""
-    last = ls[pos_from]
-    for i in range(pos_to, pos_from + 1):
-        ls[i], last = last, ls[i]
-```
diff --git a/InstanceBasedLearning.md b/InstanceBasedLearning.md
@@ -1,13 +0,0 @@
-# Instance Based Learning
-
-ML CH1
-
-## Notes
-
-**Definition:** Instance based learning is a system that memorizes examples and, when the same information occurs again, detects it by comparing against those stored examples. 
-
-Think of a spam filter. Something is marked as spam; if the exact same message is seen again, it will be marked as spam because last time it was. 
-
-This can be thought of like a hash map: we look up the input, and if it was previously stored with some label (such as spam), then we give it the same label.
-
-This can also be implemented differently by searching for similarities instead of exact matches to catch possible single pixel/character differences.
diff --git a/Instruction.md b/Instruction.md
@@ -1,21 +0,0 @@
-# Instruction
-
-CA L3
-
-## Notes
-
-**Definition:** An instruction is the most basic element of the hardware-software interface, which describes what to do and to whom. 
-
-An instruction is made of two parts: the [[Opcode.md]] describes what to do, and the [[Operands.md]] describe to whom. 
-
-There are also classes of instructions. These are the following 3:
-
-1. Operate Instructions
-    - This includes math
-2. Data Movement Instructions
-    - Moves data between registers, memory, and I/O devices
-3. Control Flow Instructions
-    - Change sequence of instructions to execute
-
-
-See [[ISA.md]] for more about instruction sets. 
diff --git a/IntegerOverflow.md b/IntegerOverflow.md
@@ -1,7 +0,0 @@
-# Integer Overflow
-
-W1
-
-## Notes
-
-**Definition:** An integer overflow is where we carry a 1 past the end of an integer thus causing it to be 'lost'.
diff --git a/IntelligenceExplosion.md b/IntelligenceExplosion.md
@@ -1,7 +0,0 @@
-# Intelligence Explosion
-
-Superintelligence - Bostrom
-
-## Notes
-
-**Definition:** The intelligence explosion is the idea that once a system achieves human-level intelligence, it will then be able to recursively self-improve, causing an explosion in intelligence.
diff --git a/Intractable.md b/Intractable.md
@@ -1,9 +0,0 @@
-# Intractable
-
-U 2.3
-
-## Notes
-
-**Definition:** An intractable problem is one that cannot be solved in polynomial time.
-
-These problems generally run in exponential, factorial, or some other time complexity that is higher than polynomial.
diff --git a/Invariance.md b/Invariance.md
@@ -1,9 +0,0 @@
-# Invariance
-
-SS
-
-## Notes
-
-**Definition:** Invariance in ML describes changes to objects such that the model should still interpret the object the same way.
-
-There are a few different types including translational, rotational, and size invariance. 
diff --git a/Inverse.md b/Inverse.md
@@ -1,9 +0,0 @@
-# Inverse 
-
-1.1.2
-
-## Notes
-
-**Definition:** The inverse of an implication statement is the negation of both terms.
-
-$\neg p \to \neg q$ Where the original was $p \to q$
diff --git a/InverseFunction.md b/InverseFunction.md
@@ -1,36 +0,0 @@
-# Inverse Function
-
-L2
-
-## Notes
-
-**Definition:** The inverse function of f(x) is defined as f^-1(x) where f^-1(x) maps from the codomain of f(x) to the domain of f(x).
-
-As such, for a function to be invertible it must be a bijection.
-
-## Finding
-
-The simplest way to find the inverse of a function is by swapping the y and x terms and then solving for y.
-
-Consider:
-
-$$
-f(x) = x^3 + 10
-\newline
-y = x^3 + 10
-$$
-
-$$
-x = y^3 + 10
-\newline
-x - 10 = y^3
-\newline
-y = \sqrt[3]{x-10}
-$$
-
-$$
-f^{-1}(x) = \sqrt[3]{x-10}
-$$
-
diff --git a/InverseTransformation.md b/InverseTransformation.md
@@ -1,89 +0,0 @@
-# Inverse Transformation (and matrices)
-
-Khan U2
-
-## Notes
-
-**Definition:** The inverse of a transformation is the transformation that undoes the original transformation for the entire domain and codomain of the original transformation.
-
-This transformation must be [[Bijective.md]], otherwise the inverse mapping breaks down: either there are outputs without inputs (not surjective) or there are outputs with multiple inputs (not injective).
-
-A transformation f is invertible if and only if there exists an f^-1 such that f^-1 composed with f, and f composed with f^-1, are both I (identity function).
-
-(in the world of L.T.'s) Assume the standard matrix of f is A and the standard matrix of f^-1 is B. We then know:
-
-AB = I
-
-## Unique?
-
-Inverse functions are unique (there is only one). 
-
-Let's assume they are not, so f has two different inverses A and B. Then A = A∘I = A∘(f∘B) = (A∘f)∘B = I∘B = B, so A(x) = B(x) for all x. This means A and B are the same function, but we assumed they are not. This is a contradiction.
-
-## Invertible?
-
-To find this we know it needs to be bijective. When reducing the standard matrix to RREF, if there are vectors in R^m (the codomain) that are not mapped to (indicated by a row of zeroes, meaning the columns cannot combine to reach everything), then the transformation from R^n to R^m is not onto and the standard matrix is not invertible.
-
-As such, T is onto iff C(A) = R^m (columns span R^m). **We know this is only true when RREF has a pivot in each row. ([[Rank.md]] of the matrix = m)**
-
-For injectivity we test for one-to-one. To find this we need to make sure the rank of the matrix is equal to n where n is the number of columns.
-
-Basically, we need a square matrix (rows = columns, so m = n) whose columns are linearly independent.
-
-As such, the matrix is only invertible if its RREF is the n x n identity matrix.
-
-Another test: the matrix is invertible if and only if its determinant is not 0.
-
-## Solving
-
-#### Intuitive
-
-There are formulas for solving this, but let's think from an intuitive sense first. To solve for the inverse of some matrix we need to find the matrix that converts the original matrix to the identity matrix. If we create an augmented matrix where the left is the original matrix and the right is the identity matrix, we can then perform the same row operations on both sides to convert the left side to the identity matrix. By doing this we will be left with the inverse matrix on the right side.
-
-That's a lot of text so let's do a simple example of a 2x2 matrix:
-
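-$[A \mid I]=\begin{bmatrix}
-2 & 3 &| 1 & 0 \\
-1 & 2 & |0 & 1
-\end{bmatrix}$
-
-Subtract row 2 from row 1, then subtract the new row 1 from row 2:
-
-$\begin{bmatrix}
-1 & 1 & |1 & -1 \\
-0 & 1 &|-1 & 2
-\end{bmatrix}$
-
-Subtract row 2 from row 1:
-
-$\begin{bmatrix}
-1 & 0 & |2 & -3 \\
-0 & 1 & |-1 & 2
-\end{bmatrix}$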
-
-Solution:
-
-$A^{-1} =\begin{bmatrix}
-2 & -3 \\
--1 & 2
-\end{bmatrix}$
-
-As can be seen, we simply apply row operations to the left side and apply the same operations to the former identity on the right. The right side thereby tracks the matrix that turns the original matrix into the identity.
-
-#### Formulaic
-
-The formula is useful, but it is simply a description of the rules we can apply. For the 2x2 formula we zero the first column of the second row, zero the second column of the first row, and then normalize both rows.
-
-I was going to show this derivation, but it is trivial and long.
-
-##### 2x2 Formula:
-
-$A =\begin{bmatrix}
-a & b \\
-c & d
-\end{bmatrix}$
-
-$A^{-1}=
-\frac{1}{ad-bc}
-\begin{bmatrix}
-d & -b \\
--c & a
-\end{bmatrix}$
-
-The interesting thing about this is that the denominator of the fraction out front is the determinant of the matrix.
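-
-As a quick check, plugging the 2x2 example from the intuitive section ($a=2$, $b=3$, $c=1$, $d=2$) into the formula should give the same inverse found by row reduction:
-
-$$
-A^{-1} = \frac{1}{(2)(2)-(3)(1)}
-\begin{bmatrix}
-2 & -3 \\
--1 & 2
-\end{bmatrix}
-=
-\begin{bmatrix}
-2 & -3 \\
--1 & 2
-\end{bmatrix}
-$$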
diff --git a/Invertible.md b/Invertible.md
@@ -1,7 +0,0 @@
-# Invertible (Matrix)
-
-Khan
-
-## Notes
-
-**Definition:** For a matrix A to be invertible there must be another matrix B such that A * B = I where I is the identity matrix.
diff --git a/IteratedExpectations.md b/IteratedExpectations.md
@@ -1,11 +0,0 @@
-# Iterated Expectations
-
-L12
-
-## Notes
-
-**Definition:** The law of iterated expectations states the expected value of a conditional expectation is the unconditional expectation. 
-
-Simply, this means that when finding the expectation of some random variable we can sum the conditional expectations of its components, each weighted by the probability of that component. 
-
-Let's say we have a PDF with a height of 1/3 from 0-1 and a height of 2/3 from 1-2. We can then find the expectation of the first component (0-1), multiply it by the probability of that grouping (1/3), and then sum this with all other components handled in the same manner to find the unconditional expectation.
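-
-Working that example through (within each piece the density is flat, so the conditional expectation is just the midpoint of the interval):
-
-$$
-\begin{aligned}
-E[X] &= E[X \mid 0 \le X < 1] \, P(0 \le X < 1) + E[X \mid 1 \le X \le 2] \, P(1 \le X \le 2) \\
-&= \frac{1}{2} \cdot \frac{1}{3} + \frac{3}{2} \cdot \frac{2}{3} \\
-&= \frac{1}{6} + 1 = \frac{7}{6}
-\end{aligned}
-$$
-
-Integrating $x f(x)$ directly over 0-2 gives the same $\frac{7}{6}$.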
diff --git a/Jerk.md b/Jerk.md
@@ -1,9 +0,0 @@
-# Jerk
-
-Section 2.8
-
-## Notes
-
-**Definition:** Jerk is the third derivative of a position function with respect to time. 
-
-The first derivative would be velocity, second would be acceleration, third is jerk.
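-
-For instance, with a made-up position function $x(t) = t^4$ (chosen only for illustration):
-
-$$
-\begin{aligned}
-v(t) &= x'(t) = 4t^3 \\
-a(t) &= x''(t) = 12t^2 \\
-j(t) &= x'''(t) = 24t
-\end{aligned}
-$$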
diff --git a/JointDensityFunction.md b/JointDensityFunction.md
@@ -1,9 +0,0 @@
-# Joint Density Function
-
-Prob L9
-
-## Notes
-
-**Definition:** A joint density function is a function that takes two inputs and outputs the probability density of that combination. Probabilities are found by integrating it over a region. 
-
-We can define the function as $f_{XY} : \mathbb{R}^2 \to \mathbb{R}$ such that for all $A \subseteq \mathbb{R}^2$ we have $P((X,Y) \in A) = \iint_A f_{XY}(x,y) \, dx \, dy$.
diff --git a/JointProbability.md b/JointProbability.md
@@ -1,17 +0,0 @@
-# Joint Probability
-
-Stats L2 + L6
-
-## Notes
-
-**Definition:** A joint probability is the probability of multiple events occurring together.
-
-An example of this is that 48% of voters are both democrats and in favor of the bill. This is the joint probability of any given voter being both a democrat and in favor of the bill.
-
-## Joint PMF (L6)
-
-The joint PMF is simply the PMF that represents the probability of joint outcomes. 
-
-The joint PMF, much like a normal PMF, can be stated as $P_{X,Y}(x,y)$ where X and Y are the random variables and x,y are the joint values we are evaluating for. 
-
-An important note is that the sum of $P_{X,Y}(x,y)$ over all x,y is 1.
diff --git a/KMeans.md b/KMeans.md
@@ -1,18 +0,0 @@
-# K-means (Clustering)
-
-ML CH2
-
-## Notes
-
-**Definition:** K-means clustering is a clustering algorithm that groups data by placing each element into the cluster whose centroid (the mean of the cluster's elements) is nearest.
-
-Basic idea:
-
-1. Select initial cluster centroids
-2. Go through the elements, assigning each to its nearest centroid
-3. Update each centroid to the mean position of its assigned elements
-4. Repeat from step 2 until the assignments stop changing
-
-When using k-means clustering it can, at times, find a local optimum instead of the global optimum. To help with this issue one thing that can be done is passing in a list of starting positions for centroids. 
-
-Another solution is to run the algorithm multiple times with different random starting positions. We then take the best solution which minimizes [[Inertia.md]].
diff --git a/KNearestNeighbor.md b/KNearestNeighbor.md
@@ -1,18 +0,0 @@
-# k Nearest Neighbor
-
-ML CH1
-
-## Notes
-
-**Definition:** k nearest neighbor is the idea of using the k nearest elements of some set to derive some information. 
-
-In ML, this can be used for k nearest neighbor regression of a sample with an instance based approach: find the k nearest values and average them. This average is then the prediction for the sample. 
-
-Using sklearn you can specify to load k nearest neighbor as follows:
-
-```python
-from sklearn.neighbors import KNeighborsRegressor
-
-# Predict by averaging the targets of the 3 nearest training instances
-model = KNeighborsRegressor(n_neighbors=3)
-```
diff --git a/Kernel.md b/Kernel.md
@@ -1,11 +0,0 @@
-# Kernel
-
-Khan
-
-## Notes
-
-**Definition:** The kernel of a linear transformation is the set of all vectors that are mapped to the zero vector by the L.T.
-
-This is stated as ker(T), spoken as the kernel of T.
-
-This is similar to the [[NullSpace.md]] except it is specific to linear transformations.
diff --git a/Key.md b/Key.md
@@ -1,9 +0,0 @@
-# Key
-
-Ch 5
-
-## Notes
-
-**Definition:** A key is a list of attributes of an object x that uniquely identifies it from all other elements of our universe. 
-
-In hashtables we hash the keys and then store the items. With open addressing, if the queried-for object is not the one at the address, we continue on with our probing algorithm until we find the item or find an empty spot (this assumes we don't remove items from our table).
diff --git a/KeyframeAnimation.md b/KeyframeAnimation.md
@@ -1,31 +0,0 @@
-# Keyframe animation
-
-CG W13 L3
-
-## Notes
-
-**Definition:** Keyframe animation is the process of animation used in Blender where you specify keyframes and the positions of objects at those times. 
-
-### Motion / Objects:
-
-We call the objects that are animated marionettes, after the dolls controlled with wires and strings instead of a hand.
-
-They also have joints which allow for rotation about a point.
-
-Controllers are the strings that can be pulled that control each section. 
-
-When these strings are pulled, the attached part of the marionette moves, and attached parts are also moved to ensure joints don't become disjoint. This is how motion of a marionette is transmitted.
-
-### Keyframes:
-
-Animators used to draw just a few "key" frames, hence the term. The goal is to have enough of these to make motion seem seamless. 
-
-Now, this is a bit different because of interpolation which makes it so we simply need to ensure correct motion between the keyframes instead of ensuring each position at each time. 
-
-See [[Animation.md]] which is related to this topic.
-
-### Creation
-
-The "Dope Sheet" is the sheet at the bottom of the screen that shows the timeline for keyframe rendering (this is Blender specific, but the idea may carry over to other tools).
-
-When creating keyframe animations we select a frame from the Dope Sheet and then specify the pose at the given time. Rinse and repeat. 
diff --git a/LCM.md b/LCM.md
@@ -1,7 +0,0 @@
-# LCM
-
-U 2.4
-
-## Notes
-
-**Definition:** LCM is the least common multiple of two numbers, meaning it is the smallest positive integer that is divisible by both values.
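-
-One standard way to compute it, for positive integers, is through the greatest common divisor. The numbers below are just an example:
-
-$$
-\operatorname{lcm}(a, b) = \frac{a \cdot b}{\gcd(a, b)}
-\qquad
-\operatorname{lcm}(4, 6) = \frac{4 \cdot 6}{2} = 12
-$$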
diff --git a/LLE.md b/LLE.md
@@ -1,11 +0,0 @@
-# LLE (Locally Linear Embedding)
-
-ML D5
-
-## Notes
-
-**Definition:** LLE is a dimensionality reduction technique that uses manifold learning instead of projection.
-
-LLE works by measuring how each instance linearly relates to its nearest neighbors and then looking for a low-dimensional representation of the training set where these relationships are best preserved. 
-
-This approach is good at unrolling twisted manifolds when there is not too much noise.
diff --git a/LabelEncoding.md b/LabelEncoding.md
@@ -1,13 +0,0 @@
-# Label Encoding
-
-ML CH2
-
-## Notes
-
-**Definition:** Label encoding is the process of encoding some arbitrary label as an arbitrary number. 
-
-This is often done when you have a string input to a neural network or linear regression model and there are too many options for the given feature to do [[OneHotEncoding.md]]. 
-
-One issue with this is that the numbers are arbitrary, so if the model treats a higher number as meaning better or worse (an ordering that does not actually exist), there will be issues. 
-
-See also [[TargetEncoding.md]] for another way to encode strings as numbers.
diff --git a/LasVegasMethod.md b/LasVegasMethod.md
@@ -1,7 +0,0 @@
-# Las Vegas Method
-
-SS
-
-## Notes
-
-**Definition:** The Las Vegas method is similar to the Monte Carlo method in that it uses random sampling, but it always gives the correct answer (only its running time is random), whereas the Monte Carlo method does not guarantee a correct answer. 
diff --git a/LassoRegression.md b/LassoRegression.md
@@ -1,9 +0,0 @@
-# Lasso Regression (Least absolute shrinkage and selection operator regression)
-
-ML D3
-
-## Notes
-
-**Definition:** Lasso regression is another form of linear regression that adds a regularization term to the loss function, but it penalizes the absolute values of the weights rather than their squares as ridge regression does.
-
-The main difference between this and ridge is that ridge shrinks all coefficients toward zero without eliminating them, whereas lasso tends to set the least important coefficients to exactly 0. As such, it often outputs a sparse model.
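-
-One common way to write the two cost functions side by side is sketched below. Conventions for the constant factors vary between textbooks and libraries, so the exact scaling here is an assumption; $\alpha$ is the regularization strength and $\theta_1, \dots, \theta_n$ are the non-bias weights.
-
-$$
-\begin{aligned}
-J_{\text{lasso}}(\theta) &= \text{MSE}(\theta) + \alpha \sum_{i=1}^{n} |\theta_i| \\
-J_{\text{ridge}}(\theta) &= \text{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2
-\end{aligned}
-$$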
diff --git a/LatentSpace.md b/LatentSpace.md
@@ -1,9 +0,0 @@
-# Latent Space (embedding space)
-
-SS
-
-## Notes
-
-**Definition:** Latent space is a lower dimensional embedding space used to represent higher dimensional information.
-
-Think of using an autoencoder to embed high-dimensional data into lower dimensions. This ties in more generally with dimensionality reduction as well.
diff --git a/LawOfCosines.md b/LawOfCosines.md
@@ -1,9 +0,0 @@
-# Law of Cosines
-
-SS
-
-## Notes
-
-**Definition:** The law of cosines is defined as $c^2 = a^2 + b^2 - 2ab\cos(C)$ where a and b are side lengths and c is the side length to be found that is opposite of the angle C.
-
-When using the law of cosines it is important to note we are finding the side length opposite of the known angle. 
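-
-A quick worked example, with side lengths and angle picked only for illustration ($a = 3$, $b = 4$, $C = 60^\circ$):
-
-$$
-\begin{aligned}
-c^2 &= 3^2 + 4^2 - 2(3)(4)\cos(60^\circ) \\
-&= 25 - 12 = 13 \\
-c &= \sqrt{13} \approx 3.61
-\end{aligned}
-$$
-
-With $C = 90^\circ$ the cosine term vanishes and this reduces to the Pythagorean theorem.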
diff --git a/LawOfDetachment.md b/LawOfDetachment.md
@@ -1,11 +0,0 @@
-# Law of Detachment
-
-U 1.6.1
-
-## Notes
-
-**Definition:** The law of detachment is a law that specifies a form that valid arguments can take.
-
-This form is $(p \wedge (p\to q)) \to q$.
-
-Simply, this can be thought of as stating that if p is true and p implies q, then q can be concluded to be true. 
diff --git a/LawOfLargeNumbers.md b/LawOfLargeNumbers.md
@@ -1,9 +0,0 @@
-# Law of Large Numbers (LLN)
-
-L19
-
-## Notes
-
-**Definition:** The law of large numbers states that the average of the results from a large number of independent trials converges upon the expected value as the number of trials grows.
-
-See also [[RegressionToTheMean.md]]
diff --git a/LeakyReLU.md b/LeakyReLU.md
@@ -1,22 +0,0 @@
-# Leaky ReLU
-
-ML P554
-
-## Notes
-
-**Definition:** Leaky ReLU is a variant of ReLU designed to solve the problem of neurons dying due to the use of ReLU.
-
-Leaky ReLU gives the activation function a small (or larger) slope for inputs less than 0. This ensures neurons don't die, but they can enter long coma phases.
-
-ReLU sometimes kills neurons because the weighted sum of a neuron's inputs is negative for every training sample, thus causing the activation function to always output 0.
-
-This can be specified in keras as follows:
-
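-```python3
-import tensorflow as tf
-
-# alpha is the slope for inputs below 0 (defaults to alpha=0.3).
-# Newer Keras releases may call this argument negative_slope instead.
-leaky_relu = tf.keras.layers.LeakyReLU(alpha=0.2)
-dense = tf.keras.layers.Dense(50, activation=leaky_relu, kernel_initializer="he_normal")
-```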
-
-Basically, initialize leaky relu with the hyperparameter of slope and then set it as the layer's activation function. Interestingly, when not specified 'Dense' uses a linear activation function which outputs the inputs * weights + bias.
diff --git a/LearningRate.md b/LearningRate.md
@@ -1,16 +0,0 @@
-# Learning Rate
-
-ML L2
-
-## Notes
-
-**Definition:** The learning rate is a constant that scales how much a parameter(s) is changed at each update step as we narrow in upon some value. The update itself is driven by how far the output is from the expected value: the further away, the larger the change will be.
-
-
-See [[GradientDescentCode.md]] and [[GradientDescent.md]] for an example of when a learning rate would be used and an implementation of it.
-
-Additionally, learning rate in a higher level sense, with regard to online learning, is how quickly a model will adapt to new data.
-
-Constants like the learning rate are called "hyperparameters", which are defined as values set prior to model training that are not learned by the model.
-
-Another term is the learning schedule. This describes how the learning rate changes over the course of training. In the case of [[GradientDescent.md]] this would be the amount it decreases over time as you narrow in on an optimum.
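-
-For gradient descent the learning rate shows up as the step-size factor in the update rule. The symbol $\eta$ for the learning rate and the toy cost $J(\theta) = \theta^2$ are assumptions for illustration:
-
-$$
-\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)
-$$
-
-With $\eta = 0.1$ and $\theta = 1$, the gradient is $2\theta = 2$, so one step gives $\theta = 1 - 0.1 \cdot 2 = 0.8$.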
diff --git a/LexicographicOrdering.md b/LexicographicOrdering.md
@@ -1,11 +0,0 @@
-# Lexicographic Ordering
-
-Ch 9.6
-
-## Notes
-
-**Definition:** Lexicographic ordering is the same idea as alphabetic (dictionary) ordering: compare element by element from the left, and the first difference decides the order.
-
-Consider the case of (1, 1, 100), (1,4), (2,1), (2,2), (2,0)
-
-In lexicographic order we see that (1,1,100) < (1,4) < (2,0) < (2,1) < (2,2).
diff --git a/Lighting.md b/Lighting.md
@@ -1,19 +0,0 @@
-# Lighting
-
-CS 331 W12 L3
-
-## Notes
-
-### Light Options
-
-1. Point Light Source
-	* Intensity decreases over distance (range param)
-	* Simulates a local light source from one point
-2. Directional
-	* This simulates a light source infinitely far away
-	* This also has a directional component and brightness component
-	* Our position has no effect on the light, but orientation does
-3. Spot Light 
-	* Light comes out in a cone
-	* This means one small area has a circle of light
-	* This is different from a point light as it is directional
diff --git a/LinearAlgebra.md b/LinearAlgebra.md
@@ -1,113 +0,0 @@
-# Linear Algebra
-
-Linear algebra related links.
-
-The basis of linear algebra is solving systems of equations.
-
-## Links
-
-Linear Algebra Done Right:
-
-Chapter 1:
-
-- [VectorSpace](VectorSpace.md)
-- [Tuple](Tuple.md)
-- [ComplexVectorSpace](ComplexVectorSpace.md)
-- [Subspace](Subspace.md)
-- [RealVectorSpace](RealVectorSpace.md)
-- [Coordinate](Coordinate.md)
-- [SumOfVectorSpaces](SumOfVectorSpaces.md)
-- [DirectSum](DirectSum.md)
-
-Chapter 2:
-
-- [LinearCombination](LinearCombination.md)
-- [Span](Span.md)
-- [FiniteDimensional](FiniteDimensional.md)
-- [LinearIndependence](LinearIndependence.md)
-- [BasisOfSubspace](BasisOfSubspace.md)
-- [StandardBasis](StandardBasis.md)
-- [Dimensions](Dimensions.md)
-
-Chapter 3:
-
-- [LinearMaps](LinearMaps.md)
-
-Khan Academy:
-
-Khan Unit 1 (mostly):
-
-- [Matrix](Matrix.md)
-- [LinearEquations](LinearEquations.md)
-- [SystemsOfEquations](SystemsOfEquations.md)
-- [LinearCombination](LinearCombination.md)
-- [ColumnSpace](ColumnSpace.md)
-- [DistanceCalculation](DistanceCalculation.md)
-- [DotProduct](DotProduct.md)
-- [VectorMatrixMultipication](VectorMatrixMultipication.md)
-- [Invertible](Invertible.md)
-- [UnitVector](UnitVector.md)
-- [Span](Span.md)
-- [LinearIndependence](LinearIndependence.md)
-- [LinearSubspace](LinearSubspace.md)
-- [Closure](Closure.md)
-- [BasisOfSubspace](BasisOfSubspace.md)
-- [AngleBetweenVectors](AngleBetweenVectors.md)
-- [LawOfCosines](LawOfCosines.md)
-- [EquationOfAPlane](EquationOfAPlane.md)
-- [CrossProduct](CrossProduct.md)
-- [Arcsin](Arcsin.md)
-- [Arccos](Arccos.md)
-- [TripleProductExpansion](TripleProductExpansion.md)
-- [NormalVector](NormalVector.md)
-- [DistanceToPlane](DistanceToPlane.md)
-- [PlaneToPlaneDistance](PlaneToPlaneDistance.md)
-- [ReducedRowEchelonForm](ReducedRowEchelonForm.md)
-- [Transpose](Transpose.md)
-- [NullSpace](NullSpace.md)
-- [Nullity](Nullity.md)
-- [Rank](Rank.md)
-
-Khan Unit 2:
-
-- [Codomain](Codomain.md)
-- [Range](Range.md)
-- [Transformations](Transformations.md)
-- [LinearTransformation](LinearTransformation.md)
-- [IdentityMatrix](IdentityMatrix.md)
-- [Image](Image.md)
-- [Preimage](Preimage.md)
-- [Kernel](Kernel.md)
-- [DiagonalMatrices](DiagonalMatrices.md)
-- [Rotation](Rotation.md)
-- [StandardMatrix](StandardMatrix.md)
-- [UnitVector](UnitVector.md)
-- [Projection](Projection.md)
-- [MatrixMultiplication](MatrixMultiplication.md)
-- [InverseTransformation](InverseTransformation.md)
-- [Surjective](Surjective.md)
-- [Injective](Injective.md)
-- [Bijective](Bijective.md)
-- [Homogeneous](Homogeneous.md)
-- [Inhomogeneous](Inhomogeneous.md)
-- [Determinant](Determinant.md)
-- [RuleOfSarrus](RuleOfSarrus.md)
-- [Hypervolume](Hypervolume.md)
-- [Hyperplane](Hyperplane.md)
-- [AmbientSpace](AmbientSpace.md)
-- [Shear](Shear.md)
-- [RightHandRule](RightHandRule.md)
-- [Duality](Duality.md)
-- [CramersRule](CramersRule.md)
-- [GaussianElimination](GaussianElimination.md)
-- [EigenVector](EigenVector.md)
-- [Transpose](Transpose.md)
-
-Khan Unit 3:
-
-- [OrthogonalComplement](OrthogonalComplement.md)
-- [Projection](Projection.md)
-- [ChangeOfBasis](ChangeOfBasis.md)
-- [Orthonormal](Orthonormal.md)
-- [GramSchmidtProcess](GramSchmidtProcess.md)
-- [EigenVector](EigenVector.md)
diff --git a/LinearCombination.md b/LinearCombination.md
@@ -1,11 +0,0 @@
-# Linear Combination 
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 2
-
-## Notes
-
-### In Linear Algebra
-
-**Definition:** A linear combination is ca + db for any scalars c and d where a and b are vectors.
diff --git a/LinearCongruence.md b/LinearCongruence.md
@@ -1,7 +0,0 @@
-# Linear Congruence
-
-Ch 2.4
-
-## Notes
-
-**Definition:** A linear congruence is a congruence of the form $ax \equiv b \pmod{c}$ where a,b,c are integers and x is a variable.
diff --git a/LinearEquations.md b/LinearEquations.md
@@ -1,9 +0,0 @@
-# Linear Equations
-
-Khan
-
-## Notes
-
-**Definition:** Linear equations are equations of the form y = mx+b where m and b are real coefficients. 
-
-Simply, linear equations are any equation that results in a line when graphed. 
diff --git a/LinearHomogeneousRecurrenceRelation.md b/LinearHomogeneousRecurrenceRelation.md
@@ -1,13 +0,0 @@
-# Linear Homogeneous Recurrence Relation
-
-Ch 8.2
-
-## Notes
-
-**Definition:** A linear homogeneous recurrence relation is a recurrence relation where each element is a linear combination of k prior elements (degree k).
-
-General form of a degree k LHRR:
-
-$a_n = c_1a_{n-1} + c_2a_{n-2} + ... + c_ka_{n-k}$
-
-Assume all c terms are constant coefficients and $c_k$ is non-zero. 
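-
-A familiar instance is the Fibonacci recurrence, which has degree 2 with $c_1 = c_2 = 1$:
-
-$$
-f_n = f_{n-1} + f_{n-2}
-$$
-
-By contrast, $a_n = 2a_{n-1} + 1$ is not homogeneous (it has a term that is not a multiple of a prior element) and $a_n = a_{n-1}^2$ is not linear.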
diff --git a/LinearIndependence.md b/LinearIndependence.md
@@ -1,72 +0,0 @@
-# Linear Independence
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 2
-
-## Notes
-
-**Definition:** Linear independence means that every column in a given matrix gives another degree of freedom. 
-
-This can also be thought of as there being only one way to make each vector in the span as a linear combination of the vectors.
-
-Conversely, linearly dependent vectors are vectors that are on the same line (or plane) as some other vector (or combination of vectors), thus not giving the matrix another degree of freedom.
-
-Interesting thing: if you are in fewer dimensions than the number of vectors, it is guaranteed there is linear dependence because you can't go beyond the current dimension. 
-
-Ie.
-
-[2 7 10]
-[3 8  1]
-
-Given that [2,3] and [7,8] consumes all of R^2, we know there are no degrees of freedom provided by [10,1] despite it not being on either of the lines created by the first two columns.
-
-### Test For Dependence
-
-If c_1*a + c_2*b = 0 is true for some constants c_1 and c_2 where at least one coefficient is not zero, then we have dependence. This is true for an arbitrary number of vectors and constants. If this is only possible with coefficients that are all equal to zero then we have independence.
-
-
-### Intuitive Definition
-
-Linear independence means each vector in a set of vectors (possibly matrix) adds something to the matrix such that the [[Span.md]] of the set of vectors is larger.
-
-
-### Solving
-
-A simple way to solve this is using our knowledge that we are testing c_1 * a + ... + c_n * z = 0 where 0 is the zero vector. Knowing this, we can create an augmented matrix that represents this information as follows:
-
-[1 3 4 | 0]
-[2 5 5 | 0]
-[2 4 7 | 0]
-
-Above, the final column (which makes this an augmented matrix) represents the target value of the equation on the left. The left side is represented by columns whereby each is associated with a different coefficient. 
-
-If we show that this is only true when c_1, c_2, and c_3 are all zero then the vectors are independent. If there is a solution where not all coefficients are zero then they are dependent. Observe the next steps of elimination, where we add multiples of rows to fill in zeroes to get to row echelon form:
-
-First let's multiply the first row by -2 and add it to the second row. This can be done because both equations equal 0, thus the addition does not break the system. 
-
-[1  3  4 | 0]
-[0 -1 -3 | 0]
-[2  4  7 | 0]
-
-Let's now multiply row 1 by -2 and add it to row 3, giving [0 -2 -1 | 0], and then multiply row 2 by -2 and add it to row 3 to zero out the first and second entries of the final equation:
-
-[1  3  4 | 0]
-[0 -1 -3 | 0]
-[0  0  5 | 0]
-
-Now we see each coefficient can be singled out. Let's solve the system from the bottom up, where column 1 is c_1, column 2 is c_2, and column 3 is c_3:
-
-5c_3 = 0
-c_3 = 0
-
--1c_2 + -3c_3 = 0
--1c_2 = 0
-c_2 = 0
-
-c_1 + 3c_2 + 4c_3 = 0
-c_1 = 0
-
-As we can see, the only solution is 0, 0, and 0, thus the vectors are linearly independent.
-
-If this were not the case then we would not be able to single out each variable. When that happens we should get as many zeroes as possible and then identify coefficients that make the statement true without all of them being 0. This is sufficient proof to declare the vectors are not independent.
diff --git a/LinearMaps.md b/LinearMaps.md
@@ -1,12 +0,0 @@
-# Linear Map
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 3
-
-## Notes
-
-**Definition:** A linear map is a function T : V -> W, where V and W are vector spaces over a field F, that has the following properties:
-
-1. Additivity - T(u + v) = Tu + Tv for all u, v in V
-2. Homogeneity - T(av) = a(Tv) for all a in F and v in V
diff --git a/LinearProbing.md b/LinearProbing.md
@@ -1,9 +0,0 @@
-# Linear Probing
-
-Ch 5
-
-## Notes
-
-**Definition:** Linear probing is a probing (open addressing) strategy that, when an object experiences a collision, scans forward from the hashed index and places the object in the next open index.
-
-The problem with linear probing is clustering. This is the process whereby elements that have collided grow into larger groupings, such that the slot just after a cluster is far more likely to be selected than other slots in the array. This is problematic because we want the entries of our hashtable to be uniformly distributed. Clustering can often be reduced by using quadratic probing.
diff --git a/LinearRegression.md b/LinearRegression.md
@@ -1,32 +0,0 @@
-# Linear Regression
-
-ML L2 - Also referred to as ordinary least squares
-
-## Notes
-
-**Definition:** Fitting a straight line to data so that arbitrary inputs in the valid domain, not necessarily in the training set, produce accurate outputs.
-
-The goal is to find a $\theta$ (parameters) that minimizes $J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(h(x_i) - y_i)^2$. This is called the cost function.
-
-To load linear regression using scikit-learn do as follows:
-
-```python3
-
-from sklearn.linear_model import LinearRegression
-model = LinearRegression()
-
-```
-
-Note that the constant term for a linear regression model is referred to as the bias term or the intercept term. 
-
-The normal equation for linear regression (closed-form solution) is as follows:
-
-$\theta = (X^T X)^{-1} X^T y$
-
-Where y is an m x 1 vector of target values and X is the m x (n + 1) matrix whose rows are the training inputs, with a leading column of ones for the intercept term.
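-
-To make the normal equation concrete, here is a minimal worked example. The data (inputs 1, 2, 3 with targets 1, 2, 3) is hypothetical and chosen so the fit is exact:
-
-```latex
-% Hypothetical data: x = 1, 2, 3 and y = 1, 2, 3, with a leading column of ones in X.
-\[
-X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix},\quad
-y = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix},\quad
-X^T X = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix},\quad
-X^T y = \begin{bmatrix} 6 \\ 14 \end{bmatrix}
-\]
-\[
-\theta = (X^T X)^{-1} X^T y
-= \frac{1}{6}\begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix}
-\begin{bmatrix} 6 \\ 14 \end{bmatrix}
-= \begin{bmatrix} 0 \\ 1 \end{bmatrix}
-\]
-```
-
-The result is an intercept of 0 and a slope of 1, which matches the line y = x that the sample points lie on.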
-
-The closed-form approach is better when the number of features is not massive. If there are lots of features, or the training instances are too numerous to fit into memory, then the [[GradientDescent.md]] approach is better.
-
-See [[RidgeRegression.md]], [[LassoRegression.md]], and [[ElasticNetRegression.md]] for some ways to constrain linear models (decrease degrees of freedom to avoid overfitting).
-
-As it relates to linear regression, it is generally good to add some regularization. When we suspect only a few features matter, elastic net regression is a good choice. Otherwise, when we don't think there are useless features, ridge regression is a good default.
diff --git a/LinearSubspace.md b/LinearSubspace.md
@@ -1,21 +0,0 @@
-# Linear Subspace (or simply subspace)
-
-Khan
-
-## Notes
-
-**Definition:** A linear subspace is a subset of a vector space (possibly the entire space) that contains the zero vector and satisfies the two closure properties below.
-
-Things like a plane that passes through the origin in R^n, a line that passes through the origin in R^n, or R^n itself are all specific linear subspaces (or just subspaces for short).
-
-First, it is closed under addition, meaning the sum of any two vectors in the set is still contained within the subspace (and the superspace).
-
-Second, it is closed under scalar multiplication. As such, multiplying any vector in the set by any scalar gives a result that is also in the subspace.
-
-### Summary
-
-Three rules necessary and sufficient for subspace definition:
-
-1. Closed under multiplication (scalar)
-2. Closed under addition (with other elements)
-3. Contains zero vector
diff --git a/LinearTransformation.md b/LinearTransformation.md
@@ -1,66 +0,0 @@
-# Linear Transformation
-
-Khan
-
-## Notes
-
-**Definition:** A linear transformation is a function with an input and output vector that respects addition and scalar multiplication.
-
-## Formally
-
-The two rules where a and b are vectors and c is a scalar:
-
-T(a + b) = T(a) + T(b)
-
-T(ca) = cT(a)
-
-This is necessary and sufficient for the function T to be a L.T.
-
-Geometrically, this can be stated as: the origin must remain fixed and all lines must remain lines. The first requirement means grid lines may rotate, stretch, or shear, but may not be offset from the origin. The second means no grid line may end up curved. As a consequence, evenly spaced grid lines must remain evenly spaced; otherwise diagonal lines would become curved and the function would not preserve lines. 
-
-## Interesting Notes
-
-When determining mappings of LTs we can use the scalar multiplication rule to say T(c<x, 0>) = cT(<x, 0>) for any coefficient c. This is powerful as it allows us to describe the mapping of any (input) vector on the line that contains another (input) vector whose mapping we already know.
-
-Ex. 
-
-[1] -> [5]
-[0]    [2]
-
-Then we know
-
-[2] -> [5x2] = [10]
-[0]    [2x2]   [ 4]
-
-Given this, if we know what the unit vectors map to we can then use them as a composite for all other mappings. In n dimensional space this means we have n vectors of length 1 where each vector has all zero components except one which is = 1.
-
-When describing LTs in matrix form each column represents where a given unit vector will be mapped to. This is ordered so the first column will be the mapping of [1,..., 0] the second [0, 1, ..., 0] and so on.
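-
-A short worked example of reading a matrix column by column follows. The image of the first unit vector is the one used above; the image of the second unit vector is a hypothetical value added for illustration:
-
-```latex
-% T(e_1) = (5, 2) is from the example above; T(e_2) = (1, 3) is assumed for illustration.
-\[
-A = \begin{bmatrix} T(e_1) & T(e_2) \end{bmatrix}
-  = \begin{bmatrix} 5 & 1 \\ 2 & 3 \end{bmatrix}
-\]
-% Any input is a combination of unit vectors, so its image is the same combination of columns.
-\[
-T\!\left(\begin{bmatrix} 2 \\ 1 \end{bmatrix}\right)
-= 2\,T(e_1) + 1\,T(e_2)
-= 2\begin{bmatrix} 5 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 3 \end{bmatrix}
-= \begin{bmatrix} 11 \\ 7 \end{bmatrix}
-= A\begin{bmatrix} 2 \\ 1 \end{bmatrix}
-\]
-```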
-
-**Important:**
-
-Any LT can be represented as a matrix and all matrix multiplication is a LT.
-
-## Image
-
-The image of a linear transformation (im(T)) is the set of all possible outputs of the function, where the inputs of T range over every vector in R^n. 
-
-The image of Z under T is the set of all possible outputs of the function for inputs that are in Z.
-
-## Composition
-
-The composition of linear transformations is T(S(x)) where S goes from R^n to R^m and T goes from R^m to R^l. The codomain of the interior L.T. must be the same as the domain of the exterior L.T.
-
-We call this composition the composition of T with S.
-
-To construct the standard matrix of the composition we simply need to evaluate the output of the composition for each of the basis vectors that span the domain of S. This is true because the composition of two linear transformations is always a linear transformation as we know it is still additive and scalar multiplicative.
-
-If:
-T(x) = Ax
-S(x) = Bx
-
-Then:
-T(S(x)) = A(Bx) = ABx
-
-Composition of matrices is associative, but not commutative.
-
-With this definition it is intuitive that the standard matrix of the composition is A times B where A and B are the standard matrices of the L.T.s T and S.
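-
-A quick numeric check of the non-commutativity claim is sketched below. The two matrices are arbitrary illustrative choices (a shear and a swap of coordinates), not from the original notes:
-
-```latex
-% A is a shear, B swaps the two coordinates (both chosen for illustration).
-\[
-A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix},\quad
-B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
-\]
-% T after S has standard matrix AB; S after T has standard matrix BA.
-\[
-AB = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}
-\neq
-BA = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}
-\]
-```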
diff --git a/Linearithmic.md b/Linearithmic.md
@@ -1,7 +0,0 @@
-# Linearithmic
-
-Ch 2
-
-## Notes
-
-**Definition:** Linearithmic (also called log-linear, or simply n log n) is the commonly used name for O(n log n) time complexity. 
diff --git a/LinkedLists.md b/LinkedLists.md
@@ -1,21 +0,0 @@
-# Linked Lists
-
-This is from CS 221 W11 Lecture 13. 
-
-## Notes
-
-**Definition:** A linked list is a list of items that are linked together using pointers. As such they are not in contiguous memory. 
-
-Inserting into and removing from linked lists can be faster than with arrays, since arrays may need to be resized or have their elements shifted. 
-
-## Types
-
-Acyclic Linked Lists:
-
-[[SinglyLinkedList.md]]
-[[DoublyLinkedList.md]]
-
-Cyclic Linked Lists:
-
-[[CircularLinkedList.md]]
-[[CircularDoublyLinkedList.md]]
diff --git a/LinuxStuff.md b/LinuxStuff.md
@@ -1,8 +0,0 @@
-# Linux Stuff
-
-These are links to linux stuff that I want to remember, but sometimes forget. Consider, I am starting this on 24/04/16 so I will not include any basic things as I already know them well. 
-
-## Notes
-
-[[rsync.md]]
-[[sed.md]]
diff --git a/LoadFactor.md b/LoadFactor.md
@@ -1,7 +0,0 @@
-# Load Factor
-
-Ch 5
-
-## Notes
-
-**Definition:** The load factor of a hashtable is the fraction of the underlying array that is full (the number of stored elements divided by the array's capacity).
diff --git a/LocalScale.md b/LocalScale.md
@@ -1,9 +0,0 @@
-# Local Scale
-
-CS331 W12 L2
-
-## Notes
-
-Member of transform class that can be assigned. This affects the local scale of the GameObject.
-
-See [[Rotate.md]] for rotating based on local rotation and [[Translate.md]] for moving based on local coordinates. 
diff --git a/LogarithmicDifferentiation.md b/LogarithmicDifferentiation.md
@@ -1,15 +0,0 @@
-# Logarithmic Differentiation
-
-Leonard
-
-## Notes
-
-**Definition:** Logarithmic differentiation is the process of applying logs to both sides of an equation to make its derivative easier to find.
-
-Steps:
-
-1. Take the natural log of both sides
-2. Expand the right side using log rules
-3. Differentiate both sides (on the left, ln(y) becomes 1/y times d/dx(y))
-4. Multiply both sides by y to remove the y term from the left
-5. Replace y with the original expression (or leave as is)
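-
-The steps above can be traced on a standard example. The function y = x^x (for x > 0) is chosen here for illustration and is not from the original notes:
-
-```latex
-% Worked example, assuming x > 0 so that ln(x) is defined.
-\begin{align*}
-y &= x^x \\
-\ln y &= x \ln x && \text{(take logs, then expand)} \\
-\frac{1}{y}\frac{dy}{dx} &= \ln x + 1 && \text{(differentiate both sides)} \\
-\frac{dy}{dx} &= y(\ln x + 1) && \text{(multiply both sides by } y\text{)} \\
-\frac{dy}{dx} &= x^x(\ln x + 1) && \text{(replace } y\text{)}
-\end{align*}
-```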
diff --git a/LogisticRegression.md b/LogisticRegression.md
@@ -1,17 +0,0 @@
-# Logistic Regression (Logit Regression)
-
-ML D3
-
-##  Notes
-
-**Definition:** Logistic regression is a regression method used to determine the probability of some sample being part of some class. 
-
-These are often binary classifiers (when they don't output probabilities) as they can simply output 1 or 0 depending on which probability is higher.
-
-Logistic regression works under the hood by computing a weighted sum of the input features and then using that as the input to a sigmoid function. The output of the sigmoid function is then the probability of the sample being in the class. The sigmoid is also called the logistic function, so its output is referred to as the logistic of the weighted sum.
-
-An interesting thing about logistic regression is that there is no known closed-form solution that minimizes the log loss function, so gradient descent (or another iterative optimizer) must be used to train the model.
-
-With the sigmoid function we define the decision boundary as the x-value for which greater values are true and lesser values are false. This position is at the 50% probability mark.
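-
-Written out, the model and its decision boundary look as follows. The symbols (theta for the weights, x for the feature vector with a leading 1 for the bias, p-hat for the estimated probability) are notation assumed here rather than taken from the notes:
-
-```latex
-% Estimated probability: sigmoid of the weighted sum of features.
-\[
-\hat{p} = \sigma(\theta^T x), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}
-\]
-% Decision rule: sigma(t) = 0.5 exactly when t = 0.
-\[
-\hat{y} =
-\begin{cases}
-1 & \text{if } \hat{p} \ge 0.5 \iff \theta^T x \ge 0 \\
-0 & \text{if } \hat{p} < 0.5 \iff \theta^T x < 0
-\end{cases}
-\]
-```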
-
-See [[SoftmaxRegression.md]] for a generalization of logistic regression to multi-class classification without combining binary classifiers.
diff --git a/Loop.md b/Loop.md
@@ -1,7 +0,0 @@
-# Loop
-
-Ch 4
-
-## Notes
-
-**Definition:** A loop in a graph is an edge that connects a vertex to itself.
diff --git a/LoopInvariant.md b/LoopInvariant.md
@@ -1,15 +0,0 @@
-# Loop Invariant
-
-CLRS 2.1
-
-## Notes
-
-**Definition:** A loop invariant is a condition that is true before the loop starts, after each iteration, and after the loop has run.
-
-In the case of insertion sort the loop invariant is that [0 : p] is sorted where p is the number of prior iterations (prior elements sorted). See [[InsertionSort.md]] to understand this better.
-
-Given that this must be true before and after running, we know it must be initialized as true which can sometimes mean manual running to get it started outside the loop itself to ensure proper iteration.
-
-We call it maintenance when we are looping and making changes to ensure the invariance remains true.
-
-Termination is then the end of the loop whereby some condition has been met.
diff --git a/LossFunction.md b/LossFunction.md
@@ -1,11 +0,0 @@
-# Loss Function
-
-Ch 1
-
-## Notes
-
-**Definition:** A loss function is a function from E -> R, where E is the set of all events (outcomes) and R is the set of all real numbers, that describes how bad a given event e in E is.
-
-When I say 'event' this is in the most general of senses. In the case of RL this could simply be a state and in supervised learning this could be a prediction based on a sample.
-
-When defining a loss function, we are stipulating how bad a result is.
diff --git a/Lvalue.md b/Lvalue.md
@@ -1,15 +0,0 @@
-# lvalue
-
-cs202 W14 L16
-
-## Notes
-
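-**Definition:**  An lvalue is an expression that refers to a persistent (non-temporary) object with an identifiable location in memory. It cannot be moved from implicitly (an explicit cast such as std::move is required).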
-
-An example of an lvalue is as follows:
-
-```cpp
-
-int x = 5; // x is an lvalue
-
-```
diff --git a/MAE.md b/MAE.md
@@ -1,27 +0,0 @@
-# Mean Absolute Error
-
-ML CH2
-
-## Notes
-
-**Definition:** MAE, also known as average absolute deviation or mean absolute error, is an error metric used to describe the accuracy of a model by taking the absolute difference between the inferred and actual values of a set of samples and averaging the result.
-
-This is sometimes used when there are many outliers, which can largely affect the [[RMSE.md]] error metric because of the way it weights deviations.
-
-Implementation:
-
-```python
-# Paired samples: expected[i] is the true value for the prediction inference[i].
-expected = [10, 10, 4, 3, 2, 4, 5, 5]
-inference = [9, 7, 3, 2, 1, 3, 2, 5]
-
-# Sum the absolute differences between each true value and its prediction.
-total = sum(abs(e - p) for e, p in zip(expected, inference))
-
-# Divide by the number of samples to get the mean absolute error.
-mae = total / len(expected)
-print(mae)  # 1.375
-
-```
diff --git a/MCTS.md b/MCTS.md
@@ -1,7 +0,0 @@
-# MCTS (Monte Carlo Tree Search) 
-
-ML SS
-
-## Notes
-
-**Definition:** 
diff --git a/MLP.md b/MLP.md
@@ -1,13 +0,0 @@
-
-ML D6
-
-## Notes
-
-**Definition:** Multilayer perceptrons are a form of deep neural network with a feedforward structure, where each layer's outputs go forward to the next layer of perceptrons until reaching the output layer. MLPs are a subset of neural networks, since not all NNs are fully connected feedforward networks (RNNs and CNNs, for example, are not).
-
-MLPs can do regression and classification tasks. For regression we need one output for each output feature we would like to predict. With these outputs we can also apply an activation function (default is none), to bound the output range.
-
-For classification tasks you need to dedicate one output neuron for each class. These output neurons then use a sigmoid activation function that gives the probability of class membership. To get outputs that sum to 1 (wanted in the case of multiclass classification where only one class is expected) we can instead apply a softmax function across the output layer.
-
-For classification tasks with neural networks we generally want to minimize cross entropy rather than MSE, which is the normal metric for regression. Cross entropy measures the difference between the predicted distribution and the true distribution. This is also used for logistic regression.
-
diff --git a/MUX.md b/MUX.md
@@ -1,7 +0,0 @@
-# MUX
-
-CA L3
-
-## Notes
-
-**Definition:** A MUX is a multiplexer which allows multiple inputs and selects one to be the output. This is also known as a data selector.
diff --git a/MachineLearning.md b/MachineLearning.md
@@ -1,242 +0,0 @@
-# Machine Learning
-
-Links to ML Notes
-
-**Definition:** Field of study that gives computers the ability to learn without being explicitly programmed. 
-
-## Questions I would like to answer
-
-1. How do I create new ML models
-2. Create chess ML program
-3. Create a walking model
-4. Dropout where you simply skip the gradient update?
-
-## Good Info
-
-x = vector of inputs. Known as features 
-
-y = output also known as target variable
-
-(x,y) = Training example
-
-m = Number of samples 
-
-n = # of features
-
-h(x) = the hypothesis function; given an input x, its output should be approximately the correct y.
-
-## Main Links
-
-Deep Learning With Python (Francois Chollet):
-
-Ch 1 (What is DL):
-
-* [RepresentationLearning](RepresentationLearning.md)
-* [LossFunction](LossFunction.md)
-* [UtilityFunction](UtilityFunction.md)
-
-Ch 2 (Maths behind DL):
-
-* Representation
-* DataDistillation
-* Softmax
-* [Optimizer](Optimizer.md)
-* Tensor - Dimension = Axis
-* TensorSlicing - Select specific element
-* BatchAxis - Batch Dimension
-* Rank - axis count of tensor
-* Scalar - 0D tensor
-* Overfitting
-* Broadcast - match lower dim tensor with higher generally for element wise comparison
-* [Transpose](Transpose.md)
-* AffineTransformation - Linear transformation + Translation (note that a composition of affine transformations is still simply an affine transformation, thus we need activation functions)
-* HypothesisSpace
-* GeometricTransformation
-* Manifold
-* Kernel (multiply part of weights)
-* [Bias](Bias.md)
-* [Weight](Weight.md)
-* Surface
-* GradientDescent
-* ForwardPass
-* BackwardPass
-* SGD
-* MiniBatchSGD
-* TrueSGD
-* BatchGradientDescent
-* Backpropagation
-* AutomaticDifferentiation
-* Mutable
-
-ISL Python:
-
-Ch 2:
-
-- [Inference](Inference.md)
-- [Prediction](Prediction.md)
-
-Math for Machine Learning:
-
-Ch 2.2
-
-- [MatrixMultiplication](MatrixMultiplication.md)
-- [HadamardProduct](HadamardProduct.md)
-- [IdentityMatrix](IdentityMatrix.md)
-- [Associative](Associative.md)
-- [Distributive](Distributive.md)
-- [Commutative](Commutative.md)
-- [InverseTransformation](InverseTransformation.md)
-- [Transpose](Transpose.md)
-- [SymmetricMatrix](SymmetricMatrix.md)
-- [LinearCombination](LinearCombination.md)
-- [ParticularSolution](ParticularSolution.md)
-- [GeneralSolution](GeneralSolution.md)
-- [ElementaryTransformations](ElementaryTransformations.md)
-- [RowEchelonForm](RowEchelonForm.md)
-- [BasicVariables](BasicVariables.md)
-- [FreeVariables](FreeVariables.md)
-- [ReducedRowEchelonForm](ReducedRowEchelonForm.md)
-- [GaussianElimination](GaussianElimination.md)
-- [MinusOneTrick](MinusOneTrick.md)
-
-Ch 2.4
-
-- MoorePenrosePseudoInverse (approach for solving system of linear equations)
-- Group
-- AbelianGroup (group + commutative)
-- GeneralLinearGroup (group of invertible matrices under multiplication, think nonzero determinants, GL(n,R)) 
-- RegularMatricies (invertible)
-- InnerOperation (+ : GxG -> G)
-- OuterOperation ($\cdot$ : RxV -> V - Two different sets in domain)
-
-ML Categories:
-
-- [SupervisedLearning](SupervisedLearning.md)
-- [SemiSupervisedLearning](SemiSupervisedLearning.md)
-- [SelfSupervisedLearning](SelfSupervisedLearning.md)
-- [UnsupervisedLearning](UnsupervisedLearning.md)
-- [ReinforcementLearning](ReinforcementLearning.md)
-- [InstanceBasedLearning](InstanceBasedLearning.md)
-- [ModelBasedLearning](ModelBasedLearning.md)
-
-Concepts:
-
-- [RegressionProblem](RegressionProblem.md)
-- [TransferLearning](TransferLearning.md)
-- [VisualizationAlgorithm](VisualizationAlgorithm.md)
-- [DimensionalityReduction](DimensionalityReduction.md)
-- [AnomalyDetection](AnomalyDetection.md)
-- [NoveltyDetection](NoveltyDetection.md)
-- [RuleLearning](RuleLearning.md)
-- [LinearRegression](LinearRegression.md)
-- [GradientDescent](GradientDescent.md)
-- [ClassificationProblem](ClassificationProblem.md)
-- [SupportVectorMachine](SupportVectorMachine.md)
-- [ClusteringAlgorithms](ClusteringAlgorithms.md)
-- [EigenVector](EigenVector.md)
-- [NLP](NLP.md)
-- [NLU](NLU.md)
-- [Feature](Feature.md)
-- [OfflineLearning](OfflineLearning.md)
-- [OnlineLearning](OnlineLearning.md)
-- [KNearestNeighbor](KNearestNeighbor.md)
-- [Overfitting](Overfitting.md)
-- [Underfitting](Underfitting.md)
-- [GeneralizationError](GeneralizationError.md)
-- [RMSE](RMSE.md)
-- [MAE](MAE.md)
-- [StratifiedSampling](StratifiedSampling.md)
-- [CorrelationCoefficient](CorrelationCoefficient.md)
-- [LogisticRegression](LogisticRegression.md)
-- [Imputation](Imputation.md)
-- [OneHotEncoding](OneHotEncoding.md)
-- [LabelEncoding](LabelEncoding.md)
-- [TargetEncoding](TargetEncoding.md)
-- [Hyperparameter](Hyperparameter.md)
-- [FeatureScaling](FeatureScaling.md)
-- [Standardization](Standardization.md)
-- [MinMaxScaling](MinMaxScaling.md)
-- [OrdinaryLeastSquares](OrdinaryLeastSquares.md)
-- [RadialBasisFunction](RadialBasisFunction.md)
-- [KMeans](KMeans.md)
-- [StochasticAlgorithm](StochasticAlgorithm.md)
-- [Ensembles](Ensembles.md)
-- [ConfusionMatrix](ConfusionMatrix.md)
-- [CrossValidation](CrossValidation.md)
-- [Precision](Precision.md)
-- [TruePositiveRate](TruePositiveRate.md)
-- [HarmonicMean](HarmonicMean.md)
-- [Accuracy](Accuracy.md)
-- [DecisionThreshold](DecisionThreshold.md)
-- [ROC](ROC.md)
-- [MulticlassClassifier](MulticlassClassifier.md)
-- [OneVersusAll](OneVersusAll.md)
-- [OneVersusOne](OneVersusOne.md)
-- [MultilabelClassification](MultilabelClassification.md)
-- [MultioutputClassification](MultioutputClassification.md)
-- [PartialDerivative](PartialDerivative.md)
-- [RidgeRegression](RidgeRegression.md)
-- [LassoRegression](LassoRegression.md)
-- [ElasticNetRegression](ElasticNetRegression.md)
-- [EarlyStopping](EarlyStopping.md)
-- [SoftmaxRegression](SoftmaxRegression.md)
-- [SVM](SVM.md)
-- [DecisionTrees](DecisionTrees.md)
-- [SimilarityFeature](SimilarityFeature.md)
-- [CART](CART.md)
-- [RandomForest](RandomForest.md)
-- [VotingClassifiers](VotingClassifiers.md)
-- [Bagging](Bagging.md)
-- [Pasting](Pasting.md)
-- [Bias](Bias.md)
-- [Variance](Variance.md)
-- [OutOfBag](OutOfBag.md)
-- [RandomPatches](RandomPatches.md)
-- [RandomSubspaces](RandomSubspaces.md)
-- [ExtraTrees](ExtraTrees.md)
-- [AdaBoost](AdaBoost.md)
-- [GradientBoosting](GradientBoosting.md)
-- [HistogramBasedGradientBoosting](HistogramBasedGradientBoosting.md)
-- [Stacking](Stacking.md)
-- [Projection](Projection.md)
-- [Subspace](Subspace.md)
-- [ManifoldLearning](ManifoldLearning.md)
-- [PCA](PCA.md)
-- [RandomProjection](RandomProjection.md)
-- [LLE](LLE.md)
-- [Affinity](Affinity.md)
-- [Segmentation](Segmentation.md)
-- [DBSCAN](DBSCAN.md)
-- [GaussianMixtureModels](GaussianMixtureModels.md)
-- [NeuralNetworks](NeuralNetworks.md)
-- [Perceptrons](Perceptrons.md)
-- [Backpropagation](Backpropagation.md)
-- [MLP](MLP.md)
-- [WideAndDeepNN](WideAndDeepNN.md)
-- [CategoricalCrossEntropy](CategoricalCrossEntropy.md)
-- [VanishingGradients](VanishingGradients.md)
-- [ExplodingGradients](ExplodingGradients.md)
-- [UnstableGradients](UnstableGradients.md)
-- [LeakyReLU](LeakyReLU.md)
-- [GradientClipping](GradientClipping.md)
-- [BatchNormalization](BatchNormalization.md)
-- [PretrainedModels](PretrainedModels.md)
-- [UnsupervisedPretraining](UnsupervisedPretraining.md)
-- [Autoencoder](Autoencoder.md)
-- [Optimizer](Optimizer.md)
-- [Momentum](Momentum.md)
-- [NAG](NAG.md)
-- [AdaGrad](AdaGrad.md)
-- [Adam](Adam.md)
-- [Dropout](Dropout.md)
-- [MaxNormRegularization](MaxNormRegularization.md)
-- [Tensor](Tensor.md)
-- [Transpose](Transpose.md)
-- [CNN](CNN.md)
-- [NaiveBayes](NaiveBayes.md)
-- [Embedding](Embedding.md)
-- [RepresentationLearning](RepresentationLearning.md)
-- [PoolingLayers](PoolingLayers.md)
-- [DataAugmentation](DataAugmentation.md)
-- [SMOTE](SMOTE.md)
-- [LatentSpace](LatentSpace.md)
diff --git a/ManifoldLearning.md b/ManifoldLearning.md
@@ -1,11 +0,0 @@
-# Manifold Learning
-
-ML D5
-
-## Notes
-
-**Definition:** Manifold learning is the process of mapping a higher dimensional object to a lower dimensional manifold.
-
-Manifolds are representations of objects in higher dimensional space using lower dimensional space such that they still maintain attributes. This can be thought of like uv wrapping.
-
-This is often used when projection would cause multiple layers of values to be projected into nearby values which can cause issues.
diff --git a/MarginalProbabilities.md b/MarginalProbabilities.md
@@ -1,7 +0,0 @@
-# Marginal Probabilities
-
-Stats L2
-
-## Notes
-
-**Definition:** Marginal probabilities are probabilities that are not conditional upon any other probabilities.
diff --git a/MarkovAssumption.md b/MarkovAssumption.md
@@ -1,7 +0,0 @@
-# Markov Assumption
-
-L1
-
-## Notes
-
-**Definition:** The Markov assumption is the assumption that prior events don't matter and all necessary information that dictates the future is in the current state.
diff --git a/MarkovChains.md b/MarkovChains.md
@@ -1,70 +0,0 @@
-# Markov Chains
-
-L13
-
-## Notes
-
-**Definition:** A markov chain is a sequence of events where the probability of any given event is **entirely** based on the previous event.
-
-Given that the state needs to have all relevant information, we need to choose our states properly to ensure accuracy.
-
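-Many systems that evolve with time can be described as a Markov chain, provided the state is chosen to capture all relevant information.
-
-Unlike memoryless processes such as [[BernoulliProcess.md]] or [[PoissonDistribution.md]], these types of processes carry memory in the form of the current state.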
-
-#### Markov Assumption
-
-The Markov assumption is the assumption that each state describes the probability of all transitions related to it regardless of prior states/transitions.
-
-#### Finite Markov Chain Example
-
-Checkout counter:
-
-Customers get in queue (bernoulli process) and then they are served one at a time.
-
-If customer is in queue they are then served which takes a random amount of time.
-
-What is the probability of a customer departing at a given time?
-	- If the queue is empty the odds are 0 (assuming the person being seen is still in the 'queue')
-	- If the queue is not empty then the odds are not 0
-
-As such, this depends on the state of the system making it a markov chain.
-
-If there are only 10 customers who can be in the place then we can write out all states from 0 people to 10 people along with relations between them relating to people moving up the queue, being added to the queue, no changes, and adding to queue and removing at the same time. All of these probabilities summed have to equal 1 for each state.
-
-Each of these connections (transitions) have their own probabilities.
-
-#### Creation
-
-Identify all possible states (think about if each state is sufficient to assign transition probabilities)
-
-Identify transitions
-
-Find transition probabilities (sum to 1)
-
-#### n-step transition probability
-
-n-step transition probabilities are probabilities that starting at state i after n steps we are now in state j. This is stated as:
-
-r_{ij}(n) = P(X_n=j | X_0=i)
-
-To calculate this we can track the probability of each state at every step up to and including step n. At each step, the probability of arriving in a state is found by multiplying the probability of each possible previous state by the probability of the transition from it, and summing the results. The solution is the final probability of state j.
-
-Alternatively, we can use a recursive approach: the n-step probability of reaching state j is built from the (n-1)-step probabilities of reaching each intermediate state and the one-step transition probability from that state to j.
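-
-The recursion can be written compactly as below. The symbol p_{kj} for the one-step transition probability from state k to state j, and P for the matrix of those probabilities, are notation assumed here:
-
-```latex
-% Base case: one step is just the transition probability.
-\[
-r_{ij}(1) = p_{ij}
-\]
-% Recursive step: be in some state k after n-1 steps, then move from k to j.
-\[
-r_{ij}(n) = \sum_{k} r_{ik}(n-1)\, p_{kj}
-\]
-% Equivalent matrix form: the n-step probabilities are the entries of the n-th power of P.
-\[
-r_{ij}(n) = \left(P^{\,n}\right)_{ij}
-\]
-```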
-
-
-#### Steady State (Convergence)
-
-The steady state of a Markov chain is the constant probability of some given state after an arbitrarily long period of time. This can be thought of as the limit as n approaches infinity. If there is no convergence then there is no steady state.
-
-In most cases we will reach a steady state, but this might not happen in the case of a [[PeriodicChain.md]], or when the chain is reducible and has two or more separate recurrent classes. Separate recurrent classes prevent a steady state because steady states need to be independent of the initial condition.
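-
-When a steady state exists, it can be found from the balance equations sketched below. The notation (pi_j for the steady-state probability of state j, p_{kj} for one-step transition probabilities) and the two-state chain with p_{12} = 0.5 and p_{21} = 0.2 are assumptions made for illustration:
-
-```latex
-% Balance equations plus normalization.
-\[
-\pi_j = \sum_{k} \pi_k\, p_{kj} \quad \text{for every state } j,
-\qquad \sum_{j} \pi_j = 1
-\]
-% Hypothetical two-state chain: p_{12} = 0.5, p_{21} = 0.2 (so p_{11} = 0.5, p_{22} = 0.8).
-\[
-\pi_1 = 0.5\,\pi_1 + 0.2\,\pi_2
-\;\Rightarrow\; \pi_2 = 2.5\,\pi_1,
-\qquad
-\pi_1 + \pi_2 = 1
-\;\Rightarrow\; \pi_1 = \tfrac{2}{7},\ \pi_2 = \tfrac{5}{7}
-\]
-```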
-
-#### Recurrent
-
-A state is recurrent if, starting from that state, wherever we go there is always a way to get back. After an arbitrarily long period of time we will always end up in a recurrent state, as transient states eventually become unreachable.
-
-#### Transient
-
-Transient states are states that are not recurrent, meaning that starting from one of them there are paths after which you cannot get back to it.
-
-As such, the probability of the current state being a transient state after an arbitrarily long time is 0.
diff --git a/MarkovDecisionProcesses.md b/MarkovDecisionProcesses.md
@@ -1,9 +0,0 @@
-# Markov Decision Process (MDP)
-
-RL Ch 1
-
-## Notes
-
-**Definition:** Markov decision processes describe an environment for reinforcement learning.
-
-MDPs are like MRPs except they also have a finite set of actions (action space).
diff --git a/MarkovInequality.md b/MarkovInequality.md
@@ -1,7 +0,0 @@
-# Markov Inequality
-
-L19
-
-## Notes
-
-**Definition:** The Markov inequality gives an upper bound on the probability that a nonnegative random variable is greater than or equal to some positive constant. 
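-
-For reference, the standard statement of the inequality is given below, followed by a numeric illustration. The values E[X] = 2 and a = 10 are hypothetical:
-
-```latex
-% Markov inequality for a nonnegative random variable X and a constant a > 0.
-\[
-P(X \ge a) \le \frac{E[X]}{a}
-\]
-% Hypothetical numbers: if E[X] = 2, then the chance of X reaching 10 or more is at most 0.2.
-\[
-P(X \ge 10) \le \frac{2}{10} = 0.2
-\]
-```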
diff --git a/MarkovProcess.md b/MarkovProcess.md
@@ -1,7 +0,0 @@
-# Markov Process
-
-Prob L16
-
-## Notes
-
-**Definition:** Markov processes are multiple trials of [[MarkovChains.md]].
diff --git a/MarkovRewardProcess.md b/MarkovRewardProcess.md
@@ -1,7 +0,0 @@
-# Markov Reward Process (MRP)
-
-L2
-
-## Notes
-
-**Definition:** A markov reward process is a markov chain with reward values associated with states or transitions.
diff --git a/Math310.md b/Math310.md
@@ -1,19 +0,0 @@
-# Math 310
-
-This is the index for my main Math 310 notes. 
-
-## Main Links
-
-- [Induction](Induction.md)
-- [StrongInduction](StrongInduction.md)
-- [SmallestCounterExample](SmallestCounterExample.md)
-- [CounterExample](CounterExample.md)
-- [VectorSpace](VectorSpace.md)
-- [PowerSet](PowerSet.md)
-- [Contrapositive](Contrapositive.md)
-- [Contradiction](Contradiction.md)
-- [CartesianProduct](CartesianProduct.md)
-- [ProveSetEquality](ProveSetEquality.md)
-- [PerfectNumbers](PerfectNumbers.md)
-- [GaussianIntegers](GaussianIntegers.md)
-- [Partition](Partition.md)
diff --git a/MathConceptsCS331.md b/MathConceptsCS331.md
@@ -1,8 +0,0 @@
-# Math Concepts CS 331
-
-Math Relating to CS331.
-
-## Notes
-
-[[DotProduct.md]]
-[[Determinant.md]]
diff --git a/Matrix.md b/Matrix.md
@@ -1,28 +0,0 @@
-# Matrix
-
-Khan
-
-## Notes
-
-**Definition:** A matrix is a 2D grid of numerical values.
-
-Matricies can be used to describe systems of equations as follows:
-
-4x - 2y = 10
--2x + y = 2
-
-[4 -2] [x] = [10]
-[-2 1] [y] = [2]
-
-This is the form because x is distributed over the first column and y over the second column. 
-
-
-## Matrix Vector Product
-
-The product of a matrix and a vector is another vector with as many components as the matrix has rows (for a square matrix, the same size as the original vector). This assumes the number of components in the vector is the same as the number of columns in the matrix.
-
-[a b c]   [j]   [aj + bk + cl]
-[d e f] @ [k] = [dj + ek + fl]  (note the result is a vector not a matrix)
-[g h i]   [l]   [gj + hk + il]
-
-This can be thought of as the dot product of each row with the vector, placed into the corresponding row of the single output column.
diff --git a/MatrixMultiplication.md b/MatrixMultiplication.md
@@ -1,17 +0,0 @@
-# Matrix Multiplication
-
-Khan U2
-
-## Notes
-
-**Definition:** The product of A and B is defined as AB where each column of AB is A times b_n, where b_n is the n-th column of B.
-
-Idea:
-
-AB = [Ab_1 Ab_2 ... Ab_n]
-
-Note: To multiply two matrices the number of columns in the first matrix must be equal to the number of rows in the second matrix.
-
-AB is not equal to BA (in pretty much all cases). Often BA is not even defined.
-
-See [[VectorMatrixMultipication.md]] for information about vector and matrix products.
diff --git a/MaxNormRegularization.md b/MaxNormRegularization.md
@@ -1,7 +0,0 @@
-# Max-Norm Regularization
-
-ML P612
-
-## Notes
-
-**Definition:** Max-norm regularization is a regularization technique for neural networks that constrains the Euclidean norm of each neuron's incoming weights to stay at or below a predefined maximum. If a training step pushes the norm beyond this, the weights are scaled down accordingly to ensure compliance. 
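-
-One common way to write this constraint and the rescaling step is sketched below. The symbols w (a neuron's incoming weight vector) and r (the max-norm hyperparameter), and the numbers in the example, are assumptions made for illustration:
-
-```latex
-% Constraint on each neuron's incoming weight vector w.
-\[
-\lVert w \rVert_2 \le r
-\]
-% After a training step, rescale only if the constraint is violated.
-\[
-w \leftarrow w \,\frac{r}{\lVert w \rVert_2}
-\quad \text{if } \lVert w \rVert_2 > r
-\]
-% Hypothetical numbers: w = (3, 4) has norm 5; with r = 1 it is rescaled to (0.6, 0.8).
-\[
-w = \begin{bmatrix} 3 \\ 4 \end{bmatrix},\ r = 1
-\;\Rightarrow\;
-w \leftarrow \tfrac{1}{5}\begin{bmatrix} 3 \\ 4 \end{bmatrix}
-= \begin{bmatrix} 0.6 \\ 0.8 \end{bmatrix}
-\]
-```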
diff --git a/MaxPooling.md b/MaxPooling.md
@@ -1,9 +0,0 @@
-# Max Pooling
-
-ML SS
-
-## Notes
-
-**Definition:** Max pooling is a processing technique whereby a pool size is selected (2x2 as an example) and the values in each pool are compressed into one value, the maximum.
-
-This is a technique to both reduce computational complexity of a model and to extract higher level features.
diff --git a/Memory.md b/Memory.md
@@ -1,17 +0,0 @@
-# Memory
-
-Memory information from computer architecture course
-
-## Notes
-
-Memory performance can affect the compute speed of multiple applications running concurrently. One application can perform poorly despite having the clock cycles it needs to compute (denial of memory). Using nice, which adjusts process priority in the OS scheduler, does not change this. The cause is the shared DRAM memory controller, which becomes a bottleneck. 
-
-## Links
-
-[[DRAM.md]]
-[[DRAMChips.md]]
-[[DRAMCell.md]]
-[[RowBuffer.md]]
-[[DRAMBanks.md]]
-[[DRAMRefresh.md]]
-[[DisturbanceErrors.md]]
diff --git a/MemoryManagement.md b/MemoryManagement.md
@@ -1,44 +0,0 @@
-# Memory Management
-
-Memory management CS 202 ~W10 C++
-
-## Notes
-
-Memory management in C++ is done using a few keywords and operators, shown below.
-
-**delete:** The delete keyword deallocates the memory associated with an object on the heap.
-
-**new:** The new keyword creates an object on the heap. The object is not deallocated automatically when it is no longer referenced, so it must be deleted when the time comes.
-
-**\*:** The asterisk is used to declare a pointer. A pointer has a type and holds the address of a variable of that type.
-
-**&:** The ampersand, known as the address operator, is used to take the address of a variable. An example of this is as follows where we assign a pointer to point to the location of an integer:
-
-```cpp
-
-	int var{3000};
-	int *ptr;
-	ptr = &var;
-
-```
-
-
-**Referencing:** This is done by the & (see above as this is quite simple).
-
-
-**Dereferencing:** This is done by the * character, which gives access to the underlying value. It can be used both to read the underlying variable and to assign a new value to it. Below is an example:
-
-```cpp
-#include <iostream>
-
-int main() {
-	int x{100};
-	int *ptr = &x;
-	std::cout << *ptr << '\n'; // prints 100
-	// reassigns x to 1000 through the pointer
-	*ptr = 1000;
-
-	// these print the same value, 1000
-	std::cout << *ptr << '\n';
-	std::cout << x << '\n';
-}
-```
-
-A few cases of memory management in action are [[SinglyLinkedList.md]] and [[DoublyLinkedList.md]] which both require memory management to ensure nodes in the heap are not lost after removing a node from the list. 
diff --git a/MergeSort.md b/MergeSort.md
@@ -1,46 +0,0 @@
-# Merge Sort
-
-CLRS 2.3
-
-## Notes
-
-**Definition:** Merge sort is an algorithm that uses [[DivideAndConquer.md]] to sort a list in log-linear, O(n log n), time.
-
-Sample Implementation:
-
-```python3
-
-def merge_sort(ls):
-
-    # Base case: a list of length 0 or 1 is already sorted
-    if len(ls) <= 1:
-        return ls
-
-    # Split list in half
-    mid = len(ls) // 2
-    left = ls[:mid]
-    right = ls[mid:]
-
-    # Recurse
-    left = merge_sort(left)
-    right = merge_sort(right)
-
-    left_pos = 0
-    right_pos = 0
-    result = []  # not named "sorted", to avoid shadowing the built-in
-
-    # Merge lists (O(n))
-    while left_pos < len(left) and right_pos < len(right):
-        # <= keeps equal elements in their original order (stable sort)
-        if left[left_pos] <= right[right_pos]:
-            result.append(left[left_pos])
-            left_pos += 1
-        else:
-            result.append(right[right_pos])
-            right_pos += 1
-
-    # One side is exhausted; append whatever remains of the other
-    result.extend(left[left_pos:])
-    result.extend(right[right_pos:])
-
-    return result
-
-```
diff --git a/MersennePrime.md b/MersennePrime.md
@@ -1,9 +0,0 @@
-# Mersenne Prime
-
-U 2.4
-
-## Notes
-
-**Definition:** A Mersenne prime is a prime number of the form 2^n - 1.
-
-The largest known prime numbers are of this form.
diff --git a/Mesh.md b/Mesh.md
@@ -1,18 +0,0 @@
-# Mesh
-
-CS 331 W11 L2
-
-## Notes
-
-**Definition:** A mesh is a representational grid of an object's surface used in [[SurfaceRepresentation.md]].
-
-Think of a fishing net. Straight lines subdivide the surface at regular intervals, with exact points calculated at those intervals. This gives the illusion of a continuous surface, but it is actually a discrete set of points.
-
-See [[Triangulation.md]] for implementation details.
-
-### IN BLENDER
-
-There are 3 types of mesh members:
-1. Vertex: A Single Point: 0 Dimensional
-2. Edge: A Line Segment Joining Two Vertices: 1 Dimensional
-3. Face: A 2D Polygon Whose Boundary is Made of Edges: 2 Dimensional
diff --git a/MeshFilter.md b/MeshFilter.md
@@ -1,7 +0,0 @@
-# Mesh Filter
-
-[[Unity]] game engine component 
-
-## Notes
-
-The mesh filter sets the shape of an object. Without a renderer, this does nothing, but this gives the general dimensions of the object (not scale though).
diff --git a/MeshRenderer.md b/MeshRenderer.md
@@ -1,7 +0,0 @@
-# Mesh Renderer
-
-[[Unity]] Component. 
-
-## Notes
-
-A mesh renderer is the component that assigns a material to an object. It does not define shape, only material. When no material is assigned, the object renders in magenta.
diff --git a/MicroArchitecture.md b/MicroArchitecture.md
@@ -1,15 +0,0 @@
-# Micro Architecture
-
-Computer Architecture L2
-
-## Notes
-
-**Definition:** The implementation of an agreed upon ISA. These are the underlying mechanics that are not exposed to the OS/System developer.
-
-There are many microarchitecture implementations of each ISA, but very few different ISAs, because changes to an ISA break compatibility.
-
-This is anything in hardware not exposed to software. This includes speculative execution (doing work before it is known to be needed), [[SuperScalar.md]], and [[OutOfOrderExecution.md]].
-
-Most of the time [[Cache.md]] is not exposed to the programmer, but sometimes it is.
-
-Core frequency is usually set by the microarchitecture, but it is sometimes specified in the [[ISA.md]], so this is not always the case.
diff --git a/Microcontroller.md b/Microcontroller.md
@@ -1,9 +0,0 @@
-# Microcontroller
-
-W2
-
-## Notes
-
-**Definition:** A microcontroller consists of a CPU, integrated memory, and the ability to use external memory.
-
-A microcontroller is a fully functional computer whereas a [Microprocessor](Microprocessor.md) is simply the CPU.
diff --git a/Microprocessor.md b/Microprocessor.md
@@ -1,9 +0,0 @@
-# Microprocessor
-
-W2
-
-## Notes
-
-**Definition:** A microprocessor is simply a processor by itself.
-
-This does not include memory or anything else with it.
diff --git a/MinMaxScaling.md b/MinMaxScaling.md
@@ -1,29 +0,0 @@
-# Min-max scaling 
-
-ML CH2
-
-## Notes
-
-**Definition:** Min-max scaling, also referred to as normalization, rescales the current values so that they fall between two chosen bounds.
-
-These two bounds are normally either 0 and 1 or -1 and 1. It is optimal for neural networks to have zero mean inputs so a range from -1 to 1 is generally good.
-
-This is often done by subtracting the min value and then dividing by the difference between the min and the max. 
-
-Here is an example implementation:
-
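-```python
-import pandas as pd
-
-# Example data (hypothetical); any DataFrame of numeric columns works.
-df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 40]})
-
-# For each column (assuming they are numbers), set every value to
-# (current - min) / (max - min).
-# This has a lower bound of 0 and an upper bound of 1.
-
-for col in df:
-    col_min = df[col].min()  # not named "min", to avoid shadowing the built-in
-    diff = df[col].max() - col_min
-    df[col] = (df[col] - col_min) / diff
-
-df.describe()
-```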
-
-See [[FeatureScaling.md]] for more.
diff --git a/MinusOneTrick.md b/MinusOneTrick.md
@@ -1,9 +0,0 @@
-# Minus One Trick
-
-Ch 2.2
-
-## Notes
-
-**Definition:** The minus one trick is a method used to find general solutions to a homogeneous system of equations (Ax = 0) by extending a rectangular matrix in reduced row echelon form to a square matrix, adding rows so that each position along the diagonal that does not hold a pivot 1 holds a -1.
-
-By doing this we can simply read out the general solution from the columns that have -1 on the diagonal, instead of having to derive it.
diff --git a/MixedGraph.md b/MixedGraph.md
@@ -1,7 +0,0 @@
-# Mixed Graph
-
-10.1
-
-## Notes
-
-**Definition:** A mixed graph is a graph that allows directed and undirected edges, loops, and multi-edges.
diff --git a/MixedRandomVariable.md b/MixedRandomVariable.md
@@ -1,11 +0,0 @@
-# Mixed Random Variable 
-
-Prob L8
-
-## Notes
-
-**Definition:** A mixed random variable is a random variable ([[RandomVariables.md]]) with both continuous and discrete components.
-
-An example is a random variable where there is a 1/2 chance of flipping a coin to win 1 dollar (discrete) and a 1/2 chance of getting a random number of dollars between 0 and 1 (continuous). This can be drawn as a tree: the first split is between the coin flip and the random value, and the next layer is where you flip the coin or draw the random amount of money.
-
-These types of variables can often be described by a single cumulative distribution function ([[CumulativeDensityFunction.md]]), which shows the probability of getting a given value or less.
diff --git a/Mod.md b/Mod.md
@@ -1,9 +0,0 @@
-# Mod
-
-U 2.4
-
-## Notes
-
-**Definition:** Mod is a mathematical function where, for a mod k, we find the value 0 <= n < k such that a = bk + n for some integer b.
-
-This is normally used only for integers, but nothing prohibits using it on R so long as b is an integer.
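-
-A short sketch of the definition, assuming a positive modulus k. The function name `mod` is illustrative; Python's built-in `%` operator behaves the same way for these inputs:
-
-```python
-def mod(a, k):
-    """Return n with 0 <= n < k and a = b*k + n for some integer b (k > 0)."""
-    b = a // k          # floor division gives the integer multiplier b
-    return a - b * k
-
-print(mod(17, 5))    # 2, since 17 = 3*5 + 2
-print(mod(-7, 3))    # 2, since -7 = (-3)*3 + 2
-print(mod(7.5, 2))   # 1.5, since 7.5 = 3*2 + 1.5
-```
-
-The last call shows the real-valued case: the input is not an integer, but the multiplier b still is.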
diff --git a/Model.md b/Model.md
@@ -1,12 +0,0 @@
-# Model
-
-RL Ch 1
-
-## Notes
-
-**Definition:** A model in RL is an agent's representation of its environment that allows it to predict expected outcomes.
-
-There are two parts to the model:
-
-1. Transition Model (probabilities of switching between states)
-2. Reward Model (expected rewards after taking certain actions)
diff --git a/ModelBasedLearning.md b/ModelBasedLearning.md
@@ -1,9 +0,0 @@
-# Model Based Learning
-
-ML CH1
-
-## Notes
-
-**Definition:** Model based learning fits a model to the training data and then uses that model to make predictions for new inputs.
-
-This differs from [[InstanceBasedLearning.md]] because it tries to learn general patterns instead of matching new inputs against stored examples.
diff --git a/ModelFree.md b/ModelFree.md
@@ -1,7 +0,0 @@
-# Model Free
-
-L1
-
-## Notes
-
-**Definition:** A model free approach in RL means the agent does not know or estimate probabilities of state transitions and as such learns directly from experience.
diff --git a/Momentum.md b/Momentum.md
@@ -1,13 +0,0 @@
-# Momentum
-
-ML P580
-
-## Notes
-
-**Definition:** Momentum optimization is an optimization algorithm that uses the idea of momentum to reach an optimum faster.
-
-As long as the gradient keeps the same sign, the optimizer moves faster and faster. Once the gradient changes sign, the steps begin to slow down and the optimizer subsequently changes direction.
-
-The gradient is used as an acceleration factor and not as the speed.
-
-This is typically faster than plain gradient descent, and it can also include a friction factor to reduce overshooting the target.
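-
-A minimal sketch of the update rule on a one-dimensional problem. The hyperparameter names `lr` (learning rate) and `beta` (friction, between 0 and 1) and the function name are assumptions made for this example:
-
-```python
-def momentum_descent(grad, theta, lr=0.1, beta=0.9, steps=300):
-    """Momentum optimization: the gradient accelerates a velocity term m."""
-    m = 0.0
-    for _ in range(steps):
-        m = beta * m - lr * grad(theta)   # friction beta, acceleration -lr * grad
-        theta = theta + m                 # move by the velocity, not the gradient
-    return theta
-
-# f(x) = x^2 has gradient 2x and its minimum at x = 0
-result = momentum_descent(lambda x: 2 * x, theta=5.0)
-print(result)  # a value very close to 0
-```
-
-Setting `beta` to 0 removes the velocity memory and reduces this to plain gradient descent.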
diff --git a/MonoBehaviour.md b/MonoBehaviour.md
@@ -1,13 +0,0 @@
-# Mono Behaviour
-
-CS 331 W12 L3
-
-## Notes
-
-**Definition:** MonoBehaviour is the default base class for scripts, which supports the Start and Update functions.
-
-Each script contains code for a class that inherits from MonoBehaviour.
-
-The Update function is called once per frame and is used to drive the [[GameLoop.md]].
-
-The Start function is called a single time, after the object is instantiated and before the first Update.
diff --git a/MonotonicFunction.md b/MonotonicFunction.md
@@ -1,9 +0,0 @@
-# Monotonic Functions 
-
-Stats
-
-## Notes
-
-**Definition:** A monotonically increasing function is one where, as the input increases, the output either stays the same or increases. The reverse holds for a monotonically decreasing function. A monotonic function never changes direction: it is either non-decreasing everywhere or non-increasing everywhere.
-
-Another variant is the strictly monotonic function, which always moves in one direction and never stagnates. An example is f(x) = x, which is strictly increasing for all reals.
diff --git a/MonteCarloLearning.md b/MonteCarloLearning.md
@@ -1,11 +0,0 @@
-# Monte Carlo Learning
-
-L4
-
-## Notes
-
-**Definition:** Monte Carlo learning is a learning method that uses episodes and averages their returns to optimize policies.
-
-First Visit - In first visit Monte Carlo learning, we only increment the counter for the current state if it is the first visit to that state in the given episode.
-
-Every Visit - Every visit Monte Carlo learning increments the counter for the current state every time the state is visited.
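-
-A sketch contrasting the two variants for value estimation. The episode format (a list of `(state, reward)` pairs) and the function name are assumptions for this example, not something fixed by the lecture:
-
-```python
-from collections import defaultdict
-
-def mc_state_values(episodes, first_visit=True, gamma=1.0):
-    """Estimate state values by averaging returns over episodes.
-
-    Each episode is a list of (state, reward) pairs, where reward is the
-    reward received after leaving that state.
-    """
-    totals = defaultdict(float)   # sum of returns per state
-    counts = defaultdict(int)     # visit counter per state
-    for episode in episodes:
-        # Compute the return following each time step, working backwards.
-        G, returns = 0.0, []
-        for _, reward in reversed(episode):
-            G = reward + gamma * G
-            returns.append(G)
-        returns.reverse()
-        seen = set()
-        for (state, _), G_t in zip(episode, returns):
-            if first_visit and state in seen:
-                continue          # only the first visit in this episode counts
-            seen.add(state)
-            counts[state] += 1
-            totals[state] += G_t
-    return {s: totals[s] / counts[s] for s in counts}
-
-episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
-print(mc_state_values([episode], first_visit=True))   # {'A': 3.0, 'B': 2.0}
-print(mc_state_values([episode], first_visit=False))  # {'A': 2.5, 'B': 2.0}
-```
-
-State A is visited twice in the episode, so the two variants average over a different number of returns for it.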
diff --git a/MonteCarloMethod.md b/MonteCarloMethod.md
@@ -1,11 +0,0 @@
-# Monte Carlo Method
-
-SS
-
-## Notes
-
-**Definition:** The Monte Carlo method is a class of algorithms that use repeated random sampling to converge upon a solution to a problem that may have a true solution but is too complex to solve analytically.
-
-An example of this is the calculation of PI using random sampling of a 2D grid to find the approximate area of a circle with a radius of 1.
-
-Another example of this could be any arbitrary volume/surface area calculation where we don't know the exact formula for the true calculation.
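-
-A minimal sketch of the PI example, assuming points are sampled uniformly from the square [-1, 1] x [-1, 1] that encloses the circle of radius 1. The function name and the fixed seed are choices made for this example:
-
-```python
-import random
-
-def estimate_pi(samples=100_000, seed=0):
-    """Estimate pi from the fraction of random points inside the unit circle."""
-    rng = random.Random(seed)
-    inside = 0
-    for _ in range(samples):
-        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
-        if x * x + y * y <= 1:
-            inside += 1
-    # (circle area) / (square area) = pi / 4
-    return 4 * inside / samples
-
-print(estimate_pi())  # close to 3.14; the exact digits depend on the seed
-```
-
-More samples give a better estimate, but the error shrinks slowly, roughly with the square root of the sample count.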
diff --git a/MooresLaw.md b/MooresLaw.md
@@ -1,11 +0,0 @@
-# Moore's Law
-
-Computer architecture L2.
-
-## Notes
-
-**Definition:** Component counts double every other year.
-
-This was found by plotting component counts on a log base 2 scale: the plot was linear, so the underlying function is exponential (2^x).
-
-Moore also found that the relative cost was decreasing, but the higher end was consistently expensive or increasingly so.
diff --git a/MosaicPlot.md b/MosaicPlot.md
@@ -1,7 +0,0 @@
-# Mosaic Plot
-
-Stats D4
-
-## Notes
-
-**Definition:** A mosaic plot is a plot that shows cross tabulated information in a graphical way where each box is sized according to the actual values of the classes associated with the given position.
diff --git a/Movement.md b/Movement.md
@@ -1,33 +0,0 @@
-# Movement
-
-CS 331 W12 L1
-
-## Notes
-
-There are many different ways to implement movement.
-
-A common way to move is by doing this:
-
-```csharp
-
-float speed = 2.0f; //assign default movement speed
-float rotationRate = 30.0f; //assign default rate of rotation
-
-bool moveForward = Input.GetKey("up"); 
-if (moveForward){
-	transform.position += transform.forward * speed * Time.deltaTime;
-} 
-
-bool rotateLeft = Input.GetKey("left"); //this assumes rotation is done by keyboard input
-if(rotateLeft){
-	transform.Rotate(0, -rotationRate * Time.deltaTime, 0);
-} 
-```
-The issue with this is that movement may not feel natural, because no acceleration is applied to the object; you are just moving it by a certain amount. In essence, you are assigning a velocity to the object for the frames where the "up" key is pressed.
-
-
-See [[Input.md]] for more information about the Input class. 
-
-See [[Vector3.md]] for more information about positions.
-
-See [[Quaternions.md]] for more about rotation/angles.
diff --git a/MultiValuedFunction.md b/MultiValuedFunction.md
@@ -1,9 +0,0 @@
-# Multivalued Function
-
-Ch 0
-
-## Notes
-
-**Definition:** Multivalued functions are functions such that there exist two or more values in the codomain for at least one value in the domain.
-
-These are not strictly functions, unless specified, as functions must map to only one value in the codomain for each value in the domain.
diff --git a/MulticlassClassifier.md b/MulticlassClassifier.md
@@ -1,7 +0,0 @@
-# Multiclass Classifier
-
-ML D2
-
-## Notes
-
-**Definition:** A multiclass classifier is a classifier that classifies items into more than two classes (not binary classification).
diff --git a/Multigraph.md b/Multigraph.md
@@ -1,7 +0,0 @@
-# Multi-Graph
-
-Ch 4
-
-## Notes
-
-**Definition:** A multi-graph is a graph that can contain multiple edges between the same pair of nodes.
diff --git a/MultilabelClassification.md b/MultilabelClassification.md
@@ -1,9 +0,0 @@
-# Multilabel Classification
-
-ML D2
-
-## Notes
-
-**Definition:** Multilabel classification is classification where there may be multiple binary outputs that are true.
-
-An example of this would be a human recognition model. Let's say we want to know if Bob, Jim, or Mary are in an image. If Bob and Jim are in the image, the model should return [true, true, false] or some other understandable output that denotes this information.
diff --git a/MultinomialCoefficient.md b/MultinomialCoefficient.md
@@ -1,14 +0,0 @@
-# Multinomial Coefficient
-
-Ch 1.3
-
-## Notes
-
-**Definition:** A multinomial coefficient is a generalization of the binomial coefficient where the bottom is multiple numbers rather than one.
-
-Calculating the multinomial coefficient is quite simple. Assume we have 3,3,4 on the bottom and 10 on the top. We can find the multinomial coefficient as follows:
-
-$\binom{10}{3,3,4} = \frac{10!}{3!\,3!\,4!} = 4200$
-
-The above means that there are 4200 distinct arrangements of length 10 where we have 3 of one group, 3 of another group, and 4 of the final group.
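-
-The same calculation as a short sketch in Python. The function name `multinomial` is illustrative:
-
-```python
-from math import factorial
-
-def multinomial(n, groups):
-    """n! / (k_1! k_2! ... k_m!) where the group sizes k_i sum to n."""
-    if sum(groups) != n:
-        raise ValueError("group sizes must sum to n")
-    result = factorial(n)
-    for k in groups:
-        result //= factorial(k)   # each division is exact
-    return result
-
-print(multinomial(10, [3, 3, 4]))  # 4200
-```
-
-With two groups this reduces to the ordinary binomial coefficient, for example `multinomial(5, [2, 3])` is 10.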
diff --git a/MultioutputClassification.md b/MultioutputClassification.md
@@ -1,7 +0,0 @@
-# Multioutput Classification
-
-ML D2
-
-## Notes
-
-**Definition:** Multioutput classification is a generalization of multilabel classification where each output label can be multiclass, meaning it can take more than two possible values.
diff --git a/Multiset.md b/Multiset.md
@@ -1,23 +0,0 @@
-# Multiset
-
-U2.2.5
-
-## Notes
-
-**Definition:** A multiset is an unordered collection that can contain multiple instances of the same object.
-
-Multiset is short for multiple-membership set.
-
-Example:
-
-Multiset:
-
-{a,a,a,b,b}
-
-Preferred notation:
-
-{3 x a , 2 x b}
-
-The number out front that denotes the number of instances is called the multiplicity of the element. As such, a multiplicity of 0 means the element is not contained in the set.
-
-When unioning two multisets we take the larger of the multiplicities for shared elements. Conversely, the intersection between two multisets is the minimal multiplicity value.
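-
-Python's `collections.Counter` behaves like a multiset, so the union and intersection rules can be checked directly. The second multiset here is made up for the example:
-
-```python
-from collections import Counter
-
-x = Counter({"a": 3, "b": 2})          # {3 x a, 2 x b}
-y = Counter({"a": 1, "b": 3, "c": 1})  # {1 x a, 3 x b, 1 x c}
-
-print(x | y)   # union keeps the larger multiplicity: a: 3, b: 3, c: 1
-print(x & y)   # intersection keeps the smaller multiplicity: a: 1, b: 2
-print(x["z"])  # 0, an absent element has multiplicity 0
-```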
diff --git a/MutuallyIndependent.md b/MutuallyIndependent.md
@@ -1,7 +0,0 @@
-# Mutually Independent
-
-Ch 1.4
-
-## Notes
-
-**Definition:** A set of mutually independent events is a set such that all conditional probabilities (any combination) are equivalent to the unconditioned probabilities.
diff --git a/NAG.md b/NAG.md
@@ -1,7 +0,0 @@
-# NAG (Nesterov Accelerated Gradient (optimization)) 
-
-ML P582
-
-## Notes
-
-**Definition:** NAG is an improvement upon the momentum optimization algorithm where, instead of finding the gradient at the current position and adding this to the velocity, we find the gradient slightly ahead (in the direction of the momentum) and then add this factor to the velocity.
diff --git a/NLP.md b/NLP.md
@@ -1,7 +0,0 @@
-# NLP
-
-ML Book CH1
-
-## Notes
-
-**Definition:** NLP is the acronym for natural language processing. This is the process of taking in language data (written, audible, or some other form), and doing something with it. This may be classification or something else.
diff --git a/NLU.md b/NLU.md
@@ -1,7 +0,0 @@
-# NLU
-
-ML CH1
-
-## Natural Language Understanding
-
-**Definition:** NLU is an application of ML where the model must in some way interpret input language data.
diff --git a/NPComplete.md b/NPComplete.md
@@ -1,7 +0,0 @@
-# NP Complete Problem 
-
-U 2.3
-
-## Notes
-
-**Definition:** NP complete problems are a set of problems of the NP family such that if any of them are found to be solvable in polynomial time then P=NP.
diff --git a/NPProblem.md b/NPProblem.md
@@ -1,7 +0,0 @@
-# NP Problem
-
-U 2.3
-
-## Notes
-
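-**Definition:** An NP problem (non-deterministic polynomial) is a decision problem whose proposed solutions can be verified in polynomial time. Every problem solvable in polynomial time is also in NP, but the hardest NP problems are not believed to be solvable in polynomial time.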
diff --git a/NaiveBayes.md b/NaiveBayes.md
@@ -1,16 +0,0 @@
-# Naive Bayes
-
-ML SS
-
-## Notes
-
-**Definition:** Naive Bayes is an algorithm used to find the probabilities of text being part of a given class. 
-
-This is often used for spam classification. Here are the steps:
-
-1. For each class, find the percent of that class's messages that contain each token (word/phrase)
-2. Using this percent, multiply all percents together for each token in a given message.
-3. Multiply this final percent with a known probability of any given item being part of the current class.
-4. Find the class with the highest percent and assume it is of that class.
-
-Often we want to add a pseudo-count to each token count for the class. This ensures that if a class has none of a given token, the output is not 0%; it is simply lower.
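-
-The steps above can be sketched as follows. The function names, the toy messages, and the exact smoothing formula (pseudo-count in the numerator, two pseudo-counts in the denominator) are assumptions made for illustration:
-
-```python
-def train(messages):
-    """messages: list of (label, set_of_tokens). Returns per-class statistics."""
-    stats = {}
-    for label, tokens in messages:
-        entry = stats.setdefault(label, {"n": 0, "token_counts": {}})
-        entry["n"] += 1
-        for tok in tokens:
-            entry["token_counts"][tok] = entry["token_counts"].get(tok, 0) + 1
-    return stats
-
-def classify(stats, tokens, pseudo=1):
-    """Pick the class with the highest prior * product of token fractions."""
-    total = sum(entry["n"] for entry in stats.values())
-    scores = {}
-    for label, entry in stats.items():
-        score = entry["n"] / total                   # step 3: class prior
-        for tok in tokens:                           # steps 1 and 2
-            count = entry["token_counts"].get(tok, 0)
-            score *= (count + pseudo) / (entry["n"] + 2 * pseudo)
-        scores[label] = score
-    return max(scores, key=scores.get), scores       # step 4
-
-data = [
-    ("spam", {"win", "money"}),
-    ("spam", {"win", "prize"}),
-    ("ham", {"lunch", "money"}),
-    ("ham", {"lunch", "today"}),
-]
-label, scores = classify(train(data), {"win", "money"})
-print(label)  # spam
-```
-
-Because of the pseudo-count, the "ham" score is small but not zero even though no ham message contains "win".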
diff --git a/NaryOperations.md b/NaryOperations.md
@@ -1,7 +0,0 @@
-# N-ary Operations
-
-SS
-
-## Notes
-
-**Definition:** N-ary operations is a general term for operations that take a finite and specific number of inputs, but don't fall into the category of unary, binary, ternary, or in some cases quaternary.
diff --git a/NaturalLog.md b/NaturalLog.md
@@ -1,29 +0,0 @@
-# Natural Log
-
-## Notes
-
-**Definition:** The natural log (ln) is the logarithm with base e: ln(y) is the value x such that e^x is equal to y.
-
-
-When working with ln we have the following options for algebraic manipulations:
-
-1. Division becomes subtraction
-	- ln(x/y) -> ln(x) - ln(y)
-2. Multiplication becomes addition
-	- ln(xy) -> ln(x) + ln(y)
-3. Exponents can be pulled out
-	- ln(x^2) -> 2ln(x)
-
-There are no other rules for manipulation, so there are times when we are limited in how far we can reduce an equation. For example, when values are added inside the logarithm, as in ln(x + y), we cannot split the logarithm or pull out any exponents.
-
-d/dx ln(|x|) = 1/x
-
-The |x| is only necessary for functions that cross into the negative x values.
-
-### Chain Rule
-
-We can also use the chain rule for ln. 
-
-Example:
-
-$\frac{d}{dx} \ln(g(x)) = \frac{1}{g(x)} \cdot g'(x)$
diff --git a/Negation.md b/Negation.md
@@ -1,7 +0,0 @@
-# Negation
-
-1.1.1
-
-## Notes
-
-**Definition:** Negation is the process of inverting the truth value of a proposition.
diff --git a/NestedQuantifier.md b/NestedQuantifier.md
@@ -1,11 +0,0 @@
-# Nested Quantifiers
-
-U 1.5.1
-
-## Notes
-
-**Definition:** Nested quantifiers are when there are multiple quantifiers in the same scope.
-
-Example:
-
-$\forall x \exists y (x+y=0)$
diff --git a/NeuralNetworks.md b/NeuralNetworks.md
@@ -1,27 +0,0 @@
-# (Artificial) Neural Networks (ANNs)
-
-ML D5
-
-## Notes
-
-**Definition:** Artificial neural networks are machine learning models that mimic biological neurons to complete some task.
-
-ReLU activations can be used on output layers to force the output to be positive. We can also use softplus, a smooth variant of ReLU, for the same purpose. By default, the output layer has no activation function.
-
-### Hidden Layer Count Selection
-
-Deeper neural networks have better parameter efficiency. This means you need fewer neurons to model complex functions when compared with shallower NNs.
-
-### Neuron Count Per Layer 
-
-It is common for all hidden layers to be the same size. There are, however, times when we make them a pyramid shape, descending in size, because each layer picks out different information that coalesces into higher level information. Another common approach is to make the first hidden layer large and then all subsequent ones the same, smaller size.
-
-In most cases, having all layers the same size is as accurate as a pyramid structure and reduces the number of hyperparameters to tune, which is a good thing.
-
-In short: normally they should all be the same size. Sometimes the first hidden layer is bigger and the rest share a smaller size. Sometimes a pyramid is used, but this increases the number of hyperparameters.
-
-### Count Info (Combined # of layers and neurons per layer)
-
-Sometimes we use a stretch pants method to prevent overfitting. We do this by selecting a bigger model than needed and then using early stopping to prevent overfitting. 
-
-Generally, increasing the number of layers is better than increasing the number of neurons.
diff --git a/NonDeterministicFiniteAutomata.md b/NonDeterministicFiniteAutomata.md
@@ -1,17 +0,0 @@
-# Non-deterministic Finite Automata (NFA)
-
-**Source:** Theory of Computation
-
-**Lecture:** 3
-
-## Notes
-
-**Definition:** An NFA is a machine that may have several choices for the next state at any point. That is, a state may have multiple outgoing edges with the same label.
-
-NOTE:
-
-Non-determinism is a generalization of determinism, so every DFA is an NFA, but not necessarily the other way around, hence 'may have' in the statement above.
-
-It is also possible for an NFA to have a state transition with a label of epsilon, indicating that the transition can be taken without consuming any symbol of the current word.
-
-When evaluating an NFA, if the machine runs into a split where it can take either one option or another, it creates a split (copy) and runs both in parallel. If the next input symbol does not appear on any of the arrows exiting the state occupied by a copy, that copy dies.
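-
-A sketch of this evaluation, tracking all live copies as a set of states. The transition table format (a dict keyed by `(state, symbol)`, with `""` standing for epsilon) and the example machine are assumptions for this example:
-
-```python
-def epsilon_closure(states, delta):
-    """All states reachable from `states` using only epsilon ("") transitions."""
-    stack, closure = list(states), set(states)
-    while stack:
-        s = stack.pop()
-        for t in delta.get((s, ""), ()):
-            if t not in closure:
-                closure.add(t)
-                stack.append(t)
-    return closure
-
-def nfa_accepts(delta, start, accepting, word):
-    """Track every live copy of the machine as a set of current states."""
-    current = epsilon_closure({start}, delta)
-    for symbol in word:
-        nxt = set()
-        for s in current:
-            nxt |= set(delta.get((s, symbol), ()))  # no arrow: this copy dies
-        current = epsilon_closure(nxt, delta)
-    return bool(current & accepting)
-
-# Hypothetical NFA accepting binary words that end in "01".
-delta = {
-    ("q0", "0"): {"q0", "q1"},   # two choices on the same symbol
-    ("q0", "1"): {"q0"},
-    ("q1", "1"): {"q2"},
-}
-print(nfa_accepts(delta, "q0", {"q2"}, "1101"))  # True
-print(nfa_accepts(delta, "q0", {"q2"}, "0110"))  # False
-```
-
-The word is accepted if at least one surviving copy ends in an accepting state.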
diff --git a/NormalDistribution.md b/NormalDistribution.md
@@ -1,9 +0,0 @@
-# Normal Distribution
-
-Stats D1 + Prob L8
-
-## Notes
-
-**Definition:** A normal distribution is a unimodal, symmetric distribution in which most observations cluster around the mean (the mound), while fewer and fewer observations lie farther away.
-
-With normal distributions we often refer to the standard normal distribution, which is the normal distribution centered at 0 with a standard deviation of 1. It is convenient to map other normal distributions onto it because the percentile of a normal distribution has no closed form, so we use lookup tables.
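-
-A small sketch of that projection, using `statistics.NormalDist` from the Python standard library in place of a lookup table. The function name and the example numbers (mean 100, standard deviation 15) are made up for illustration:
-
-```python
-from statistics import NormalDist
-
-def percentile(x, mean, std):
-    """Fraction of a normal distribution lying at or below x."""
-    z = (x - mean) / std              # project onto the standard normal
-    return NormalDist().cdf(z)        # standard normal: mean 0, std 1
-
-print(round(percentile(115, mean=100, std=15), 4))  # 0.8413 (z = 1)
-print(round(percentile(100, mean=100, std=15), 4))  # 0.5
-```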
diff --git a/NormalVector.md b/NormalVector.md
@@ -1,7 +0,0 @@
-# Normal Vector
-
-Khan
-
-## Notes
-
-**Definition:** The normal vector of a hyperplane is a vector that is orthogonal to the hyperplane (there are infinitely many as this is simply a direction and the magnitude does not matter unless specifying unit normal vector).
diff --git a/NoveltyDetection.md b/NoveltyDetection.md
@@ -1,9 +0,0 @@
-# Novelty Detection
-
-ML CH1
-
-## Notes
-
-**Definition:** Novelty detection is used to detect new samples that appear different from other instances in the training set.
-
-This is similar to [[AnomalyDetection.md]].
diff --git a/NullSpace.md b/NullSpace.md
@@ -1,17 +0,0 @@
-# Null Space
-
-Khan
-
-## Notes
-
-**Definition:** The null space of matrix A is the set of vectors {$\vec{b} \in \R^n | \space A\vec{b}=\vec{0}$}.
-
-These are all of the vectors that, when multiplied by the matrix, give the zero vector. This is a closed ([[Closure.md]]) [[Subspace.md]].
-
-To calculate the null space do the following:
-
-1. Get [[ReducedRowEchelonForm.md]]
-2. Write out each pivot variable in terms of the free variables
-3. Express the solution as a combination of vectors of height n, one per free variable, each multiplied by its free variable
-
-The null space of a matrix whose columns are linearly independent is always just the zero vector. This is an iff situation.
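-
-A small worked example of the steps above, using a matrix chosen here purely for illustration:
-
-$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \rightarrow \text{rref}(A) = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$
-
-The pivot variable is $x_1$ and the free variable is $x_2$, with $x_1 = -2x_2$, so
-
-$N(A) = \left\{ x_2 \begin{bmatrix} -2 \\ 1 \end{bmatrix} \;\middle|\; x_2 \in \R \right\}$
-
-The columns of this A are linearly dependent, which is why its null space contains more than the zero vector.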
diff --git a/Nullity.md b/Nullity.md
@@ -1,9 +0,0 @@
-# Nullity
-
-Khan
-
-## Notes
-
-**Definition:** The nullity of a matrix is the dimensionality of its [[NullSpace.md]].
-
-The nullity of a matrix is equal to the number of non-pivot (free) variable columns.
diff --git a/NumberTheory.md b/NumberTheory.md
@@ -1,7 +0,0 @@
-# Number Theory
-
-U 2.4
-
-## Notes
-
-**Definition:** Number theory is a branch of mathematics that concerns itself with properties and functions on integers.
diff --git a/OffPolicyLearning.md b/OffPolicyLearning.md
@@ -1,7 +0,0 @@
-# Off Policy Learning
-
-L5
-
-## Notes
-
-**Definition:** Off policy learning can be thought of as looking over someone else's shoulder to understand what will and will not result in high rewards.
diff --git a/OfflineLearning.md b/OfflineLearning.md
@@ -1,10 +0,0 @@
-
-ML CH1
-
-## Notes
-
-**Definition:** Offline learning is the process of learning and then deploying the learned behavior, where the model cannot learn incrementally. This is also referred to as batch learning.
-
-Think of AlphaGo. It was trained to play Go, then the agent was sent out to enact the policy, not to learn more when playing real people.
-
-When models become obsolete due to an inability to learn new information, this is called model rot or data drift. This happens because data always changes but the model can't. The solution is either using an [[OnlineLearning.md]] model or retraining.
diff --git a/OnPolicyLearning.md b/OnPolicyLearning.md
@@ -1,9 +0,0 @@
-# On Policy Learning
-
-L5
-
-## Notes
-
-**Definition:** On policy learning is learning by following the policy.
-
-We sample actions from the policy whilst evaluating the policy.
diff --git a/OneHotEncoding.md b/OneHotEncoding.md
@@ -1,11 +0,0 @@
-# One-hot Encoding
-
-ML CH2
-
-## Notes
-
-**Definition:** One-hot encoding is the process of taking all unique values of a given categorical feature and expanding them into individual boolean attributes of a sample.
-
-An example of this is a column that states the distance from the ocean, with the options island, 1 hour, and near ocean. These could be encoded as integers, but the integers would not reflect what the categories mean, so fitting a linear regression to them would cause issues: a higher or lower number does not necessarily mean better. Instead, you add 1 hour, near ocean, and island as columns and set each boolean to true or false based on the distance string.
-
-See [[LabelEncoding.md]] for a simple way of encoding strings as numbers. This is useful when there are lots of options and the model knows the data is arbitrarily numbered.
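-
-A minimal sketch of one-hot encoding for the ocean distance column, in plain Python. The function name `one_hot` and the sample rows are made up for this example:
-
-```python
-def one_hot(values):
-    """Expand a categorical column into one boolean column per unique value."""
-    categories = sorted(set(values))
-    return {cat: [v == cat for v in values] for cat in categories}
-
-distance = ["island", "1 hour", "near ocean", "island"]
-for name, column in one_hot(distance).items():
-    print(name, column)
-# 1 hour [False, True, False, False]
-# island [True, False, False, True]
-# near ocean [False, False, True, False]
-```
-
-Each sample has exactly one true value across the new columns, which is where the name comes from.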
diff --git a/OneVersusAll.md b/OneVersusAll.md
@@ -1,11 +0,0 @@
-# One Versus All (OvA) or One Versus Rest (OvR)
-
-ML D2
-
-## Notes
-
-**Definition:** One versus all classifiers are a sequence of binary classifiers that output probabilities where the highest probability is then selected as the output. 
-
-Think of this as a series of SVC or SGD classifiers that output some likelihood that the current input is part of a particular class. You then send the input into each model and whichever one outputs the highest probability is the class that the input belongs to. 
-
-See also [[OneVersusOne.md]] for another strategy to put together models to do classification.
diff --git a/OneVersusOne.md b/OneVersusOne.md
@@ -1,13 +0,0 @@
-# One Versus One (OvO)
-
-ML D2
-
-## Notes
-
-**Definition:** A one versus one classification strategy trains binary classifiers to output the probability of an input being part of one class or another. 
-
-Basically, you train each model to distinguish one class from another. It outputs the probability of one class over the other. The input is then assigned to whichever class wins the most of these pairwise comparisons (in theory).
-
-As such, one must train N * (N-1)/2 classifiers, which can be a lot depending on how many classes there are. In the case of 0-9 (MNIST) this comes out to 45 models. On the flip side, given how the strategy works, each model does not need to be trained on the entire set, only the subset containing the two classes being compared.
-
-See also [[OneVersusAll.md]] for another strategy regarding classification based on binary classifier chaining. The main reason OvO can be better than OvA is because some models are slow to train on larger datasets thus only training models on a subset, albeit training more models, can be faster. This is especially true for support vector machine classification models. In most cases however OvA is preferred.
diff --git a/OnesComplement.md b/OnesComplement.md
@@ -1,8 +0,0 @@
-
-Self Study
-
-## Notes
-
-**Definition:** One's complement is an implementation of signed values in which a negative number is formed by inverting every bit of the corresponding positive number, so a 1 in the MSB position indicates the number is negative.
-
-This is not used today because it required extra computation for mathematical operations, and it has both a positive and a negative zero which is a waste.
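-
-A sketch of the encoding for 8-bit values, where a negative number is stored as the bitwise inverse of its magnitude. The function name and the default width are assumptions for this example:
-
-```python
-def ones_complement(value, bits=8):
-    """Encode a signed integer as a one's complement bit pattern."""
-    limit = (1 << (bits - 1)) - 1
-    if not -limit <= value <= limit:
-        raise ValueError("value out of range")
-    if value >= 0:
-        return value
-    mask = (1 << bits) - 1
-    return ~(-value) & mask      # invert every bit of the magnitude
-
-print(format(ones_complement(5), "08b"))    # 00000101
-print(format(ones_complement(-5), "08b"))   # 11111010
-print(format(ones_complement(0), "08b"))    # 00000000 (positive zero)
-print(format(0b11111111, "08b"))            # 11111111 (negative zero)
-```
-
-The last two lines show the two distinct bit patterns that both represent zero.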
diff --git a/OnlineLearning.md b/OnlineLearning.md
@@ -1,14 +0,0 @@
-
-ML CH1
-
-## Notes
-
-**Definition:** Online learning is the process of learning as a model is fed new data.
-
-This paradigm is in contrast with [[OfflineLearning.md]] also known as batch learning where all data is trained on at the start and then the learned behavior is acted upon in a static way in perpetuity. 
-
-When using online learning, you can use either individual samples to train on or mini-batches which are groupings of samples.
-
-This method can be used to train models on data where not all of the training data can fit in the machine's memory. This is referred to as out-of-core learning. When doing this, the algorithm loads in a mini-batch, runs a training step, and then repeats for the rest of the dataset. This may be confusing, as the learning is done offline, but thinking of online learning as incremental learning resolves the confusion.
-
-The rate at which these models adapt to new information is called the [[LearningRate.md]]. When this is high, they respond quickly to new data at the cost of forgetting old data faster. It is a balancing game. Conversely, with a low learning rate we have more 'inertia' from old data in the set.
diff --git a/Opcode.md b/Opcode.md
@@ -1,9 +0,0 @@
-# Opcode
-
-CA L3
-
-## Notes
-
-**Definition:** An opcode is the first part of an [[Instruction.md]] which describes what the instruction does. 
-
-This is a form of [[BitSteering.md]]
diff --git a/OpenAddressing.md b/OpenAddressing.md
@@ -1,7 +0,0 @@
-# Open Addressing (hashing)
-
-L4
-
-## Notes
-
-**Definition:** Open addressing is the process of resolving collisions by probing for the next available location in a predefined manner, which removes the need to resolve collisions with another data structure.
diff --git a/Operands.md b/Operands.md
@@ -1,9 +0,0 @@
-# Operands
-
-CA L3
-
-## Notes
-
-**Definition:** Operands describe what an [[Instruction.md]] should be done to. 
-
-See [[Opcode.md]] for the other part of an instruction. 
diff --git a/OperatorNotation.md b/OperatorNotation.md
@@ -1,7 +0,0 @@
-# Operator Notation
-
-Ch 0
-
-## Notes
-
-**Definition:** Operator notation is a way to define tasks in a way that uses complex operators such as x+y to define the addition of the ordered pair (x,y).
diff --git a/OptimalBayesianAgent.md b/OptimalBayesianAgent.md
@@ -1,7 +0,0 @@
-# Optimal Bayesian Agent
-
-Superintelligence - Bostrom
-
-## Notes
-
-**Definition:** An optimal Bayesian agent is an agent that at all times takes the best possible action based on probabilities and expected values to optimize some utility/cost function.
diff --git a/OptimalSubstructure.md b/OptimalSubstructure.md
@@ -1,7 +0,0 @@
-# Optimal Substructure
-
-L3
-
-## Notes
-
-**Definition:** Optimal substructure is a property of problems such that an overall (optimal) solution to the problem can be built from optimal solutions to its subproblems.
diff --git a/Optimizer.md b/Optimizer.md
@@ -1,14 +0,0 @@
-# Optimizer
-
-ML P580
-
-## Notes
-
-**Definition:** An optimizer is an algorithm to adjust the weights and biases of neural networks.
-
-Here is a list of common optimizers:
-
-[[Momentum.md]] - Gradient is acceleration
-[[NAG.md]] - Calculates momentum slightly ahead of current position
-[[AdaGrad.md]] - Good for simple quadratic problems
-[[Adam.md]] - Generally the best
diff --git a/OracleComputer.md b/OracleComputer.md
@@ -1,9 +0,0 @@
-# Oracle Computer (machine)
-
-SS
-
-## Notes
-
-**Definition:** An oracle computer is a computer that can compute any computable problem. 
-
-Such a system does not need to be physically possible; see [BekensteinBound](BekensteinBound.md) for why it may not be.
diff --git a/OrderedSample.md b/OrderedSample.md
@@ -1,7 +0,0 @@
-# Ordered Sample
-
-CH 1.3
-
-## Notes
-
-**Definition:** An ordered sample is an outcome where the order of elements contributes to the uniqueness of the output. As such, an ordered sample is denoted using ordered pairs instead of a set as sets are innately unordered.
diff --git a/OrdinaryLeastSquares.md b/OrdinaryLeastSquares.md
@@ -1,9 +0,0 @@
-# Ordinary Least Squares (OLS)
-
-ML CH2
-
-## Notes
-
-**Definition:** Ordinary least squares is a formula used to find the statistical line of best fit for some dataset by minimizing the sum of squared errors. 
-
-When doing [[LinearRegression.md]] there are two common methods to find the line. One is OLS and the other is [[GradientDescent.md]]. 
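-
-For reference, the closed-form OLS solution is usually written as the normal equation. The symbols here are assumptions introduced for illustration: $X$ is the matrix of training inputs (with a column of ones for the intercept), $\vec{y}$ is the vector of targets, $\hat{\theta}$ holds the fitted parameters, and $X^T X$ is assumed to be invertible.
-
-$\hat{\theta} = (X^T X)^{-1} X^T \vec{y}$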
diff --git a/OrthogonalComplement.md b/OrthogonalComplement.md
@@ -1,21 +0,0 @@
-# Orthogonal Complement
-
-Khan U3
-
-## Notes
-
-**Definition:** The orthogonal complement of a subspace is the subspace such that the dot product between any two vectors (one from each subspace) is 0.
-
-The orthogonal complement of the subspace V in $\R^n$ is defined as follows: 
-
-$V^\perp = \{\vec{x} \in \R^n  | \vec{x} \cdot \vec{v} = 0 \text{ for all } \vec{v} \in V \}$
-
-The orthogonal complement of a subspace is a subspace in all cases as it respects scalar multiplication, vector addition, and contains the zero vector.
-
-For a matrix A, every element of the nullspace of A is in the orthogonal complement of the row space of A and vice versa, thus they are the same set.
-
-## Dimensionality
-
-For the arbitrary subspace V, we know dim(V) = k. As such, we also know for O, which is the orthogonal complement, that dim(O) = n - k where R^n is the [[AmbientSpace.md]].
-
-This follows because we also know that the [Nullity](Nullity.md) + [Rank](Rank.md) = dim([AmbientSpace](AmbientSpace.md)).
diff --git a/Orthonormal.md b/Orthonormal.md
@@ -1,9 +0,0 @@
-# Orthonormal bases 
-
-U3
-
-## Notes
-
-**Definition:** An orthonormal set is a set of mutually orthogonal vectors that have each been normalized (length = 1). Such a set is automatically linearly independent.
-
-The ortho part means that all vectors are orthogonal to each other.
diff --git a/OutOfBag.md b/OutOfBag.md
@@ -1,22 +0,0 @@
-# Out of Bag (OOB)
-
-ML D5
-
-## Notes
-
-**Definition:** Out of bag refers to samples that are not contained within a training sampling for a given predictor when using bagging/pasting.
-
-When using bagging and selecting m random samples with replacement from a training set of size m, a given sample has roughly a 37% chance of being out of bag for a given predictor. These samples are useful because they can then be used for validation of the individual predictor.
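-
-As a sketch of where the 37% comes from (assuming the m draws are independent, uniform, and made with replacement from a training set of size m), the chance that a given sample is never picked is:
-
-$\left(1 - \frac{1}{m}\right)^m \to e^{-1} \approx 0.368 \text{ as } m \to \infty$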
-
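-Here is an example implementation of OOB scoring used on a decision tree classifier with scikit-learn:
-
-```python3
-
-from sklearn.ensemble import BaggingClassifier
-from sklearn.tree import DecisionTreeClassifier
-
-# Train and then validate predictors on their out of bag samples.
-# X_train and y_train are assumed to be an existing training set.
-
-bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500, oob_score=True, n_jobs=-1, random_state=10)
-
-bag_clf.fit(X_train, y_train)
-bag_clf.oob_score_  # accuracy estimated from the out of bag samples
-
-```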
diff --git a/OutOfOrderExecution.md b/OutOfOrderExecution.md
@@ -1,9 +0,0 @@
-# Out of Order Execution
-
-Computer Architecture L2
-
-## Notes
-
-**Definition:** An optimization strategy that executes instructions out of order to reduce the number of clock cycles/time taken to complete computations. This is complex as it can be hard to determine if an instruction relies upon another instruction that came in earlier.  
-
-See [[DataFlow.md]] for more information about out of order/non-Von Neumann computation.
diff --git a/Overfitting.md b/Overfitting.md
@@ -1,15 +0,0 @@
-# Overfitting
-
-ML CH1
-
-## Notes
-
-**Definition:** Overfitting is when a model is trained on data and performs well on it but lacks the ability to generalize. 
-
-Generally, this is caused by having a complex model with lots of features but not enough training samples or training samples that have too much noise. This issue can be resolved by simplifying the model (decrease features), removing noise from the samples, or increasing the number of samples.
-
-Reducing the risk of overfitting by simplifying or constraining a model is called regularization. Doing this we can either remove features or limit one or more degrees of freedom of the model. Let's assume we are doing linear regression: we can limit the m value (mx+b) to be within a certain range, so while the model still has two degrees of freedom, it is simpler and thus, in some cases, more generalizable depending on the training samples and the inputs it will make predictions on. 
-
-Overfitting can be seen when you train on training data and find that the test set values have a high [[GeneralizationError.md]] meaning that the model is unable to generalize.
-
-Overfitting can be easily thought about as making your model too good at the training data which limits its ability to generalize.
diff --git a/OverlappingSubproblems.md b/OverlappingSubproblems.md
@@ -1,7 +0,0 @@
-# Overlapping Subproblems
-
-L3
-
-## Notes
-
-**Definition:** Overlapping subproblems is a property of a problem such that the same subproblems occur again and again, meaning it is more efficient to solve each subproblem once and reuse the result than to re-solve it every time it appears.
diff --git a/Oversmooothing.md b/Oversmooothing.md
@@ -1,9 +0,0 @@
-# Oversmoothing
-
-Stats D3
-
-## Notes
-
-**Definition:** Oversmoothing is the process of making the bandwidth of a kernel too large such that resulting visualizations smooth over important information.
-
-This can be thought of as underfitting the dataset.
diff --git a/PCA.md b/PCA.md
@@ -1,15 +0,0 @@
-# PCA (Principal Component Analysis)
-
-ML D5
-
-## Notes
-
-**Definition:** PCA is a dimensionality reduction algorithm that finds a hyperplane that lies close to the data and then projects the data onto it.
-
-The goal of this algorithm is to preserve maximum variance so values in the dataset are optimally spread out.
-
-The way to describe this as a cost function would be to minimize the mean squared distance between the original dataset and the projected position.
-
-PCA compresses data, and it is possible to get close to the original values by decompressing. To do this using sklearn we can simply use the inverse transform (`inverse_transform`). 
-
-There is also IPCA (incremental PCA) which allows for out-of-core processing. Using this in conjunction with np.memmap, which lets np arrays stored on disk be accessed as if they were in memory, is useful. 
diff --git a/PProblem.md b/PProblem.md
@@ -1,7 +0,0 @@
-# P Problem
-
-U 2.3
-
-## Notes
-
-**Definition:** A P problem is a problem that can be both solved and verified in polynomial time.
diff --git a/PairwiseIndependence.md b/PairwiseIndependence.md
@@ -1,7 +0,0 @@
-# Pairwise Independent
-
-Ch 1.4
-
-## Notes
-
-**Definition:** Pairwise independent events are a collection of events such that any two of them are independent: the conditional probability of either, given the other, is equivalent to its unconditioned probability.
diff --git a/PairwiseRelativelyPrime.md b/PairwiseRelativelyPrime.md
@@ -1,7 +0,0 @@
-# Pairwise Relatively Prime
-
-U 2.4
-
-## Notes
-
-**Definition:** Pairwise relatively prime numbers are a set of numbers such that the gcd between any two distinct numbers in the set is always 1.
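-
-As a quick illustration (the numbers are chosen here purely as an example), 10, 17, and 21 are pairwise relatively prime, while 10, 19, and 24 are not because one pair shares a factor:
-
-$\gcd(10,17) = \gcd(10,21) = \gcd(17,21) = 1, \quad \gcd(10,24) = 2$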
diff --git a/PartialDerivative.md b/PartialDerivative.md
@@ -1,9 +0,0 @@
-# Partial Derivative
-
-ML D2
-
-## Notes
-
-**Definition:** The partial derivative is a derivative of a multivariate function with respect to a single variable, treating the others as constants.
-
-Often this is used in [[GradientDescent.md]] to determine in what ways parameters need to change.
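-
-As a small worked example (the function $f$ is made up here for illustration), holding the other variable constant in each case gives:
-
-$f(x,y) = x^2 y + 3y, \quad \frac{\partial f}{\partial x} = 2xy, \quad \frac{\partial f}{\partial y} = x^2 + 3$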
diff --git a/PartiallyObservableMarkovDecisionProcess.md b/PartiallyObservableMarkovDecisionProcess.md
@@ -1,7 +0,0 @@
-# Partially Observable Markov Decision Process (POMDP)
-
-L1
-
-## Notes
-
-**Definition:** A partially observable Markov decision process is a type of Markov decision process where the agent doesn't have access to the entire current state.
diff --git a/PartiallyOrderedSet.md b/PartiallyOrderedSet.md
@@ -1,9 +0,0 @@
-# Partially Ordered Set (Poset)
-
-Ch 9.6
-
-## Notes
-
-**Definition:** (S,R) is a partially ordered set (poset) if the relation R is reflexive, antisymmetric, and transitive with respect to the set S.
-
-Basically, we can define R as {(1,1), (2,2), (3,3), (3,2)} and since this is a partial ordering (reflexive, antisymmetric, and transitive), if we define S as {1,2,3} we see (S,R) is a poset.
diff --git a/ParticularSolution.md b/ParticularSolution.md
@@ -1,7 +0,0 @@
-# Particular Solution
-
-Ch 2.2
-
-## Notes
-
-**Definition:** A particular solution to a set of linear equations is a specific set of values that makes all of the equalities of the system true.
diff --git a/Partition.md b/Partition.md
@@ -1,11 +0,0 @@
-
-AM W14 Reading
-
-## Notes
-
-**Definition:** A partition of a set A is a set of non-empty subsets of A, such that the union of all the subsets equals A, and the intersection of any two different subsets is the null set. 
-
-Basically, a partition is the subsets of a set where all subsets together make the original set and all subsets are unique in their elements where any intersection between them is the null set. Keep in mind the partition is the combination of all of them not simply a singular one of the subsets which is where this diverges from the computational term "partition".
-
-This relates to [[EquivalenceClass.md]] as the equivalence classes of an equivalence relation on A form a partition of A. 
-
diff --git a/PascalsIdentity.md b/PascalsIdentity.md
@@ -1,7 +0,0 @@
-# Pascal's Identity
-
-Ch 6.4
-
-## Notes
-
-**Definition:** Pascal's identity is the idea that n+1 choose r is equivalent to n choose r plus n choose r-1.
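-
-Written out in symbols, with a small numeric check (n = 4 and r = 2 are chosen here only for illustration):
-
-$\binom{n+1}{r} = \binom{n}{r} + \binom{n}{r-1}$
-
-$\binom{5}{2} = \binom{4}{2} + \binom{4}{1} = 6 + 4 = 10$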
diff --git a/Pasting.md b/Pasting.md
@@ -1,7 +0,0 @@
-# Pasting
-
-ML D5
-
-## Notes
-
-**Definition:** Pasting is the process of training multiple models of the same type on subsets of a dataset. This is different than bagging as pasting samples without replacement: once a sample is selected for the current subset, it is removed from the current predictor's options. This means the same predictor (model) can't be trained on the same sample twice, but different predictors may use some of the same samples. 
diff --git a/Path.md b/Path.md
@@ -1,7 +0,0 @@
-# Path
-
-Ch 4
-
-## Notes
-
-**Definition:** A path is a sequence of adjacent nodes where nodes can not be repeated.
diff --git a/Percentile.md b/Percentile.md
@@ -1,7 +0,0 @@
-# Percentile
-
-Khan
-
-## Notes
-
-**Definition:** A percentile is the percent of data that is below the specified amount or, depending on the convention used, at or below the amount. 
diff --git a/Perceptrons.md b/Perceptrons.md
@@ -1,17 +0,0 @@
-# Perceptrons
-
-ML D5
-
-## Notes
-
-**Definition:** Perceptrons are an artificial neural network architecture based on threshold logic units (TLUs) or linear threshold units (LTUs). 
-
-The inputs and outputs of these neurons are numbers and each input is associated with a weight. 
-
-The neuron accepts inputs and computes a linear function on them. It then uses a step function to get the output.
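-
-In symbols, one way to sketch this is shown below. The names $\vec{w}$ (weights), $\vec{x}$ (inputs), $b$ (bias), and $z$ are assumptions introduced here, and the Heaviside step is just one common choice of step function:
-
-$z = \vec{w} \cdot \vec{x} + b, \quad \text{output} = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$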
-
-Perceptrons are a single layer neural network where the inputs are taken in and they are connected to each neuron. These neurons are then the output layer. 
-
-If there are multiple layers of perceptrons these are then called MLPs (multilayer perceptrons).
-
-MLPs are a type of deep neural networks, but deep neural networks also include things like recurrent neural networks whereas MLPs are only feed forward.
diff --git a/PerfectNumbers.md b/PerfectNumbers.md
@@ -1,13 +0,0 @@
-# Perfect Numbers
-
-Math 310
-
-## Notes
-
-**Definition:** Perfect numbers are numbers such that all proper divisors (every divisor except the number itself) added up are equal to the number itself. 
-
-A few examples are 28 and 6
-
-28 = 1 + 2 + 4 + 7 + 14
-
-6 = 1 + 2 + 3
diff --git a/PeriodicChain.md b/PeriodicChain.md
@@ -1,15 +0,0 @@
-# Periodic (Markov) Chain
-
-L17
-
-## Notes
-
-**Definition:** Periodic Markov chains are a specific type of Markov chain defined as a chain with groups such that all transitions from one group lead to the next group.
-
-Periodic Markov chains are interesting because they never achieve a steady state.
-
-Example:
-
-Imagine you are walking in circles at a constant rate. You start at location 0. The next time you are polled you are at location 1 then location 2 then 0 again. This cycle repeats ad infinitum.
-
-Each transition in the above example has a probability of 1 and the current state will always be n mod 3, where n is the number of steps since the initial state, assuming the initial state is 0.
diff --git a/PerlinNoise.md b/PerlinNoise.md
@@ -1,11 +0,0 @@
-# Perlin Noise 
-
-SS
-
-## Notes
-
-**Definition:** Perlin noise is a procedural gradient texture generated using the Perlin noise algorithm.
-
-Not 100% sure about this:
-
-The Perlin noise algorithm creates a grid where each point on the grid is assigned a gradient vector. To find the value at a given point, we take the dot product of each surrounding grid point's gradient vector with the offset vector from that grid point to the point in question (four grid points in 2D). From here we use interpolation between these dot products to get the final value. 
diff --git a/Permutation.md b/Permutation.md
@@ -1,13 +0,0 @@
-# Permutation
-
-CH 1.3
-
-## Notes
-
-**Definition:** A permutation is an ordered arrangement of n elements.
-
-To calculate the total number of permutations of a given list we simply take the factorial of its length, denoted n!.
-
-In stats there is also a denotation $_nP_r$ which is the number of permutations of the set length n where each permutation is of length r. This is similar to the binomial coefficient except the binomial coefficient describes subsets and thus there is no order making the number of subsets = $\frac{_nP_r}{r!}$.
-
-$_nP_r = \frac{n!}{(n-r)!}$
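-
-As a worked example (n = 5 and r = 2 are picked here only for illustration), the ordered count and the matching unordered count are:
-
-$_5P_2 = \frac{5!}{3!} = 20, \quad \binom{5}{2} = \frac{_5P_2}{2!} = 10$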
diff --git a/Physics.md b/Physics.md
@@ -1,11 +0,0 @@
-# Physics Links
-
-Physics related content
-
-### Classical Mechanics (MIT OCW)
-
-Takeaways:
-	- TAKEAWAYS HERE AT THE END
-
-L1
-	- [[AstronomicalUnit.md]]
diff --git a/Pictograph.md b/Pictograph.md
@@ -1,7 +0,0 @@
-# Pictograph
-
-Khan
-
-## Notes
-
-**Definition:** A pictograph is a picture representation of statistics, typically a chart or graph that uses pictures or symbols to stand for quantities of data.
diff --git a/PigeonholePrinciple.md b/PigeonholePrinciple.md
@@ -1,7 +0,0 @@
-# Pigeonhole Principle 
-
-Ch 6.2
-
-## Notes
-
-**Definition:** The pigeonhole principle states that if there are n pigeons and z nests, and z is smaller than n, then there must be at least one nest that contains multiple pigeons.
diff --git a/PipelineControl.md b/PipelineControl.md
@@ -1,7 +0,0 @@
-# Pipeline Control
-
-CA L3
-
-## Notes
-
-**Definition:** Pipeline control describes the management and coordinatei
diff --git a/Pipelining.md b/Pipelining.md
@@ -1,9 +0,0 @@
-# Pipelining
-
-CA L3
-
-## Notes
-
-**Definition:** Pipelining is the use of CPU hardware such that the execution of more than one instruction overlaps, with each instruction in a different stage at the same time. 
-
-See [[OutOfOrderExecution.md]]
diff --git a/PlaneToPlaneDistance.md b/PlaneToPlaneDistance.md
@@ -1,17 +0,0 @@
-# Plane to Plane Distance
-
-Khan
-
-## Notes
-
-See [[DistanceToPlane.md]] for distance from plane to point. 
-
-This is only useful for planes that are parallel; otherwise they will intersect. 
-
-Steps:
-
-1. Find equation of both planes 
-2. Find representative point
-3. Take [[DistanceToPlane.md]] from the rep point to the other plane.
-
-This is true because all points will be the same distance from the other plane given that they are parallel. Otherwise, the planes would necessarily intersect and thus have a min distance of 0.
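-
-As a sketch, assuming both planes are written with the same normal vector as $ax + by + cz = d_1$ and $ax + by + cz = d_2$ (these symbols are introduced here for illustration), the steps above collapse to a single formula:
-
-$D = \frac{|d_1 - d_2|}{\sqrt{a^2 + b^2 + c^2}}$
-
-For example, for the planes $x + 2y + 2z = 3$ and $x + 2y + 2z = 9$:
-
-$D = \frac{|3 - 9|}{\sqrt{1 + 4 + 4}} = \frac{6}{3} = 2$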
diff --git a/PoissonDistribution.md b/PoissonDistribution.md
@@ -1,17 +0,0 @@
-# Poisson Distribution
-
-Stats D1
-
-## Notes
-
-**Definition:** A Poisson distribution is a common distribution that gives the probability of a given number of events happening in a fixed interval of time (or position or volume) where the average rate at which the events happen is known. 
-
-An example of this is the distribution of the number of texts received in a day where the mean is 12 texts per day. Using this information we can then use a known formula to find the probability that we receive 8 texts in a day, less than 8 texts, or any other number of texts. 
-
-Given this, a Poisson distribution does not have an upper bound, but large counts become exceedingly rare. As such, the graph has a right skew. 
-
-The formula is as follows:
-
-$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
-
-In the above we have $\lambda$ as the rate of success (the mean number of successes per interval) and x as the number of successes. 
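-
-As a worked example using the numbers from the texting example above (a mean of 12 texts per day, and the probability of receiving exactly 8 texts), plugging lambda = 12 and x = 8 into the formula gives approximately:
-
-$p(8) = \frac{12^8 e^{-12}}{8!} \approx 0.066$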
diff --git a/PoissonProcess.md b/PoissonProcess.md
@@ -1,15 +0,0 @@
-# Poisson Process
-
-Prob L14
-
-## Notes
-
-**Definition:** A Poisson process is a continuous time version of the [[BernoulliProcess.md]].
-
-A Poisson process models continuous time with binary outcomes. Generally, we simply track when the true case occurs.
-
-Poisson processes presuppose independence and homogeneity of probability over time.
-
-## See Also
-
-[[BernoulliProcess.md]] - Memoryless, discrete time, process of binary outcomes
diff --git a/Pole.md b/Pole.md
@@ -1,15 +0,0 @@
-# Pole
-
-CG W13 L2
-
-## Notes
-
-**Definition:** A [[Vertex.md]] of [[Degree.md]] 3,5,6,7,8,... This means no isolated vertices are allowed. 
-
-Yes, that is a crap definition. Here is the real one not being taught by a moron:
-
-These are vertices that don't connect with the correct number of other vertices. The correct numbers are 1, 2, and 4.
-
-Why do they matter?
-
-Poles matter because they cause issues with subdividing surfaces. 
diff --git a/Policy.md b/Policy.md
@@ -1,9 +0,0 @@
-# Policy
-
-RL Ch 1
-
-## Notes
-
-**Definition:** A policy in machine learning is a function from the current state to the action an agent will take.
-
-Basically, this dictates what the agent will do in a given scenario. This can also include some stochasticity (necessary for exploration).
diff --git a/PoolingLayers.md b/PoolingLayers.md
@@ -1,9 +0,0 @@
-# Pooling Layers
-
-ML P762
-
-## Notes
-
-**Definition:** Pooling layers are layers of a CNN that 'pool' together surrounding values to pass through a singular representative value.
-
-This representative value is normally either max or mean, but max has grown in favor of late.
diff --git a/Postcondition.md b/Postcondition.md
@@ -1,7 +0,0 @@
-# Postcondition 
-
-U 1.4.1
-
-## Notes
-
-**Definition:** Postconditions are the expected outputs of a function or program which are predicated upon the specified [[Preconditions.md]].
diff --git a/PosteriorProbability.md b/PosteriorProbability.md
@@ -1,7 +0,0 @@
-# Posterior Probability
-
-Ch 1.6
-
-## Notes
-
-**Definition:** Posterior probabilities are probabilities after some data has been collected/found/sampled (conditioned probability).
diff --git a/PowerSet.md b/PowerSet.md
@@ -1,21 +0,0 @@
-# Power Set
-
-AM Ch1
-
-## Notes
-
-**Definition:** The power set is the set of all subsets of the input set. 
-
-Example:
-
-$P(A) = \{X : X \subseteq A\}$
-
-B = {0, 1}
-
-P(B) = {{}, {0}, {1}, {0,1}}
-
-Notice that {0,1} and {1,0} are not both included as sets are unordered and unique.
-
-#### Cardinality 
-
-The cardinality of a powerset is equal to 2^n where n is the number of elements in the original set.
diff --git a/Precision.md b/Precision.md
@@ -1,13 +0,0 @@
-# Precision of a classifier
-
-CH 3
-
-## Notes
-
-**Definition:** The precision of a classifier (classification model) is the accuracy of positive predictions.
-
-Here is the formula:
-
-precision = TP / (TP + FP)
-
-As can be seen, this only takes into account true positives and false positives; true negatives and false negatives play no part.
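-
-As a quick worked example (the counts are made up here for illustration), a classifier that makes 10 positive predictions of which 8 are correct has:
-
-$\text{precision} = \frac{TP}{TP + FP} = \frac{8}{8 + 2} = 0.8$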
diff --git a/Preconditions.md b/Preconditions.md
@@ -1,7 +0,0 @@
-# Preconditions
-
-U 1.4.1
-
-## Notes
-
-**Definition:** Preconditions are conditions on the inputs (or variables) of a function (or program) that must hold prior to execution/evaluation. 
diff --git a/Predicate.md b/Predicate.md
@@ -1,23 +0,0 @@
-# Predicate
-
-U 1.4.1
-
-## Notes
-
-**Definition:** The predicate in a mathematical context is the part of a statement that gives us a truth value when variables are at play.
-
-In the case of 'x < 2' the predicate is 'less than 2'. This can be stated as a propositional function P(x). The following are valid inputs and outputs of said function:
-
-P(1) = True
-
-P(2) = False
-
-P(3) = False
-
-Another example is 'x + y = z' denoted as R(x,y,z) where the predicate is '='.
-
-R(2, -1, 5) = False
-
-R(3, 4, 7) = True
-
-R(x, 3, z) is not a proposition, but R is still a propositional function.
diff --git a/Prediction.md b/Prediction.md
@@ -1,9 +0,0 @@
-# Prediction
-
-Ch2
-
-## Notes
-
-**Definition:** Prediction is the process of predicting an output given a sample.
-
-This is different than [[Inference.md]] which is focused on understanding relationships between variables.
diff --git a/Preimage.md b/Preimage.md
@@ -1,19 +0,0 @@
-# Preimage
-
-Khan Unit 2
-
-## Notes
-
-**Definition:** The preimage of a set S under a transformation T is the set of all values in the domain whose mappings are in S. S may be the codomain or some other subset of it.
-
-T^-1(S) = Preimage of S under T.
-
-This can also be stated as T'(S)
-
-## Finding
-
-To find the preimage of some image under T we need to find all input vectors a such that T(a) is in the image.
-
-If we specify the image as <1,2> and <0,0> then we need to find all <x_1,x_2> such that L.T. Matrix x <x_1, x_2> = <1,2> or <0,0>. 
-
-This final result can be found using [[ReducedRowEchelonForm.md]] of both augmented matrices created using the above information, where the result is the computed values of all pivot variables put into matrices.
diff --git a/PretrainedModels.md b/PretrainedModels.md
@@ -1,13 +0,0 @@
-# Pretrained Models
-
-ML P570
-
-## Notes
-
-**Definition:** Pretrained models are ML models that have been trained in the past and can be used for doing other things.
-
-Pretrained models often use [[TransferLearning.md]] because the goal with pretrained models is to use the existing model that has already been trained to work well with a new set of data. This often involves changing the model's top layers (training new ones for the specific task) while keeping the lower layers intact as they often do simple tasks like edge detection which are reusable.
-
-When doing this the layers that don't change are called the fixed weights while the ones that are changed are called the trainable weights.
-
-A good thing about pretrained models is that they generally require less training data to get a certain level of accuracy for predictions.
diff --git a/PrimeFactorization.md b/PrimeFactorization.md
@@ -1,13 +0,0 @@
-# Prime Factorization
-
-U 2.4
-
-## Notes
-
-**Definition:** The prime factorization of any given number is the multiplication of prime numbers that results in the number.
-
-In the case a number is prime its prime factorization would then be itself. 
-
-## Calculate
-
-To calculate the prime factorization of a given number, iterate through the primes less than or equal to its square root. Any time a prime divides the current value, add it to the list of the prime factorization and divide the value by it, then test whether the result is divisible by that prime again and keep adding it while it is. Otherwise move on to the next prime. If the value left at the end is greater than 1, it is itself prime and is the final factor.
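-
-As a worked example of the procedure (84 is an arbitrary number chosen here for illustration), repeatedly pulling out the smallest prime divisor gives:
-
-$84 = 2 \cdot 42 = 2 \cdot 2 \cdot 21 = 2 \cdot 2 \cdot 3 \cdot 7$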
diff --git a/PrimeNumber.md b/PrimeNumber.md
@@ -1,9 +0,0 @@
-# Prime Number
-
-U 2.4
-
-## Notes
-
-**Definition:** A prime number is a number greater than 1 such that its only divisors are itself and 1. 
-
-A common way to prove the primality of a number is to show that no integer from 2 up to and including its square root divides it (a | b == False).
diff --git a/PrincipleOfInclusionExclusion.md b/PrincipleOfInclusionExclusion.md
@@ -1,19 +0,0 @@
-# Principle of Inclusion-Exclusion
-
-Ch 8.3 Rosen
-
-## Notes
-
-**Definition:** The principle of inclusion-exclusion is a principle used to count the number of elements in the union of a finite number of sets.
-
-Consider:
-
-$|A \cup B| = \text{ } ?$
-
-We know that all elements of A and all elements of B will be in the union, but there are some elements that may be in both. These are simply elements of $A \cap B$. As such we see:
-
-$|A \cup B| = |A| + |B| - |A \cap B|$.
-
-#### Theorem
-
-I will not write this out, but it should be understood that this can be stated simply as only counting the intersection of sets one time overall.
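-
-For reference, the three-set case (with a third set $C$ introduced here for illustration) shows the pattern of alternately subtracting and adding back intersections:
-
-$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|$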
diff --git a/PriorProbability.md b/PriorProbability.md
@@ -1,7 +0,0 @@
-# Prior Probability
-
-Ch 1.6
-
-## Notes
-
-**Definition:** Prior probabilities are probabilities prior to a conditional being applied to them (unconditioned probability).
diff --git a/Probability.md b/Probability.md
@@ -1,19 +0,0 @@
-# Probability 
-
-Stats CH1
-
-## Notes
-
-**Definition:** The probability is the likelihood of something happening, expressed as a number between 0 and 1 or as a percentage between 0% and 100%. 
-
-Let X be a set and F a set of subsets of X. A probability on (X,F) is a function u : F -> [0,1]. This means each set in F is assigned a probability between 0 and 1. See [[SetFunction.md]] for more about the u (Greek character mu) function.
-
-The probability function must be a set function, but that is not sufficient. We also need for u(0) where 0 is the empty set to be equal to 0. We also need u(X) = 1 (totaling 100%), and if A and B are disjoint sets then u(A union B) = u(A) + u(B). This final part means the probability of the union of two different sets is equal to the sum of the probabilities of both sets individually. 
-
-When we have a domain that is finite we then state we have a [[DiscreteProbability.md]] whereas when we have an interval then the function is said to be a [[ContinuousProbability.md]].
-
-In practical terms, X is the set of outcomes that are possible, and for an event A in F the function u(A) returns the probability that the outcome lands in A. 
-
-**Sometimes we use u({1}) but often we use P(1) to describe the probability of 1. Notice that u({1}) takes in a set whereas P(1) does not require such statements.**
-
-Some syntax for ya, when stating P(H|Theta = 1/3) this means the probability of H given that theta = 1/3. This is often used when we don't know the probability of theta, but need to describe the situation. In this instance, theta is considered a parameter.
diff --git a/ProbabilityDensityFunctions.md b/ProbabilityDensityFunctions.md
@@ -1,23 +0,0 @@
-# Probability Density Functions (PDFs)
-
-Stats ch1
-
-## Notes
-
-**Definition:** A probability density function shows the probability of outcomes for [[ContinuousProbability.md]] problems.
-
-**Important:** PDFs are for continuous random variables whereas PMFs are for discrete.
-
-Think of KDEs and kind of histograms. The difference with histograms is they use bins instead of a continuous probability graph.
-
-Something to note, the area under the curve is the probability. As such, the likelihood of all values that come before some value is the integral over said range (Antiderivative. See fundamental theorem of calculus).
-
-Another property of a PDF is that the integral of -infinity to infinity is always equal to 1 and p(y) >= 0 for all y. This means there is never a negative probability and there is a 100% probability across the domain of the function.
-
-## Percent Calculation
-
-The integral along a specified range is the probability of something happening in that range. This is thought of as the area below the curve for the specified range.
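-
-In symbols, for a random variable $Y$ with density $p(y)$ (the bounds $a$ and $b$ are names introduced here for illustration):
-
-$P(a \le Y \le b) = \int_a^b p(y)\,dy$
-
-For example, assuming a uniform density $p(y) = \frac{1}{2}$ on $[0, 2]$:
-
-$P(0.5 \le Y \le 1.5) = \int_{0.5}^{1.5} \frac{1}{2}\,dy = 0.5$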
-
-An interesting thing about this is that any single point, given that there are uncountably many of them, has a probability of 0 even where the density p(x) is positive, but when calculating the area over a range we find a value. 
-
-Additionally, we find that the integral of the function from -inf to inf is always equal to 1.
diff --git a/ProbabilityLaw.md b/ProbabilityLaw.md
@@ -1,13 +0,0 @@
-# Probability Law
-
-L1
-
-## Notes
-
-**Definition:** The probability law assigns some set A (event) a nonnegative P(A) that describes the likelihood of the elements of A.  
-
-The probability law specifies the likelihood of the input given the sample space. The rules for it are as follows:
-
-1. P(A) >= 0
-2. P(A union B) = P(A) + P(B) if A and B are [[DisjointSet.md]].
-3. P(Omega) = 1
diff --git a/ProbabilityMassFunction.md b/ProbabilityMassFunction.md
@@ -1,52 +0,0 @@
-# Probability Mass Function (PMF)
-
-L4
-
-## Notes
-
-**Definition:** A PMF describes the probability of some mapping of a [[RandomVariable.md]] from inputs to a specific output. 
-
-**Important:** PMFs are for discrete random variables whereas PDFs are for continuous.
-
-This can be displayed as some form of bar graph.
-
-To find the PMF value for a given point we sum the probability of each input that maps to the output in question.
-
-## Example
-
-```mermaid
-
-graph LR
-a --> x
-b --> y
-c --> y
-d --> z
-```
-
-In the above example assume each connection is the function defined by the Random Variable. As such, the PMF output for x would be P(a). The PMF output for y would be P(b) + P(c) and the output for z would be P(d).
-
-With proper notation (and assuming random variable X) we can state the above as $P_X(x) = P(a), \space P_X(y) = P(b)+P(c)$ etc.
-
-## Expected Value
-
-The expected value of a PMF is the probability-weighted average of the outputs, which is not necessarily the most probable output. This is calculated by summing each output value multiplied by its probability. This will be the 'center of mass' of the distribution. 
-
-This is denoted by the function E[] where the inside is the random variable that is being predicted upon. 
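-
-In symbols, with a fair six-sided die as a worked example (the die is an assumption chosen here for illustration):
-
-$E[X] = \sum_x x \, p_X(x)$
-
-$E[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$
-
-Note that 3.5 is not itself a possible roll.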
-
-## Geometric
-
-The geometric PMF is a specific PMF where the probability of every subsequent output decreases by a constant factor each time. It gives the probability that the first success in a sequence of independent trials occurs on a given trial.
-
-The geometric PMF is **memoryless** in that regardless of the step you start on, the future probabilities will be the same because all conditionals result in no added information thus they are independent probabilities.
-
-## Conditional (L6)
-
-Conditional PMFs are just PMFs given that a specified event occurred. In these instances we restrict the sample space to that event and then renormalize the probabilities so they sum to 1.
-
-## Joint (L6)
-
-See [[JointProbability.md]] for joint PMF information.
-
-## Marginal (L6)
-
-The marginal PMF of X is the P_X(x) that can be found from the joint PMF P_{X,Y}(x,y). We recover the probability of a given outcome x by summing the joint probabilities over every value of y: $P_X(x) = \sum_y P_{X,Y}(x,y)$.
diff --git a/ProbingFunction.md b/ProbingFunction.md
@@ -1,9 +0,0 @@
-# Probing Function
-
-Ch 5
-
-## Notes
-
-**Definition:** A probing function is a function that takes in an ordered pair of inputs, the first being a hashcode and the second being the iteration, and outputs a position between 0 and m-1.
-
-In the case of linear probing, the function takes in the hashcode (address) and the iteration. It then returns (hashcode + iteration) mod m, where m is the table size. This ensures the next address is checked if the previous one was occupied.
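-
-A small worked example of linear probing, assuming a made-up table of size m = 7 and a hashcode of 5 (h is just my name for the probing function):
-
-$h(5, i) = (5 + i) \bmod 7$
-
-$h(5,0) = 5, \space h(5,1) = 6, \space h(5,2) = 0, \space h(5,3) = 1$
-
-Taking the modulus by m (the table size) is what keeps every position between 0 and m-1 and lets the probe wrap around to the start.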
diff --git a/ProductRule.md b/ProductRule.md
@@ -1,7 +0,0 @@
-# Product Rule
-
-Leonard
-
-## Notes
-
-**Definition:** The product rule is used when taking the derivative of two functions that are multiplied together. The rule is as follows $\frac{d}{dx}(g(x)f(x)) = g'(x)f(x) + f'(x)g(x)$
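-
-A worked example with functions I picked for illustration, $g(x) = x^2$ and $f(x) = \sin x$:
-
-$\frac{d}{dx}(x^2 \sin x) = 2x \sin x + x^2 \cos x$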
diff --git a/Prognosticator.md b/Prognosticator.md
@@ -1,7 +0,0 @@
-# Prognosticator
-
-Superintelligence - Bostrom
-
-## Notes
-
-**Definition:** A prognosticator is someone who tells of the future.
diff --git a/ProgrammerVisibleState.md b/ProgrammerVisibleState.md
@@ -1,15 +0,0 @@
-# Programmer Visible State
-
-CA L3
-
-## Notes
-
-**Definition:** Programmer visible state is all state of program execution that is visible to programs. 
-
-This includes the program counter, registers, and memory.
-
-This is all information visible to the programmer.
-
-See [[ISA.md]] for more related content.
-
-There is also programmer-invisible state, such as caches and pipeline registers, which are examples of state in the [[MicroArchitecture.md]].
diff --git a/Projection.md b/Projection.md
@@ -1,128 +0,0 @@
-# Projection
-
-ML D5 / Khan
-
-## Notes (mostly ml)
-
-**Definition:** Projection is the process of moving an element from higher dimensional space to a lower dimensional space. 
-
-Projection is often used to reduce the dimensionality of datasets, as data becomes more sparse in higher dimensions.
-
-## Linear Algebra Specifics
-
-Proj_L(x) is the vector in L such that x - Proj_L(x) is orthogonal to L.
-
-This states the difference between the original vector and its projection is orthogonal to the line (dot product = 0).
-
-Let's define v as some nonzero vector on L. We then know the projection of x onto L is cv for some coefficient c. This coefficient can be found as described below.
-
-Finding:
-
-1. Take dot product between vector v on line L and our original vector x
-2. Divide result by ||v|| to find the length l of the projection (dividing by ||v|| a second time gives c)
-3. Find the vector p which is v scaled to length l
-
-Example:
-
-L = span(<2,1>)
-x = <2,3>
-v = <2,1> (arbitrary on L)
-
-Find dot product:
-x dot v = 4 + 3 = 7
-
-Find length of v:
-||v|| = sqrt(5)
-
-Find length of projected vector (dot product / ||v||):
-l = 7 / sqrt(5) = 3.13
-
-Scale v to be projection:
-//it is not necessary to create a unit vector, but easier conceptually
-vhat = <2,1> x 1/sqrt(5)
-vhat = <.894, .447>
-
-p = vhat x l
-= <.894,.447> x 3.13
-= <2.798, 1.399> (about <2.8, 1.4>; the small difference is from rounding)
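-
-As a cross-check, the same result should come out of the single formula below, which skips the unit vector (a sketch using the standard identity c = (x dot v) / (v dot v)):
-
-$Proj_L(x) = \frac{x \cdot v}{v \cdot v} v = \frac{7}{5} \langle 2, 1 \rangle = \langle 2.8, 1.4 \rangle$
-
-The difference x - p has components (-0.8, 1.6), and its dot product with v is -1.6 + 1.6 = 0, so it is orthogonal to L as required.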
-
-Code:
-
-```python3
-
-# Define vector class with init, length, standardize, multiply by scalar, and print
-import math
-
-class Vector3d:
-    def __init__(self, x, y, z):
-        self.x = x
-        self.y = y
-        self.z = z
-    
-    def length(self):
-        return math.sqrt(self.x * self.x + self.y * self.y + self.z * self.z)
-    
-    def standardized(self):
-        length = self.length()  # named length to avoid shadowing the built-in len
-        scaleFactor = 1/length
-        return Vector3d(self.x * scaleFactor, self.y * scaleFactor, self.z * scaleFactor)
-    
-    def __mul__(self, scalar):
-        return Vector3d(self.x * scalar, self.y*scalar, self.z * scalar)
-    
-    def __repr__(self):
-        return('x: ' + str(self.x) + '\ny: ' + str(self.y) + '\nz: ' + str(self.z))
-
-# Initialize vector to project 
-v = Vector3d(2,3,1)
-
-# Initialize vector to project onto
-u = Vector3d(3,10,2)
-
-# Find DP
-dp = v.x * u.x + v.y * u.y + v.z * u.z
-
-# Find length of projected onto
-uLen = u.length()
-
-# Find projected vector's length
-projLen = dp / uLen
-
-# Scale u to length projLen
-standardU = u.standardized()
-projected = standardU * projLen
-
-print(projected)
-
-# Output:
-
-# x: 1.008849557522124
-# y: 3.362831858407079
-# z: 0.6725663716814159
-
-```
-
-## Matrix Representation
-
-A projection is a linear transformation (L.T.) and thus it can be expressed by a matrix.
-
-Additionally, given our knowledge about L.Ts we know that Proj(a + b) = Proj(a) + Proj(b) and that Proj(ca) = cProj(a).
-
-To find the matrix we use the same identity matrix strategy where we find the columns based on their mapping from the identity matrix to the output of the L.T.
-
-Example:
-
-// Projection onto
-u = <2,1>
-
-// Identity matrix
-[1 0]
-[0 1]
-
-// This can be calculated using the above strategy, but I used my code to find this
-Proj_u(<1,0>) = <.8,.4>
-Proj_u(<0,1>) = <.4,.2>
-
-// Standard Matrix of L.T. based on identity matrix column inputs
-[.8 .4]
-[.4 .2]
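-
-The same matrix should also fall out of a closed form. This is a sketch assuming the standard identity for projecting onto the line spanned by u (A is my own name for the matrix):
-
-$A = \frac{1}{u \cdot u} u u^T = \frac{1}{5} \begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} .8 & .4 \\ .4 & .2 \end{bmatrix}$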
diff --git a/Proposition.md b/Proposition.md
@@ -1,7 +0,0 @@
-# Proposition 
-
-Discrete 1.1
-
-## Notes
-
-**Definition:** A proposition is a statement that is either true or false.
diff --git a/PropositionalFunction.md b/PropositionalFunction.md
@@ -1,23 +0,0 @@
-# Propositional Function
-
-U 1.4.1
-
-## Notes
-
-**Definition:** A propositional function is a function that takes an arbitrary number of inputs and outputs a truth value.
-
-An example of a propositional function is the function P(x) defined as 'x > 2'. This function could then be evaluated as follows:
-
-1. P(1) | False 
-2. P(2) | False
-3. P(3) | True
-
-Once a propositional function is evaluated with some object(s) and no longer contains any variables, it is then said to be a proposition.
-
-Given this, we know that the propositional function P(x,y) is still a propositional function when we specify P(1,y) because there is still a variable namely y.
-
-An interesting thing about propositional functions with the universal quantifier is that if the universe (U) is empty, the proposition is true as there are no counterexamples. 
-
-## Bound and Free
-
-When we assign a value to a variable of a propositional function (or quantify over it), the variable is said to be bound. Conversely, a variable that is not bound is free.
diff --git a/ProveSetEquality.md b/ProveSetEquality.md
@@ -1,7 +0,0 @@
-# Prove Set Equality
-
-AM TB Ch8
-
-## Notes
-
-To prove that two sets A and B are equal, we first prove that every element of B is in A (A contains B). We then show that every element of A is in B (B contains A), thus all elements must be the same, making the sets equal. Equality of sets is written using the = sign, not the $\equiv$ sign.
diff --git a/PseudoGraphs.md b/PseudoGraphs.md
@@ -1,7 +0,0 @@
-# Pseudo Graph
-
-Ch 10.1
-
-## Notes
-
-**Definition:** A pseudo graph is a graph that allows multi edges and loops, but is undirected.
diff --git a/QuadraticProbing.md b/QuadraticProbing.md
@@ -1,31 +0,0 @@
-# Quadratic Probing
-
-Ch 5
-
-## Notes
-
-**Definition:** Quadratic probing is a probing strategy where we start with the input position and then alternately move right and left by successive perfect squares.
-
-Basically, we first check e, then e + 1, then e - 1, then e + 4, then e - 4, and so on.
-
-Note:
-
-This only works with specific array sizes because some sizes don't get entirely mapped to by the probing function for all valid values of e.
-
-Implementation in code:
-
-```c#
-
-// Print the offsets 0, +1, -1, +4, -4, ... that get added to the start position e
-for(int i = 0 ; i < 10 ; ++i){
-	int n = (i + 1) / 2;
-	if(i % 2 == 0){
-		Console.WriteLine(n * -n);
-	}
-	else{
-		Console.WriteLine(n * n);
-	}
-}
-
-```
-
-
diff --git a/Quantifiers.md b/Quantifiers.md
@@ -1,40 +0,0 @@
-# Quantifiers
-
-U 1.4.2
-
-## Notes
-
-**Definition:** Quantifiers are operators that describe the number of individuals in a domain that satisfy something.
-
-The two common quantifiers are:
-
-1. $\exists$ - There exists (existential quantifier)
-2. $\forall$ - For all (universal quantifier)
-
-Another derived one is $\exists !$ which means there exists exactly one. Another way to state this is $\exists_1$. By changing the subscript in this notation we can then specify that exactly some other number of elements of the set have some property.
-
-### Propositional Functions
-
-Quantifiers can be used to turn a propositional function into a proposition much like numbers. 
-
-$\forall x P(x)$ - We would need to specify some universe, but this is a proposition and not a propositional function
-
-$P(x)$ - This is a propositional function
-
-$P(1)$ - This is a proposition
-
-### Negation
-
-When negating a for all, we negate the propositional function (not P(x)) and flip the for all to be there exists.
-
-The opposite is also true for negating a there exists.
-
-Examples:
-
-For all - $\neg \forall x P(x) \equiv \exists x \neg P(x)$
-
-There exists - $\neg \exists x P(x) \equiv \forall x \neg P(x)$
-
-### Scope
-
-The scope of a quantifier is limited to what is contained in the parentheses, or to the propositional function immediately following it when of the form $\forall x P(x)$. As such, in $\forall x P(x)  \to C(x)$ the x in $C(x)$ is outside the quantifier's scope, so to cover both we would need to specify $\forall x (P(x) \to C(x))$. This avoids an ambiguity in our statements.
diff --git a/Quantile.md b/Quantile.md
@@ -1,11 +0,0 @@
-# Quantile
-
-Stats D3
-
-## Notes
-
-**Definition:** Quantiles are cut points that divide an ordered dataset into groups of (roughly) equal size.
-
-Examples are the median, which splits the data into two halves, quartiles which split it into 4 groups, quintiles (5), deciles (10), and percentiles (100).
-
-Another thing to pay attention to is the interquartile range (IQR), which is the range from the first quartile to the third quartile (Q3 - Q1), i.e. the spread of the middle half of the data.
diff --git a/Quaternions.md b/Quaternions.md
@@ -1,11 +0,0 @@
-# Quaternions
-
-CS 331 W11 L2
-
-## Notes
-
-**Definition:** These are four values that describe something which can be stated as (x,y,z,w). In Unity, quaternions are used to describe rotations about an axis.
-
-There are names for the rotations with regard to the local coordinate system. The lean forward and backward is the pitch (rotation about x axis), the rotation around their center is the yaw (rotation about y axis think spinning in circles), and the rotation about the z axis is called the roll (think barrel rolls).  
-
-See [[Transform.md]] for more information about coordinate systems and such.
diff --git a/Queue.md b/Queue.md
@@ -1,15 +0,0 @@
-# Queue
-
-CS202 L14 / CS303 Ch 1
-
-## Notes
-
-**Definition:** This is a datatype that works on a first in first out basis. This is often implemented using a [[SinglyLinkedList.md]] with a link to the tail (where more nodes would be added). This is also often implemented such that you add to the end and remove from the start. 
-
-enqueue: add to queue
-
-dequeue: remove from queue
-
-peek: view the front element
-
-It is important to note that people generally implement these to add to the back and remove from the front, although either direction is functionally equivalent. See [[Stack.md]] for information about a LIFO approach.
diff --git a/RCombination.md b/RCombination.md
@@ -1,11 +0,0 @@
-# r-Combination 
-
-Ch 6.3
-
-## Notes
-
-**Definition:** An r-Combination is a combination of length r.
-
-The function to denote the number of r-combinations of a set of size n is C(r,n). There are other ways to state it, but I prefer this.
-
-$C(r,n) = \frac{n!}{r!(n-r)!}$
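-
-A worked example in the C(r,n) ordering used here, choosing 2 elements from a 4-element set (the set {a,b,c,d} is just for illustration):
-
-$C(2,4) = \frac{4!}{2!(4-2)!} = \frac{24}{4} = 6$
-
-The six combinations are ab, ac, ad, bc, bd, and cd.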
diff --git a/RMSE.md b/RMSE.md
@@ -1,33 +0,0 @@
-# Root Mean Square Error
-
-ML CH2
-
-## Notes
-
-**Definition:** This is the most common form of error measuring for regression problems where you take the difference between each inference and the actual output, square it, do this with all samples, divide by the number of samples, and then take the square root. 
-
-This is common because it weights far-off inferences more heavily than slightly off inferences.
-
-Here is a simple implementation:
-
-```python
-import math
-
-# often you would use ordered pairs for expected and inference.
-expected = [10, 10, 4, 3, 2, 4, 5, 5]
-inference = [9 , 7, 3, 2, 1, 3, 2, 5]
-
-count = 0
-total = 0
-while count < len(expected):
-    exp = expected[count]
-    inf = inference[count]
-    total += (exp - inf) ** 2
-    count += 1
-
-total = total / count
-total = math.sqrt(total)
-print(total)
-```
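-
-For reference, the loop above should correspond to the formula below, where m is the number of samples, $y_i$ is the expected value, and $\hat{y}_i$ is the inference (these symbol names are my own choice):
-
-$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2}$
-
-With the sample lists, the squared differences are 1, 9, 1, 1, 1, 1, 9, 0, which sum to 23, so the result is $\sqrt{23/8} \approx 1.696$.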
-
-Another metric for errors is [[MAE.md]].
diff --git a/ROC.md b/ROC.md
@@ -1,9 +0,0 @@
-# Receiver Operating Characteristic (ROC)
-
-ML D3
-
-## Notes
-
-**Definition:** The ROC curve plots the rate of true positives for a dataset against the rate of false positives as the decision threshold changes.
-
-This type of graph is used to show threshold information for binary classification models.
diff --git a/RPermutation.md b/RPermutation.md
@@ -1,13 +0,0 @@
-# r-Permutations
-
-TB 6.3
-
-## Notes
-
-**Definition:** r-Permutations are permutations that have a length of r.
-
-An important function is P(n, r), which denotes the number of r-permutations of a set of size n.
-
-The formula for P(n,r) is as follows:
-
-$P(n,r) = \frac{n!}{(n-r)!}$.
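-
-A worked example with numbers picked for illustration, ordering 2 elements out of a 4-element set:
-
-$P(4,2) = \frac{4!}{(4-2)!} = \frac{24}{2} = 12$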
diff --git a/RadialBasisFunction.md b/RadialBasisFunction.md
@@ -1,11 +0,0 @@
-# Radial Basis Function (RBF)
-
-ML CH2
-
-## Notes
-
-**Definition:** A radial basis function is a function whose values depend only on the distance between the input and some fixed point. 
-
-The Gaussian RBF kernel is computed by taking the distance between the input and the center point, squaring it, dividing by 2L^2, negating, and then taking the natural exponential (e raised to that value): $\phi(x) = e^{-\|x - c\|^2 / (2L^2)}$ where c is the center point. Here, L is a hyperparameter that affects the rate at which the graph goes from 1 to near 0.
-
-Additionally, the lowest value for these functions is generally near 0 while the highest is 1 (meaning the input is exactly at the center point).
diff --git a/RamseyNumbers.md b/RamseyNumbers.md
@@ -1,7 +0,0 @@
-# Ramsey Number
-
-Ch 6.2
-
-## Notes
-
-**Definition:** A Ramsey number R(m,n), where m and n are natural numbers greater than or equal to 2, is the minimum number of people at a party such that there are either m mutual friends or n mutual enemies.
diff --git a/RandomExperiment.md b/RandomExperiment.md
@@ -1,9 +0,0 @@
-# Random Experiment
-
-Ch 1.1
-
-## Notes
-
-**Definition:** A random experiment is a specified set of procedures that result in a truly random outcome (not necessarily uniformly) in the sample space.
-
-This is different than a [[RandomVariables.md]] in the sense that a random variable maps the outcomes of a given experiment to another value whereas this outputs the outcome.
diff --git a/RandomForest.md b/RandomForest.md
@@ -1,11 +0,0 @@
-# Random Forest
-
-ML D4
-
-## Notes
-
-**Definition:** A random forest is an [[Ensembles.md]] of [[DecisionTrees.md]] used to make predictions based on majority voting or some other cost function.
-
-This uses a wisdom of the crowd philosophy where the aggregate of many answers is most likely better than one expert answer.
-
-Random forests are normally trained with [[Bagging.md]] and sometimes with [[Pasting.md]]. 
diff --git a/RandomPatches.md b/RandomPatches.md
@@ -1,11 +0,0 @@
-# Random Patches (Method)
-
-ML D5
-
-## Notes
-
-**Definition:** The random patches method for random sampling uses bagging (sometimes pasting) as well as selecting a random subset of features.
-
-This ensures both a random subset of samples and a random set of features. This reduces variance but increases bias.
-
-This is useful for high-dimensional inputs like images that take a long time to train models.
diff --git a/RandomProjection.md b/RandomProjection.md
@@ -1,31 +0,0 @@
-# Random Projection
-
-## Notes
-
-**Definition:** Random projection is an algorithm that projects data onto a lower-dimensional space chosen at random.
-
-Random projection is used because PCA can often be slow, and it has been shown that random projection does not lose too much information.
-
-This is for when you have things like 20,000 dimensions.
-
-There is also the johnson_lindenstrauss_min_dim function from sklearn.random_projection. Given the number of samples and a value ε representing the acceptable distortion, it calculates the minimum number of dimensions needed to keep the distances between samples within that tolerance.
-
-Example:
-```python3
-from sklearn.random_projection import johnson_lindenstrauss_min_dim
-m, ε = 5_000, 0.1
-d = johnson_lindenstrauss_min_dim(m, eps=ε)
-d
-```
-
-The output of this is 7300, so data with more dimensions than that can be randomly projected down to 7300-dimensional space without distorting the squared distance between any two samples by more than approximately 10%.
-
-Below is an example implementation of this random projection where we simply pass in the acceptable loss amount:
-
-```python3
-
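-from sklearn.random_projection import GaussianRandomProjection
-
-# X is assumed to be an existing (samples x features) array; ε comes from the block above
-gaussian_rnd_proj = GaussianRandomProjection(eps=ε, random_state=42)
-X_reduced = gaussian_rnd_proj.fit_transform(X)  # target dimension matches d above when X has m samples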
-
-```
diff --git a/RandomSubspaces.md b/RandomSubspaces.md
@@ -1,7 +0,0 @@
-# Random Subspaces Method
-
-ML D5
-
-## Notes
-
-**Definition:** The random subspaces method is similar to [[RandomPatches.md]] except it keeps all training instances and only samples features.
diff --git a/RandomVariables.md b/RandomVariables.md
@@ -1,24 +0,0 @@
-# Random Variables
-
-L4 + Khan
-
-## Notes
-
-**Definition:** Random variables in stats and probability are functions that map the outcomes of a random process to numerical values.
-
-Random variables, despite the name, are functions not variables.
-
-A random variable is any function that depends on randomness. In this way, mapping an outcome to a real number and then multiplying it by a scalar is a composition of two random variables. The first one maps the outcome to a real number; the second takes that random value as input and outputs a value c times the input.
-
-## Formal
-
-A function from omega (sample space) to the real numbers (discrete or continuous does not matter).
-
-## Example
-
-Example:
-
-X = {1 if heads}
-	{0 if tails}
-
-Geometric random variables are random variables that result in [[ProbabilityMassFunction.md]] with a geometric shape (see PMF for more).
diff --git a/Range.md b/Range.md
@@ -1,11 +0,0 @@
-# Range
-
-Khan
-
-## Notes
-
-**Definition:** The range of a function is the set of all possible outputs of the function given the domain of the function.
-
-Formally we can state it as the following where D is the domain of the function and R is the range of the input function:
-
-$R(f) = \{y \space | \space \exists x \in D \text{ such that } f(x)=y\}$
diff --git a/Rank.md b/Rank.md
@@ -1,11 +0,0 @@
-# Rank
-
-Khan
-
-## Notes
-
-**Definition:** Rank, similar to [[Nullity.md]], is a way to describe the dimensionality of a vector space associated with a matrix: it is the dimension of the space generated by the columns of the matrix.
-
-[[Nullity.md]] is the same thing except specifically referring to a matrix's null space.
-
-We say a matrix has 'full rank' when its rank is equal to the dimension of the [[AmbientSpace.md]].
diff --git a/RealVectorSpace.md b/RealVectorSpace.md
@@ -1,15 +0,0 @@
-# Real Vector Space
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 1
-
-## Notes
-
-**Definition:** A real vector space is a [VectorSpace](VectorSpace.md) on $R$ where $R$ is the set of real numbers.
-
-## Importance
-
-The importance of this distinction lies in the fact that vector spaces are defined over a set $F$ of scalars, which is often the set of complex numbers or real numbers. If we instead take the set {2,3}, the usual vector space properties fail because there is no longer a multiplicative identity.
-
-In line with this, we also define a complex vector space as a vector space over the set of complex numbers.
diff --git a/RecencyHeuristic.md b/RecencyHeuristic.md
@@ -1,9 +0,0 @@
-# Recency Heuristic
-
-L4
-
-## Notes
-
-**Definition:** The recency heuristic is a solution to the credit assignment problem where we assign credit for a reward/punishment to the most recent state(s).
-
-This is opposed to the [FrequencyHeuristic](FrequencyHeuristic.md) where we assign credit to the things that happened most often leading to the reward signal.
diff --git a/RecurrenceRelation.md b/RecurrenceRelation.md
@@ -1,13 +0,0 @@
-# Recurrence Relation
-
-U2.4.2
-
-## Notes
-
-**Definition:** A recurrence relation is an equation that expresses some a_n in terms of one or more prior terms from the sequence. As such, we must specify initial conditions such that the sequence can be calculated (think basecase).
-
-Note: The relation is an equation but the result and necessary information to find the next value is a sequence.
-
-Ex:
-
-$a_n = a_{n-1} + 2a_{n-2}$ for $n \geq 2$ where (basecase) $a_0 =2$ and $a_1 =5$. 
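-
-Working out the next few terms by hand from the base cases above:
-
-$a_2 = a_1 + 2a_0 = 5 + 4 = 9$
-
-$a_3 = a_2 + 2a_1 = 9 + 10 = 19$
-
-$a_4 = a_3 + 2a_2 = 19 + 18 = 37$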
diff --git a/ReducedRowEchelonForm.md b/ReducedRowEchelonForm.md
@@ -1,10 +0,0 @@
-# Reduced Row Echelon Form
-
-Khan
-
-## Notes
-
-**Definition:** Reduced row echelon form is a form of matrix where the first nonzero entry of each nonzero row is a 1 (the leading 1, or pivot), so only zeroes are to the left of it. Additionally, each row must have its leading 1 further to the right than the row above it, any all-zero rows are at the bottom, and every other value in a column containing a leading 1 must be zero.
-
-
-If we are trying to find a basis of the column space, then the columns of the original matrix that line up with the pivot columns give one. Alternatively, we can do RREF, write the equations, and solve for the pivot variables in terms of the free variables. These will both give a correct result, but they may be different results as there are many possible bases for the same span.
diff --git a/Reflexive.md b/Reflexive.md
@@ -1,7 +0,0 @@
-# Reflexive
-
-Ch 9.1
-
-## Notes
-
-**Definition:** A reflexive relation R on a set A is a relation that holds for every ordered pair where both elements are the same, i.e. (a, a) is in R for every a in A.
diff --git a/ReflexiveClosure.md b/ReflexiveClosure.md
@@ -1,9 +0,0 @@
-# Reflexive Closure
-
-Ch 9.4
-
-## Notes
-
-**Definition:** The reflexive closure of a relation R on a set A is the smallest relation containing R that is reflexive, i.e. R together with every pair (x, x) so that xRx for all x in A.
-
-When shown as a zero one matrix, this will manifest as the main diagonal being all 1's.
diff --git a/RegressionProblem.md b/RegressionProblem.md
@@ -1,19 +0,0 @@
-# Regression Problem
-
-ML L1
-
-## Notes
-
-**Definition:** A regression problem is a problem where the value being predicted is continuous (think graphing not yes/no).
-
-A yes/no problem is a [[ClassificationProblem.md]].
-
-Also see for a more specific example [[LinearRegression.md]]. There are other types of regression as well such as polynomial regression (no note at this time).
-
-When discussing regression, we often use the term "target" instead of "label" to describe the desired output. This contrasts with classification problems where we use the term label.
-
-See also [[LogisticRegression.md]] where we assign a probability of group membership.
-
-With regression, we describe the performance measure as the utility function or fitness function. This measures how good the model is. The inverse of this is the cost function which measures how bad it is.
-
-A univariate regression problem is one where you are trying to predict a single value as the output. The opposite of this is a multivariate regression problem where you are trying to determine multiple output values.
diff --git a/RegressionToTheMean.md b/RegressionToTheMean.md
@@ -1,7 +0,0 @@
-# Regression to the Mean
-
-L19
-
-## Notes
-
-**Definition:** Regression to the mean is the idea that if an unlikely event occurs it is likely the next sampling will be closer to the mean of the distribution.
diff --git a/RegularExpressions.md b/RegularExpressions.md
@@ -1,9 +0,0 @@
-# Regular Expressions
-
-**Source:**
-
-**Chapter:**
-
-## Notes
-
-**Definition:**
diff --git a/RegularLanguages.md b/RegularLanguages.md
@@ -1,9 +0,0 @@
-# Regular Languages 
-
-**Source:** Theory of Computation
-
-**Lecture:** 1
-
-## Notes
-
-**Definition:** A language is a regular language if there exists a finite automaton that recognizes it (ie. the FA's language is the language in question).
diff --git a/ReinforcementLearning.md b/ReinforcementLearning.md
@@ -1,58 +0,0 @@
-# Reinforcement Learning
-
-Reinforcement Learning: An Introduction (Sutton & Barto)
-
-Chapter 1 (Introduction)
-* [MarkovDecisionProcesses](MarkovDecisionProcesses.md)
-* [Exploit](Exploit.md)
-* [Explore](Explore.md)
-* [Policy](Policy.md)
-* [RewardSignal](RewardSignal.md)
-* [ValueFunction](ValueFunction.md)
-* [Model](Model.md)
-* [EvolutionaryMethods](EvolutionaryMethods.md)
-
-DeepMind UCL Lectures
-
-L1
-* [CreditAssignmentProblem](CreditAssignmentProblem.md)
-* [ImitationLearning](ImitationLearning.md) 
-* [MarkovAssumption](MarkovAssumption.md)
-* [PartiallyObservableMarkovDecisionProcess](PartiallyObservableMarkovDecisionProcess.md)
-* [ModelFree](ModelFree.md)
-* [Bandits](Bandits.md)
-* [Evaluation](Evaluation.md)
-
-L2
-* [MarkovDecisionProcesses](MarkovDecisionProcesses.md)
-* [MarkovAssumption](MarkovAssumption.md)
-* [DiscountFactor](DiscountFactor.md)
-* [MarkovRewardProcess](MarkovRewardProcess.md)
-* [MarkovProcess](MarkovProcess.md)
-* [Return](Return.md)
-* [Policy](Policy.md)
-* [BellmanEquation](BellmanEquation.md)
-
-L3
-* [DynamicProgramming](DynamicProgramming.md)
-* [OptimalSubstructure](OptimalSubstructure.md)
-* [OverlappingSubproblems](OverlappingSubproblems.md)
-
-L4
-* [ModelFree](ModelFree.md)
-* [Episodic](Episodic.md)
-* [Episode](Episode.md)
-* [MonteCarloLearning](MonteCarloLearning.md)
-* [IncrementalMean](IncrementalMean.md)
-* [TemporalDifferenceLearning](TemporalDifferenceLearning.md)
-* [FrequencyHeuristic](FrequencyHeuristic.md)
-* [RecencyHeuristic](RecencyHeuristic.md)
-* [EligibilityTraces](EligibilityTraces.md)
-
-L5
-* [ModelFree](ModelFree.md)
-* [MonteCarloLearning](MonteCarloLearning.md)
-* [TemporalDifferenceLearning](TemporalDifferenceLearning.md)
-* [OnPolicyLearning](OnPolicyLearning.md)
-* [OffPolicyLearning](OffPolicyLearning.md)
-* EpsilonGreedy
diff --git a/Relation.md b/Relation.md
@@ -1,13 +0,0 @@
-# Relation
-
-CH 9.1
-
-## Notes
-
-**Definition:** A relation, in math, is a way to describe a connection between elements in the codomain and domain.
-
-Ex:
-
-A -> B
-
-(a,b) in R if a < b.
diff --git a/RelationOnASet.md b/RelationOnASet.md
@@ -1,11 +0,0 @@
-# Relation on a Set
-
-Ch 9.1
-
-## Notes
-
-**Definition:** A relation on a set is a relation where the domain and the codomain are the same set.
-
-Ex:
-
-Define the relation R as the relation from A -> A for (a,b) such that a < b.
diff --git a/RelativeFrequency.md b/RelativeFrequency.md
@@ -1,9 +0,0 @@
-# Relative Frequency
-
-Ch 1.1
-
-## Notes
-
-**Definition:** Relative frequency is the value f/n where f is the [[Frequency.md]] of an event under a [[RandomExperiment.md]] and n is the number of times the experiment was run.
-
-Note this is not the same as [[Probability.md]] because probability is the true likelihood whereas relative frequency is the historically observed likelihood based on the experiment. This value does however tend towards the probability as n grows. See [[LawOfLargeNumbers.md]].
diff --git a/RelativelyPrime.md b/RelativelyPrime.md
@@ -1,7 +0,0 @@
-# Relatively Prime
-
-U 2.4
-
-## Notes
-
-**Definition:** Two integers a and b are relatively prime if gcd(a,b) = 1. The numbers themselves do not need to be prime (for example, 8 and 9 are relatively prime).
diff --git a/RepresentationLearning.md b/RepresentationLearning.md
@@ -1,11 +0,0 @@
-# Representation Learning
-
-ML P722
-
-## Notes
-
-**Definition:** Representation learning is the iterative process of learning a representation of some value.
-
-Think of embedding where values are represented in higher dimensional space and iteratively moved to positions relative to other values.
-
-Additionally, autoencoders are also this as they learn lower dimensional representations (compressions) of higher dimension data.
diff --git a/Representative.md b/Representative.md
@@ -1,9 +0,0 @@
-# Representative
-
-Ch 9.5
-
-## Notes
-
-**Definition:** A representative is any element of an equivalence class chosen to describe the class.
-
-Often we use the least nonnegative residue for this (think in the case of mod equivalence classes).
diff --git a/Return.md b/Return.md
@@ -1,7 +0,0 @@
-# Return
-
-L2
-
-## Notes
-
-**Definition:** Return is the sum of future rewards taking into account discount factor.
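-
-A sketch of the formula, assuming the usual notation where $G_t$ is the return at time t, $R_{t+1}$ is the next reward, and $\gamma$ is the discount factor between 0 and 1 (the symbols are not from this note):
-
-$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$
-
-For example, with made-up rewards of 1, 1, 1 followed by zeros and $\gamma = 0.9$, the return is 1 + 0.9 + 0.81 = 2.71.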
diff --git a/RewardSignal.md b/RewardSignal.md
@@ -1,9 +0,0 @@
-# Reward Signal
-
-RL Ch 1
-
-## Notes
-
-**Definition:** The reward signal is a one-time signal sent to an agent telling it that something right now is good (or bad).
-
-In this context right now may imply the current state is good or the next state will be good based on the action currently chosen.
diff --git a/RidgeRegression.md b/RidgeRegression.md
@@ -1,9 +0,0 @@
-# Ridge Regression
-
-ML D3
-
-## Notes
-
-**Definition:** Ridge regression uses a different cost function than standard linear regression to limit the size of coefficients.
-
-There is a regularization portion to the cost function which increases loss when coefficients are large, thus incentivizing smaller coefficient values. Along with this, there is a hyperparameter, lambda, that gives more or less weight to this portion of the equation. A value of 0 would be standard linear regression, while a high value would move the coefficients closer and closer to 0.
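-
-A sketch of what the cost function looks like, assuming coefficients $\theta_1 \dots \theta_n$ and mean squared error as the base loss (exact scaling of the penalty varies between sources, and the bias term is usually left out of the sum):
-
-$J(\theta) = MSE(\theta) + \lambda \sum_{i=1}^{n} \theta_i^2$
-
-Setting $\lambda = 0$ removes the penalty and leaves the plain linear regression cost.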
diff --git a/RightHandRule.md b/RightHandRule.md
@@ -1,11 +0,0 @@
-# Right Hand Rule
-
-3B1B
-
-## Notes
-
-**Definition:** The right hand rule describes the relation between the axes in R^3.
-
-When the right hand rule is true we have i being the index finger, j being the middle, and k being the thumb. These correspond with <1,0,0>, <0,1,0>, and <0,0,1> respectively.
-
-When the right hand rule is untrue under a transformation we know the determinant is negative as space has inverted.
diff --git a/Rotate.md b/Rotate.md
@@ -1,11 +0,0 @@
-# Rotate
-
-CS331 W12 L2
-
-## Notes
-
-Rotate is a function of the Transform class that allows rotation relative to the local rotation.
-
-See [[Translate.md]] for a similar function but for position. 
-
-Also see [[LocalScale.md]] for changing the local scale.
diff --git a/Rotation.md b/Rotation.md
@@ -1,17 +0,0 @@
-# Rotation
-
-Khan U2
-
-## Notes
-
-**Definition:** A rotation is a linear transformation (assuming the rotation axis passes through the zero vector) that rotates vectors about some axis by an angle theta, **counterclockwise**.
-
-## Create Matrix
-
-To create a matrix to represent a rotation do the following:
-
-1. Start with identity matrix
-2. Calculate each individual basis vector under the rotation we want (use trig)
-3. Aggregate the results into a final matrix where each column is the result of the basis vector transformation 
-
-This is the same way we normally create the [[StandardMatrix.md]] of a L.T. for other transformations.
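-
-Applying these steps to a rotation in R^2 (a sketch of the standard result, using trig on each basis vector):
-
-$Rot_\theta(\langle 1, 0 \rangle) = \langle \cos\theta, \sin\theta \rangle, \space Rot_\theta(\langle 0, 1 \rangle) = \langle -\sin\theta, \cos\theta \rangle$
-
-$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
-
-For theta = 90 degrees the columns become (0, 1) and (-1, 0), a quarter turn counterclockwise.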
diff --git a/RowBuffer.md b/RowBuffer.md
@@ -1,9 +0,0 @@
-# Row Buffer
-
-## Notes
-
-**Definition:** The row buffer is the buffer used to cache a row that is from [[DRAM.md]]. This is used because it is 2-3 times more efficient to query a buffered memory address than it is to query for a new row in memory. This is handled by the DRAM memory controller. 
-
-Precharging is where the memory controller replaces the currently buffered row with a newly requested one. This is done by sending high voltage to the new row and low voltage to the old. When these conflicts occur, access is 2-3 times slower than if the row was already buffered.
-
-An example of a row size (and buffer size) is 8kb. This is just an example to illustrate a point. 
diff --git a/RowEchelonForm.md b/RowEchelonForm.md
@@ -1,9 +0,0 @@
-# Row Echelon Form
-
-Ch 2.2
-
-## Notes
-
-**Definition:** Row echelon form is a form such that every row has at least as many 0's, counting from the left side, as the row above it (strictly more, unless the row is all zeroes).
-
-In row echelon form the pivots (leading entries of the basic variables) are not normalized, thus they don't need to be 1 like with RREF.
diff --git a/RuleLearning.md b/RuleLearning.md
@@ -1,10 +0,0 @@
-# Rule Learning
-
-ML CH1
-
-
-## Notes
-
-**Definition:** Rule learning is the process of taking in lots of data and finding associations between data. 
-
-This information can be useful when trying to implement [[DimensionalityReduction.md]].
diff --git a/RuleOfSarrus.md b/RuleOfSarrus.md
@@ -1,30 +0,0 @@
-# Rule of Sarrus
-
-Khan U2
-
-## Notes
-
-**Definition:** The rule of Sarrus is a shortcut for finding the determinant of a 3x3 matrix.
-
-Formula:
-
-     [a b c]
-Det ([d e f]) = aei + bfg + cdh - afh - bdi - ceg
-     [g h i]
-
-When looking at the matrix we add the products along the three diagonals running down and to the right (one starting at each top row value, wrapping around the matrix) and subtract the products along the three diagonals running down and to the left.
-
-See [[Determinant.md]] for calculating determinants, what they represent, and how to find 2x2 with a formula.
-
-Ex:
-
-    [1 2 4]
-A = [2 1 3]
-    [3 4 8]
-
-
-Det(A) = 1x1x8 + 2x3x3 + 4x2x4 - 1x3x4 - 2x2x8 - 4x1x3
-
-= 8 + 18 + 32 - 12 - 32 - 12
-
-= 2
diff --git a/Rvalue.md b/Rvalue.md
@@ -1,16 +0,0 @@
-# rvalue
-
-cs202 W14 L16
-
-## Notes
-
-**Definition:** An rvalue is a temporary value that can be moved. 
-
-These values can't be on the left side of an assignment. For example:
-
-```cpp
-int x = 3;     // x must be declared before it is used below
-int y = 4 + 7; // 4 + 7 is an rvalue
-int z = x + y; // x + y is an rvalue
-
-```
diff --git a/SMART.md b/SMART.md
@@ -1,20 +0,0 @@
-# SMART Goals
-
-W2 H&W
-
-## Notes
-
-**Definition:** This is a type of goal setting that meets the following criteria:
-
-1. Specific
-	- Make the goal specific enough to measure success and set a timeframe
-2. Measurable
-	- Ensure the goal has a metric that determines success (yes/no or regression/classification)
-3. Actionable
-	- How will the goal be achieved (do we have ability to achieve the goal?)
-4. Relevant
-	- Make sure the goal is relevant to our life
-5. Time Bound
-	- Include date when something should be done by.
-
-I should actually do this shit. 
diff --git a/SMOTE.md b/SMOTE.md
@@ -1,9 +0,0 @@
-# Synthetic Minority Oversampling Technique (SMOTE)
-
-ML P775
-
-## Notes
-
-**Definition:** SMOTE is the process of generating synthetic minority-class samples (new points interpolated between existing minority samples and their nearest minority-class neighbors) to increase their representation in the dataset and improve a model's classification of them.
-
-The intuition is similar to augmentation: if you only have a few images of a specific type of flower, maybe you augment them (rotate or something) to get 100s of instances so the model is trained on more instances of it and will subsequently be better at classifying them.
diff --git a/SRAM.md b/SRAM.md
diff --git a/SVM.md b/SVM.md
@@ -1,25 +0,0 @@
-# Support Vector Machines (SVMs)
-
-ML D3
-
-## Notes
-
-**Definition:** Support vector machines are models that separate classes by drawing a line (decision boundary) between them while leaving as much space as possible between the different classes. The boundary runs up the middle of a "street" whose edges are only affected by instances located on the edge of the street and not by instances far off. These edge instances are the support vectors.
-
-### Classification
-
-Think of trying to make a street as wide as possible where there are buildings on the side that can't be moved. If the buildings move in, the edges of the street need to as well. We would also see that the center line for the street moves accordingly as there is width lost on one side. Regardless of how many buildings are made far away, they do not affect the optimal width of the road. This describes how hard margin classification works, and the issue that arises with it is that if two samples are of different classes but in any way intermingle (the data is not linearly separable), the algorithm won't work.
-
-As such, there is also soft margin classification for SVMs, which balances limiting margin violations against making the street as wide as possible. With Scikit-Learn, if you reduce the C value (hyperparameter) then it will allow more margin violations. This decreases the likelihood of overfitting, but reducing it too much will cause underfitting.
-
-Support Vector Machines are good for small datasets, but they do not scale well. They are also sensitive to feature scales, so features should be scaled first.
-
-When dealing with non-linearly classifiable datasets we can use the same polynomial strategy used with linear regression to plot based on any degree polynomial. 
-
-A trick related to SVMs is called the polynomial kernel (kernel trick). This allows for polynomial mapping without the need for a combinatorial explosion of features by doing higher dimensional mapping without having to compute everything (unclear about this).
-
-### Regression
-
-When trying to use SVMs for regression we try to fit as many samples as possible on the street while still limiting margin violations (samples off the street). The width of the street is controlled by the hyperparameter epsilon.
-
-
diff --git a/SampleSpace.md b/SampleSpace.md
@@ -1,12 +0,0 @@
-# Sample Space
-
-L1
-
-## Notes
-
-**Definition:** The sample space is the space of all possible outcomes of a random experiment.
-
-This should be two things
-
-1. Mutually Exclusive (No two results can happen during the same run)
-2. Collectively Exhaustive (No result can occur outside the domain of outcomes)
diff --git a/Satisfiable.md b/Satisfiable.md
@@ -1,9 +0,0 @@
-# Satisfiable
-
-1.3.5
-
-## Notes
-
-**Definition:** A proposition is satisfiable if there is some assignment of truth values to its variables such that the outcome is true.
-
-We refer to this assignment of truth values as a 'solution'.
diff --git a/Scheduling.md b/Scheduling.md
@@ -1,5 +0,0 @@
-# Scheduling
-
-## Notes
-
-CPU scheduling is done at the OS level and is generally just about the CPU time (clock cycles) given to each process. This can cause issues with [[DRAM.md]] because the DRAM controller prioritizes requests associated with buffered rows of memory, meaning that even if two processes have the same priority they will not necessarily get the same access to memory because of optimizations done in the DRAM controller.
diff --git a/Script.md b/Script.md
@@ -1,8 +0,0 @@
-# Script
-
-CS 331 W12 L3
-
-## Notes
-
-**Definition:** Scripts are where custom code can be added to accompany the GameObjects they are attached to.
-
diff --git a/Seam.md b/Seam.md
diff --git a/Segmentation.md b/Segmentation.md
@@ -1,13 +0,0 @@
-# Segmentation
-
-ML D5
-
-## Notes
-
-**Definition:** Segmentation in machine learning is the process of breaking up a large group into smaller ones.
-
-Image segmentation is partitioning an image into multiple segments. There are a few different types:
-
-1. Color Segmentation - Broken up by color similarities
-2. Semantic Segmentation - All pixels that are part of the same object are assigned to a segment (one segment for all people).
-3. Instance Segmentation - All pixels that are part of the same individual object are assigned to a segment (one segment per person).
diff --git a/SelfSupervisedLearning.md b/SelfSupervisedLearning.md
@@ -1,11 +0,0 @@
-# Self-supervised learning (SSL)
-
-ML CH1
-
-## Notes
-
-**Definition:** Self-supervised learning is the process of changing the input data (for example, hiding part of it) and having the model predict the original, so the expected output is already known without manual labels.
-
-This is similar to [[SemiSupervisedLearning.md]] where models are trained to detect certain information (clustering) without knowing what the information means.  
-
-Basically, the model learns to train itself by messing with inputs to get expected outputs.
diff --git a/SemiSupervisedLearning.md b/SemiSupervisedLearning.md
@@ -1,9 +0,0 @@
-# Semi-Supervised Learning
-
-ML CH1
-
-## Notes
-
-**Definition:** This is training a model with some labeled and some unlabeled data. 
-
-A good example of this is Google's image classification. It will tell you person 2 is in photos 8, 10, and 15; you then give one label and it is able to cluster all of this data by the label you have given.
diff --git a/SentinelValue.md b/SentinelValue.md
@@ -1,22 +0,0 @@
-# Sentinel Value
-
-CS202 (personal learning)
-
-## Notes
-
-**Definition:** A sentinel value is a constant value used to end an execution loop. 
-
-This is also referred to as a flag value, trip value, rogue value, signal value, or dummy data.
-
-This is how you describe -1 in the context of a BFS algorithm where -1 denotes a visited location. When doing this, we know -1 is an in-band piece of data (valid based on type), but distinct from legal data values (i.e. positives, if we are using a non-negative weighted graph as an example).
-
-Another example where we use -1 as a sentinel value is as follows:
-
-```python3
-
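-def find(arr, val):
-	# Return the index of the first match, not the value itself
-	for index, item in enumerate(arr):
-		if item == val:
-			return index
-	return -1  # sentinel: never a valid index, so it signals "not found"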
-```
diff --git a/Sequence.md b/Sequence.md
@@ -1,25 +0,0 @@
-# Sequence
-
-U2.4.1
-
-## Notes
-
-**Definition:** Sequences are ordered lists mapped to by the integers.
-
-To define a sequence we can use the following notation where n is some arbitrary element:
-
-$a_n = 2n$
-
-This defines the mapping from the integers to the set of all even numbers.
-
-#### Arithmetic Sequence
-
-An arithmetic sequence is a sequence where we start from some constant and then add d times the current index.
-
-This can be explicitly stated as $a_n = a + dn$ where a is the initial term and d is some constant (the common difference).
-
-#### Geometric Sequence
-
-In a geometric sequence we multiply the initial term by the common ratio, defined as r, to the nth power.
-
-A geometric sequence can be stated as $a_n=ar^n$ where r is some constant and n is the iteration, as always.
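-
-As a quick worked example of both formulas, assume $a = 3$, $d = 2$, $r = 2$, and n starting at 0 (these values are made up for illustration):
-
-$$a_n = 3 + 2n: \quad 3, 5, 7, 9, \dots \qquad\qquad a_n = 3 \cdot 2^n: \quad 3, 6, 12, 24, \dots$$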
diff --git a/Set.md b/Set.md
@@ -1,31 +0,0 @@
-# Set
-
-U 2.1.1
-
-## Notes
-
-**Definition:** A set is an unordered collection of distinct elements.
-
-Common Sets:
-
-$\mathbb{N}$ - Natural Numbers
-
-$\mathbb{W}$ - Whole Numbers (0 + Natural Numbers)
-
-$\mathbb{Z}$ - Integers
-
-$\mathbb{Q}$ - Rational Numbers ($\frac{a}{b}, b \neq 0$) 
-
-$\mathbb{R}$ - Real numbers (not imaginary)
-
-$\mathbb{C}$ - Complex numbers (of the form a + bi)
-
-## Specifications
-
-Set specification types:
-
-1. Roster - list of all members (discrete)
-2. Set Builder - specify all members by form (discrete or continuous)
-3. Interval Notation - specify set as [a,b) where a is inclusive and b is exclusive (continuous)
-
-Interval notation is best for continuous sets where it is applicable, otherwise we generally use set builder notation unless the specified set has a really small cardinality.
diff --git a/SetFunction.md b/SetFunction.md
@@ -1,9 +0,0 @@
-# Set Function
-
-Stats CH1
-
-## Notes
-
-**Definition:** A set function is a function defined as $\mu : X \to Y$ where X is a collection of sets and Y is anything.
-
-Basically, a set function takes in a set from a collection of sets (a set of sets) and outputs something that may be an element, a set, or whatever. In the context of stats, $\mu$ (Greek mu) often takes in a subset of the sample space and outputs the probability of that set.
diff --git a/SharedPointers.md b/SharedPointers.md
@@ -1,63 +0,0 @@
-# Shared Pointers 
-
-**Source:** [CPP Reference](https://en.cppreference.com/w/cpp/memory/shared_ptr)
-
-**Chapter:** N/A
-
-## Notes
-
-**Definition:** A shared pointer is a pointer that keeps a reference counter so when the final reference to it goes out of scope, the memory will be freed.
-
-The value of this is that, unlike unique_ptr, copies of the pointer can be made. The drawback is the overhead, both in memory and computation, associated with keeping track of the number of pointers that point to the object.
-
-These should be used when defining your own class to ensure once all pointers to objects of the class type go out of scope, the proper destructor is called.
-
-An example usage of shared pointers is as follows:
-
-```cpp
-
-#include <iostream>
-#include <memory>
-
-// Wraps a heap-allocated int array so the destructor can release it
-class arr{
-	public:
-		int* array = nullptr;
-
-		~arr(){
-			// Allocated with new[], so it must be released with delete[]
-			// (delete[] on a nullptr is a no-op, so no check is needed)
-			delete[] array;
-		}
-};
-
-
-std::shared_ptr<arr> getSharedPtr(){
-	
-	auto sharedPtr = std::make_shared<arr>();
-	int* data = new int[10];
-	sharedPtr->array = data;
-	for(int i = 0 ; i < 10; ++i){
-		data[i] = i;
-	}
-	return sharedPtr;
-}
-
-int main(){
-	std::cout << "Testing Shared Pointer" << std::endl;
-	// Loops forever on purpose to show that memory usage does not grow
-	while(true){
-		std::shared_ptr<arr> baseShared = getSharedPtr();
-		for(int x = 0 ; x < 100 ; ++x){
-			auto shared = baseShared; // the copy increments the reference count
-			for(int i = 0; i < 10; ++i){
-				std::cout << shared->array[i] << " ";
-			}
-		}
-		std::cout << std::endl;
-	}
-
-	return 0;
-}
-
-```
-
-In this usage we call a function to create an integer array allocated on the heap. Since make_shared does not support raw arrays before C++20, we encapsulated it in a class. We then return this to main and show that we can assign multiple pointers to the same memory and read through them. Since main is an infinite loop, we can see that the memory is deallocated once the last pointer goes out of scope, because otherwise the above code would leak memory.
diff --git a/Shear.md b/Shear.md
@@ -1,21 +0,0 @@
-# Shear (transformation)
-
-3B1B
-
-## Notes
-
-**Definition:** A shear is a type of linear transformation where points are 'slid' parallel to one axis, in proportion to their coordinate on the other axis, while that other coordinate remains the same.
-
-The following is the form of a shear in R^2 where each x value is shifted by k times its y value and the y value stays the same:
-
-(Horizontal)
-
-[1 k]
-[0 1]
-
-Note that k is a non-zero value.
-
-(Vertical)
-
-[1 0]
-[k 1]
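-
-As a small sketch of what the horizontal form does to an arbitrary vector (the symbols x and y, and the sample values, are assumptions for illustration):
-
-$$\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + ky \\ y \end{bmatrix}, \qquad \text{e.g. } k = 2: \ \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 7 \\ 3 \end{bmatrix}$$
-
-Points on the x-axis (y = 0) do not move, and the shift grows with the distance from that axis.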
diff --git a/SignedExtension.md b/SignedExtension.md
@@ -1,12 +0,0 @@
-# Signed Extension
-
-W1
-
-## Notes
-
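-**Definition:** Signed extension (sign extension) is used to extend the size of a signed value without changing the number it represents, by copying the sign bit into the new high-order bits.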
-
-
--3:
-
-101 -> 11111101
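-
-A quick check that both bit patterns represent the same number, assuming two's complement (where the top bit carries a negative weight):
-
-$$101_2 = -2^2 + 0 \cdot 2^1 + 2^0 = -4 + 1 = -3$$
-
-$$11111101_2 = -2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^0 = -128 + 125 = -3$$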
diff --git a/SimilarityFeature.md b/SimilarityFeature.md
@@ -1,9 +0,0 @@
-# Similarity Feature
-
-ML 4
-
-## Notes
-
-**Definition:** A similarity feature is an added feature that describes how similar some feature is to a particular landmark. This value generally ranges from 1 being the same to nearly or exactly 0 (depending on RBF used) being entirely different.
-
-With housing data, as an example, we may use an RBF to add another feature based on lat and long to see how far away points are from some landmark city. 
diff --git a/SimpsonsParadox.md b/SimpsonsParadox.md
@@ -1,19 +0,0 @@
-# Simpson's Paradox
-
-Ch 1.1
-
-## Notes
-
-**Definition:** Simpson's paradox is the seeming paradox that some outcome can be overall more common despite all individual cases making it seem less likely.
-
-Consider the case of some batters and batting averages shown below:
-
-| Season | Batter 1 | Batter 2 |
-| ------ | -------- | -------- |
-| 2020   | .4       | .3       |
-| 2021   | .5       | .49      |
-| 2022   | .2       | .19      |
-
-One can see that the second batter did worse in every season, but if batter 2 had most of their at bats in 2021 (the high-average season) while batter 1 had most of theirs in 2022 (the low-average season), then it is possible for batter 2's overall batting average to be higher despite them having a lower average in each season.
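-
-To make this concrete, here is a sketch with made-up at bat counts (the counts are assumptions, only the averages come from the table). Suppose batter 1 had 100, 100, and 1000 at bats across the three seasons while batter 2 had 100, 1000, and 100:
-
-$$\text{Batter 1: } \frac{40 + 50 + 200}{100 + 100 + 1000} = \frac{290}{1200} \approx .242 \qquad \text{Batter 2: } \frac{30 + 490 + 19}{100 + 1000 + 100} = \frac{539}{1200} \approx .449$$
-
-Batter 2 is worse in every individual season but far better overall, because the seasons are weighted differently for each batter.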
-
-There are examples of this in MIT's admissions (more females apply to more difficult majors thus they have lower admission rates) as well as some airline stuff and sports related things signifying that it is not super uncommon and should be something to look out for.
diff --git a/SinglyLinkedList.md b/SinglyLinkedList.md
@@ -1,56 +0,0 @@
-# Singly Linked Lists
-
-CS 221 W11 Lecture 13. 
-
-## Notes
-
-**Definition:** Singly linked lists are lists that only contain pointers to the next item in the list. This is in contrast with [[DoublyLinkedList.md]] which have a pointer forward and backward.
-
-There is a pointer that needs to point to the head and then finding every subsequent element is as simple as iterating through the list. The final item in the list contains a null pointer. 
-
-Additionally, there are no cycles ([[Graphs.md]]) in the list, which makes it a (degenerate) "tree".
-
-Inserting at the start is done as follows:
-
-1. Create new node on the heap (use the new keyword).
-2. Point new node's pointer to the old start.
-3. Point the head pointer (used to reference the list) to the new head
-
-Removing first element (this is more complex in c/c++ because of memory management):
-
-1. Create a pointer that will keep track of the node being removed
-2. Point the head pointer to the next start
-3. Deallocate the node that was removed in step 1.
-
-
-Insert into arbitrary position: 
-
-1. Walk the list until reaching position - 1 (done so insertion for position 3 is at 3 not 4) 
-2. Make new node
-3. Create a temp pointer (tempPointer) that points to the node after the current node
-4. Point the pointer on the current node to the newly created node
-5. Point the pointer on the new node to the node that was previously referenced by the current node (tempPointer)
-
-Removing arbitrary element:
-
-1. Walk list until reaching the index -1
-2. Create temp pointer with reference to the node that will be deleted
-3. Point the current node's pointer to the node after the one being deleted
-4. Deallocate the node marked for deletion 
-
-## Efficiency Information:
-
-Worst Case:
-
-Add to end: O(n)
-Remove from end: O(n)
-Traverse to end: O(n)
-
-These are worst case scenarios where you are searching, removing, and adding to the end of the list as it needs to be wholly traversed in such a case. 
-
-Best Case:
-
-Add to start: O(1)
-Remove from start: O(1)
-Traverse to first: O(1)
-
diff --git a/Singularity.md b/Singularity.md
@@ -1,7 +0,0 @@
-# Singularity
-
-Superintelligence - Bostrom
-
-## Notes
-
-**Definition:** The singularity is a future point in time where tech growth becomes uncontrollable and irreversible.
diff --git a/SkeletalAnimation.md b/SkeletalAnimation.md
@@ -1,27 +0,0 @@
-# Skeletal Animation
-
-CG W14 L2
-
-## Notes
-
-**Definition:** The animation of bones.
-
-Bones are the most primitive components of 3D object rendering and they have a tip and root to denote directionality. The body is the area between the tip and root.
-
-Additionally, bones are rigid: they can rotate about their local y-axis, but they have no other degrees of freedom.
-
-A joint is another way to refer to a root or tip, as that is where bones are joined together (join -> join+t)
-
-Sometimes, an [[Armature.md]] will be disjoint, but we should generally try not to do this.
-
-## Steps to Implement
-
-1. Create [[Armature.md]] 
-2. Add bones (root bottom, tip top, body middle, and connected by joints)
-	- Start with one bone then extrude the rest of them
-	- When extruding, the root of the new bone will create a joint with the tip of the prior one
-		- When extruding a bone from the tip of another there is a parent child relationship. As such, there can be multiple children for a single parent.
-	- Name all bones in an apt way (ehh, I guess there is some value to this). 
-3. Create [[Mesh.md]] 
-4. Embed Armature into mesh
-
diff --git a/SmallestCounterExample.md b/SmallestCounterExample.md
@@ -1,15 +0,0 @@
-# Smallest Counterexample
-
-Abstract Math 10.3. This is similar to [[Induction.md]] and [[StrongInduction.md]]
-
-## Notes
-
-**Definition:** Assume that the first element of a series is true and that not all other elements of the series are also true. We find the first element that is untrue denoted as $S_k$ and show that $S_{k-1}$ being true and $S_k$ being untrue is contradictory.
-
-**Steps:**
-1. Check that first statement $S_1$ is true
-2. Suppose not every statement $S_n$ is true
-3. Let k > 1 be the first instance where $S_k$ is false
-4. Show that $S_{k-1}$ being true and $S_k$ being false are contradictory
-
-
diff --git a/SoftmaxRegression.md b/SoftmaxRegression.md
@@ -1,9 +0,0 @@
-# Softmax Regression
-
-ML D3
-
-## Notes
-
-**Definition:** Softmax regression is the process of computing a linear score for each of k classes for a sample and then using the softmax function to turn those scores into the probability of the sample being a member of each class.
-
-The softmax function simply takes the exponential e^z of each score z, sums these values, and then divides the exponential of each score by the sum of all the exponentials.
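-
-Written out as a formula, with $z_i$ as the score for class i (the symbols here are assumptions, not taken from the original notes):
-
-$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$$
-
-As a worked example with made-up scores $z = (1, 2, 3)$: the exponentials are roughly $2.718$, $7.389$, and $20.086$, which sum to about $30.193$, giving probabilities of roughly $0.090$, $0.245$, and $0.665$. These sum to 1 and keep the ordering of the scores.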
diff --git a/Span.md b/Span.md
@@ -1,11 +0,0 @@
-# Span
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 2
-
-## Notes
-
-**Definition:** The span of (v_1, ..., v_m) is the set of all [LinearCombination](LinearCombination.md) of (v_1, ..., v_m).
-
-This may be all R^2, R^3, or some other vector space.
diff --git a/Sparse.md b/Sparse.md
@@ -1,11 +0,0 @@
-# Sparse
-
-Ch 4
-
-## Notes
-
-**Definition:** A sparse matrix is a matrix mostly containing zeroes.
-
-## Implementation
-
-It is sometimes useful to represent a sparse matrix as 3 arrays where the first defines the row, the second the column, and the third the value stored at the given location. This takes up less space and still keeps the important information.
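-
-A small sketch of this representation, assuming zero-based indices and a made-up matrix:
-
-$$A = \begin{bmatrix} 0 & 0 & 3 \\ 4 & 0 & 0 \\ 0 & 0 & 5 \end{bmatrix} \quad \Rightarrow \quad \text{rows} = [0, 1, 2], \quad \text{cols} = [2, 0, 2], \quad \text{values} = [3, 4, 5]$$
-
-The three arrays store 3 numbers per nonzero entry, so for a matrix this small nothing is saved. The savings only show up when the number of nonzero entries is far below a third of the total entries.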
diff --git a/Stack.md b/Stack.md
@@ -1,17 +0,0 @@
-# Stack
-
-CS202 L14 / CS303 Ch 1
-
-## Notes
-
-**Definition:** This is a data structure that uses the LIFO (last in, first out) approach where you add to the top and remove from the top of the structure.
-
-push: add to stack
-
-peek: get top element. This can also be implemented by doing pop then pushing the result. 
-
-pop: remove from top
-
-This can be implemented as a [[SinglyLinkedList.md]]
-
-See [[Queue.md]] for information about the FIFO counterpart.
diff --git a/Stacking.md b/Stacking.md
@@ -1,13 +0,0 @@
-# Stacking
-
-ML D5
-
-## Notes
-
-**Definition:** Stacking is the idea that we should create a dedicated model to act as a voting machine for an ensemble of predictive models.  
-
-This is in contrast with soft and hard voting, which do simple calculations on the outputs of the predictors to determine the final output (I know, lots of words).
-
-The model that does the final prediction is called a blender or meta learner, and through the process of blending it gives an output.
-
-A good way to do this is by training the blender on the out-of-sample predictions of all prior models.
diff --git a/StandardBasis.md b/StandardBasis.md
@@ -1,9 +0,0 @@
-# Standard Basis
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 2
-
-## Notes
-
-**Definition:** The standard basis is the [BasisOfSubspace](BasisOfSubspace.md) where each vector is made up of all 0's and one 1.
diff --git a/StandardDeviation.md b/StandardDeviation.md
@@ -1,13 +0,0 @@
-# Standard Deviation
-
-Stats D2
-
-## Notes
-
-**Definition:** This is a measure of how far the values in a dataset typically are from the mean of the dataset: the square root of the average squared difference from the mean.
-
-See also [[Variance.md]], which is the squared value. As such, to find the standard deviation of some random variable X we can do the following:
-
-$\text{std.dev}(X) = \sqrt{\text{var}(X)}$
-
-Where $\text{var}(X) = \sum_x(x-E[X])^2p_X(x)$
diff --git a/StandardMatrix.md b/StandardMatrix.md
@@ -1,7 +0,0 @@
-# Standard Matrix
-
-Khan U2
-
-## Notes
-
-**Definition:** The standard matrix of a linear transformation is the matrix we multiply the input of the function by to obtain the mapping of the input.
diff --git a/Standardization.md b/Standardization.md
@@ -1,33 +0,0 @@
-# Standardization 
-
-ML CH2
-
-## Notes
-
-**Definition:** Standardization is the process of scaling values by subtracting the mean from each value and then dividing by the standard deviation.
-
-This is optimal in some cases as [[MinMaxScaling.md]] has issues with outliers. If there is one outlier that is much bigger than all other values, the max will be very large, squishing most values into a range of low numbers, which can affect the accuracy of models.
-
-See [[FeatureScaling.md]] for more.
-
-Sample implementation:
-
-```python
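-# Assumes df is an existing pandas DataFrame
-# Keep only the numeric columns
-df = df.select_dtypes(include=['number'])
-
-for col in df:
-    mean = df[col].mean()
-    std = df[col].std()  # pandas uses the sample standard deviation (ddof=1) by default
-    df[col] = (df[col] - mean) / std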
-
-print(df)
-
-```
-
-## Probabilistic Interpretation
-
-Standardization is the process of mapping some arbitrary [[NormalDistribution.md]] onto the standard normal distribution, which is centered at 0 with a standard deviation of 1. This can be done simply by subtracting the mean of the normal distribution from each element and then dividing the resulting values by the standard deviation.
-
-We do this because there is not a closed form solution to find the percentiles of a normal/gaussian distribution, thus we use a lookup table which assumes the distribution is centered about 0 with a std. deviation of 1. The mean and standard deviation are all that is needed to fully describe a gaussian distribution.
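-
-As a formula, with $\mu$ as the mean and $\sigma$ as the standard deviation (symbols assumed here, not taken from the notes above):
-
-$$z = \frac{x - \mu}{\sigma}$$
-
-For a made-up example with $\mu = 60$ and $\sigma = 5$, the value $x = 70$ maps to $z = \frac{70 - 60}{5} = 2$, meaning it sits two standard deviations above the mean, and that is the value we would look up in the table.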
diff --git a/StateAnalysis.md b/StateAnalysis.md
@@ -1,25 +0,0 @@
-# State Analysis
-
-Ch 3
-
-## Notes
-
-**Definition:** State analysis, in the context of algorithms, is a strategy for computing the time complexity of an algorithm that analyzes the current state of the algorithm instead of describing each line of code and their associated complexity which becomes unruly as algorithms become more complex.
-
-### Steps
-
-1. Choose a variable that captures as much state as possible (state variable)
-2. Find an upper bound for the domain of the variable 
-3. Find upper bound for number of instructions at each state value
-4. Multiply results from steps 2 and 3 to find total complexity (worst case complexity)
-
-### Example (bubble sort)
-
-1. State variables are i and j 
-2. Upper bound for i is n+1 and upper bound for j is n, thus n(n+1)
-3. 11 total statements thus 11 is worst case number of statements per iteration (this is based on implementation, assignment, etc.)
-4. At most 11n(n+1) = 11n^2 + 11n time complexity
-
-As can be seen above, this is not entirely accurate to what we would find by evaluating each line of code, but this is generally close enough and only off by some constant factor.
-
-One limit of this is that when n=0 the bound above says the algorithm performs 0 instructions. This is because we don't take into account the initial overhead of assignments and such, as we only care about the bulk of the complexity that comes from the iteration/computation part of the algorithm.
diff --git a/StatisticalInference.md b/StatisticalInference.md
@@ -1,7 +0,0 @@
-# Statistical Inference 
-
-Ch 1.1
-
-## Notes
-
-**Definition:** Statistical inference is the process of using statistical findings to make predictions about future events (emphasis on future).
diff --git a/StatisticsAndProbability.md b/StatisticsAndProbability.md
@@ -1,160 +0,0 @@
-# Statistics and probability
-
-Links to Stats Notes
-
-## Questions I would like to answer
-
-1. How do I create linear regression models with formulas and why does it work?
-
-## Main Links
-
-Probability and Statistical Inference Hogg, Tanis:
-
-Chapter 1.1: 
-	- [[SampleSpace.md]]
-	- [[StatisticalInference.md]]
-	- [[Frequency.md]]
-	- [[RelativeFrequency.md]]
-	- [[ProbabilityMassFunction.md]]
-	- [[SimpsonsParadox.md]]
-	- [[RandomExperiment.md]]
-	- [[RandomVariables.md]]
-
-Chapter 1.2:
-	- [[Event.md]] 
-	- [[SetFunction.md]]
-
-Chapter 1.3:
-	- [Permutation](Permutation.md) 
-	- [BinomialCoefficient](BinomialCoefficient.md)
-	- [OrderedSample](OrderedSample.md)
-	- [Binomial](Binomial.md)
-	- [DistinguishablePermutation](DistinguishablePermutation.md) 
-	- [MultinomialCoefficient](MultinomialCoefficient.md)
-
-Chapter 1.4:
-	- [ConditionalProbability](ConditionalProbability.md)
-
-Chapter 1.5:
-	- [IndependentEvents](IndependentEvents.md)
-	- [PairwiseIndependence](PairwiseIndependence.md)
-	- [MutuallyIndependent](MutuallyIndependent.md)
-
-Chapter 1.6:
-	- [BayesTheroem](BayesTheroem.md)
-	- [PriorProbability](PriorProbability.md) 
-	- [PosteriorProbability](PosteriorProbability.md)
-
-Chapter 2.1:
-	- [RandomVariables](RandomVariables.md)
-	- [ProbabilityMassFunction](ProbabilityMassFunction.md)
-	- [DiscreteRandomVariable](DiscreteRandomVariable.md)
-	- Support (space of X)
-	- HypergeometricDistribution
-
-
----
-
-[[Probability.md]]
-[[SetFunction.md]]
-[[MonotonicFunction.md]]
-[[ProbabilityDensityFunctions.md]]
-[[BinomialDistribution.md]]
-[[PoissonDistribution.md]]
-[[ExponentialDistribution.md]]
-[[NormalDistribution.md]]
-[[Variance.md]]
-[[ConditionalProbabilities.md]]
-[[JointProbability.md]]
-[[MarginalProbabilities.md]]
-[[Covariance.md]]
-[[Correlation.md]]
-[[Quantile.md]]
-[[ExploratoryDataAnalysis.md]]
-[[DensityEstimation.md]]
-[[Bandwidth.md]] 
-[[Oversmooothing.md]] 
-[[Undersmoothing.md]] 
-[[Boxplots.md]]
-[[Crosstabulation.md]]
-[[MosaicPlot.md]]
-[[BayesianInference.md]]
-[[Individuals.md]]
-[[Variables.md]]
-[[Pictograph.md]]
-[[StemAndLeafPlot.md]]
-[[Percentile.md]]
-[[CumulativeRelativeFrequency.md]]
-[[IQR.md]]
-
-PSA&AP MIT:
-
-L1:
-	- [[SampleSpace.md]]
-	- [[Complement.md]]
-	- [[DiscreteUniformLaw.md]]
-	- [[UniversalSet.md]]
-	- [[DisjointSet.md]]
-	- [[ProbabilityLaw.md]]
-L2:
-	- [[ConditionalProbabilities.md]]
-	- [[BayesTheroem.md]]
-	- [[TotalProbabilityTheroem.md]]
-	- [[ConditionalProbabilityTheroem.md]]
-L3:
-	- [[Independence.md]]
-L4:
-	- [[BinomialCoefficient.md]]
-L5:
-	- [[RandomVariables.md]]
-	- [[ProbabilityMassFunction.md]]
-L6:
-	- [[ProbabilityMassFunction.md]]
-	- [[Expectation.md]]
-	- [[Variance.md]]
-	- [[StandardDeviation.md]]
-	- [[JointProbability.md]]
-L7:
-	- Review
-L8:
-	- [[ProbabilityMassFunction.md]]
-	- [[ProbabilityDensityFunctions.md]]
-	- [[Standardization.md]]
-	- [[CumulativeDensityFunction.md]]
-	- [[MixedRandomVariable.md]]
-	- [[NormalDistribution.md]]
-	- [[BernoulliRandomVariable.md]]
-L9:
-	- [[JointDensityFunction.md]]
-L10:
-	- [[DerivedDistribution.md]]
-L11:	
-	- [[Covariance.md]]
-	- [[CorrelationCoefficient.md]]
-L12:
-	- [[IteratedExpectations.md]]
-L13:
-	- [[BernoulliProcess.md]] - Discrete memoryless
-	- [[MarkovChains.md]] - Discrete remembers
-	- [[PoissonProcess.md]] - Continuous memoryless
-L14:
-	- [[PoissonProcess.md]]
-	- [[BernoulliProcess.md]]
-L15:
-	- Skipped this as it was poisson part 2
-L16:
-	- [[MarkovChains.md]]
-	- [[MarkovProcess.md]]
-L17:
-	- [[MarkovChains.md]]
-	- [[PeriodicChain.md]]
-L18:
-	- Skipped markov part 3
-L19:
-	- [[LawOfLargeNumbers.md]]
-	- [[RegressionToTheMean.md]]
-	- [[MarkovInequality.md]]
-L20:
-	- [[CentralLimitTheroem.md]]
-
-The rest of the lectures discuss inference. I am not planning to read this as it will be covered in my Elements textbook.
diff --git a/StemAndLeafPlot.md b/StemAndLeafPlot.md
@@ -1,7 +0,0 @@
-# Stem and Leaf Plot
-
-Khan
-
-## Notes
-
-**Definition:** In a stem and leaf plot, each row has a stem on the left side and leaves on the right side. The stem is the base value (as an example, 1) and each leaf is the final digit of one instance of the variable (as an example, 9). A stem of 1 with a leaf of 9 would mean there was some instance with a value of 19.
diff --git a/StirlingsFormula.md b/StirlingsFormula.md
@@ -1,7 +0,0 @@
-# Stirling's Formula
-
-Ch 3
-
-## Notes
-
-**Definition:** Stirling's formula is a closed form approximation for factorials. 
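-
-For reference, the usual statement of the approximation, followed by a quick check at $n = 10$ (the choice of n is just for illustration):
-
-$$n! \approx \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n$$
-
-$$10! = 3628800, \qquad \sqrt{20\pi} \left( \frac{10}{e} \right)^{10} \approx 3.60 \times 10^6$$
-
-The approximation is within about 1% here, and the relative error shrinks as n grows.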
diff --git a/StochasticAlgorithm.md b/StochasticAlgorithm.md
@@ -1,9 +0,0 @@
-# Stochastic Algorithm (Stochastic Optimization)
-
-ML CH2
-
-## Notes
-
-**Definition:** A stochastic algorithm is an optimization algorithm that uses randomness. 
-
-One example of this is [[KMeans.md]] which picks random cluster centroids.
diff --git a/StratifiedSampling.md b/StratifiedSampling.md
@@ -1,11 +0,0 @@
-# Stratified Sampling
-
-ML CH2
-
-## Notes
-
-**Definition:** Stratified sampling is the process of dividing the population into groups (strata) and sampling from each stratum so that every stratum is properly represented.
-
-This is often used when there are smaller sample sizes that can't guarantee an accurate representative sample for testing and training data. We then define some strata and try to ensure accurate representation from each grouping to get more generalizable data.
-
-When you sample so that each stratum appears in the same ratio as in the overall data, this is called proportionate allocation. There is also optimum (or disproportionate) allocation, where we try to minimize variance (deviation).
diff --git a/String.md b/String.md
@@ -1,9 +0,0 @@
-# String
-
-W2
-
-## Notes
-
-**Definition:** A string is a collection of ordered characters.
-
-C-style strings of length n occupy n+1 bytes, where the (n+1)th byte is all zeroes (the null terminator).
diff --git a/StrongAI.md b/StrongAI.md
@@ -1,7 +0,0 @@
-# Strong AI
-
-Superintelligence - Bostrom
-
-## Notes
-
-**Definition:** Strong AI is an AI system that has very broad intelligence.
diff --git a/StrongInduction.md b/StrongInduction.md
@@ -1,21 +0,0 @@
-# Strong Induction
-
-Abstract Math 10.2. Weak induction is the normal form of induction discussed in [[Induction.md]] 
-
-## Notes 
-
-**Definition:** Strong induction, much like weak induction, is the process of proving that prior true statements imply a later one, but the inductive step may rely on any earlier statement instead of only the previous one. For example, we can prove in the form of $S_{k-5} \implies S_{k+1}$ so long as $k-5$ is in the domain and every statement from $S_{k-5}$ up to $S_k$ has been shown to be true. 
-
-
-Steps:
-
-
-1. Prove the first statement $S_1$ or more if needed. 
-2. Given any integer $k \geq 1$, prove $(S_1 \wedge S_2 \wedge S_3 \wedge ... \wedge S_k) \implies S_{k+1}$
-
-A good use of this is a statement whose inductive step does not follow from $S_k$ alone. If I know that $S_1$ is true, but I can't manipulate $S_k$ in a satisfactory way to prove $S_{k+1}$, then proving a few more base cases and assuming every earlier statement can solve this issue. 
-
-Can be used to prove [[FundamentalTheoremOfArithmetic.md]].
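-
-A short sketch of why strong induction fits here (only the existence half of the theorem, and only as an illustration): let $S_n$ be "n is a product of one or more primes" for $n \geq 2$.
-
-1. Base case: $S_2$ holds because 2 is prime.
-2. Inductive step: assume $S_2 \wedge S_3 \wedge ... \wedge S_k$. If $k+1$ is prime, $S_{k+1}$ holds. Otherwise $k+1 = ab$ with $2 \leq a, b \leq k$, so $S_a$ and $S_b$ give prime factorizations whose product is a factorization of $k+1$.
-
-Weak induction is not enough here because $a$ and $b$ are usually much smaller than $k$.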
-
-
-
diff --git a/Subgraph.md b/Subgraph.md
@@ -1,7 +0,0 @@
-# Subgraph
-
-Ch 4
-
-## Notes
-
-**Definition:** A subgraph of G(V,E) is a graph H(W,F) such that $W \subseteq V$ and $F \subseteq E$, where every edge in F joins vertices in W.
diff --git a/Subsequence.md b/Subsequence.md
@@ -1,7 +0,0 @@
-# Subsequence
-
-Ch 6.2
-
-## Notes
-
-**Definition:** A subsequence is some, or all, of the elements of a sequence kept in their original order.
diff --git a/Subset.md b/Subset.md
@@ -1,9 +0,0 @@
-# Subset
-
-U 2.1.2
-
-## Notes
-
-**Definition:** The set A is a subset of B when all elements of A are in B.
-
-A **proper** subset is a subset where the two sets are not equal ($A \neq B$). This is written using $\subset$ instead of $\subseteq$, which includes non-proper subsets.
diff --git a/Subspace.md b/Subspace.md
@@ -1,23 +0,0 @@
-# Subspace
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 1
-
-## Notes
-
-### Linear Algebra Context
-
-**Definition:** A subspace is a subset of a vector space that is itself a vector space under the same addition and scalar multiplication.
-
-To verify a subset U of a vector space V is a subspace of V we only need to verify:
-
-1. U is closed under addition
-2. U is closed under scalar multiplication
-3. U contains the additive identity
-
-### ML Context
-
-**Definition:** A subspace is a lower dimensional space.
-
-Often we find that many higher dimensional points all reside in or near a similar lower dimensional subspace which is the basis for [[Projection.md]]
diff --git a/SubtractionRule.md b/SubtractionRule.md
@@ -1,7 +0,0 @@
-# Subtraction Rule
-
-Ch 6.1
-
-## Notes
-
-**Definition:** The subtraction rule (inclusion-exclusion principle) is the idea that the cardinality of the union of two sets is the individual cardinalities minus the elements in both sets (ensure not double counting).
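-
-Written as a formula, with a small made-up example to check it:
-
-$|A \cup B| = |A| + |B| - |A \cap B|$
-
-For $A = \{1, 2, 3\}$ and $B = \{3, 4\}$ we get $|A \cup B| = 3 + 2 - 1 = 4$, which matches $A \cup B = \{1, 2, 3, 4\}$.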
diff --git a/SumOfGeometricSeries.md b/SumOfGeometricSeries.md
@@ -1,13 +0,0 @@
-# Sum Of Geometric Series
-
-Ch 6.1
-
-## Notes
-
-**Definition:** The sum of the geometric series is the formula to sum a sequence of the form $ar^0 + ar^1 + ... + ar^{n-1}$.
-
-The formula is as follows:
-
-$S_n = \frac{a(r^n-1)}{r-1}$
-
-Where we have $S_n$ the sum, $a$ the constant, $r$ the base of the exponential (common ratio, $r \neq 1$), and $n$ the total number of terms.
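-
-A quick worked check of the formula, using made-up values $a = 2$, $r = 3$, and 4 terms:
-
-$S_4 = 2 + 6 + 18 + 54 = 80$
-
-$S_4 = \frac{2(3^4-1)}{3-1} = \frac{2 \cdot 80}{2} = 80$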
diff --git a/SumOfVectorSpaces.md b/SumOfVectorSpaces.md
@@ -1,18 +0,0 @@
-# Sum of Vector Spaces
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 1
-
-## Notes
-
-**Definition:** The sum of two vector spaces is another vector space which is formed by all sums of one vector from each space (think combining each vector with every other vector).
-
-Note that the sum is not limited to two vector spaces and can be stated as follows for 3 vector spaces, where $V_1$, $V_2$, $V_3$ are vector spaces, as is $S_1$:
-
-$S_1 = V_1 + V_2 + V_3$
-
-### Other Information
-
-1. If $U_1$, $U_2$, $U_3$ are subspaces of V then so is $U_1 + U_2 + U_3$
-2. The smallest subspace that contains $U_1$, $U_2$, and $U_3$ is their sum $U_1 + U_2 + U_3$
diff --git a/SumRule.md b/SumRule.md
@@ -1,13 +0,0 @@
-# Sum Rule
-
-Ch 6.1
-
-## Notes
-
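-**Definition:** The sum rule states that if a choice can be made from one of several non-overlapping groups of options, the total number of possible choices is the sum of the number of options in each group.
-
-Example:
-
-There are 5 ways to paint a fence and 3 ways to paint a wall. How many ways are there to paint either the fence or the wall (one job, not both)?
-
-8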
diff --git a/SuperScalar.md b/SuperScalar.md
@@ -1,7 +0,0 @@
-# Super Scalar
-
-Computer Architecture L2
-
-## Notes
-
-**Definition:** A superscalar processor can execute multiple instructions per clock cycle.
diff --git a/SupervisedLearning.md b/SupervisedLearning.md
@@ -1,11 +0,0 @@
-# Supervised Learning
-
-ML L1
-
-## Notes
-
-**Definition:** Training a model by giving it inputs and valid associated outputs.
-
-Most widely used form of model training.
-
-These desired outputs are referred to as labels. The inputs are referred to as instances.
diff --git a/SupportVectorMachine.md b/SupportVectorMachine.md
@@ -1,7 +0,0 @@
-# Support Vector Machine (SVM)
-
-ML L1
-
-## Notes
-
-**Definition:** A supervised learning algorithm that, through the kernel trick, can work with feature vectors of very high (even infinite) dimension.
diff --git a/SurfaceRepresentation.md b/SurfaceRepresentation.md
@@ -1,9 +0,0 @@
-# Surface Representation
-
-CS 331 W11 L2
-
-## Notes
-
-**Definition:** Modelling the surface of a continuous object in a discrete computing environment.
-
-To do this we use a [[Mesh.md]]. 
diff --git a/Surjective.md b/Surjective.md
@@ -1,9 +0,0 @@
-# Surjective 
-
-L2
-
-## Notes
-
-**Definition:** For a function to be surjective each value in the codomain must be mapped to at least once.
-
-Also known as **"onto"**
diff --git a/Symmetric.md b/Symmetric.md
@@ -1,9 +0,0 @@
-# Symmetric
-
-Ch 9.1
-
-## Notes
-
-**Definition:** A symmetric relation is a relation such that if xRy then yRx for all (x,y). 
-
-Symmetry does not by itself imply reflexivity. For a relation to be reflexive at all, the domain and the codomain must be the same. 
diff --git a/SymmetricClosure.md b/SymmetricClosure.md
@@ -1,7 +0,0 @@
-# Symmetric Closure
-
-Ch 9.4
-
-## Notes
-
-**Definition:** The symmetric closure of a relation R is the smallest symmetric relation that contains R. It is formed by adding yRx whenever xRy.
diff --git a/SymmetricMatrix.md b/SymmetricMatrix.md
@@ -1,9 +0,0 @@
-# Symmetric Matrix
-
-Ch 2.2
-
-## Notes
-
-**Definition:** A symmetric matrix is a matrix whereby A = A^T. 
-
-When viewing a symmetric matrix we see that all values are mirrored across the diagonal that goes from top left to the bottom right of the matrix.
diff --git a/SystemsOfEquations.md b/SystemsOfEquations.md
@@ -1,7 +0,0 @@
-# Systems of Equations 
-
-Khan
-
-## Notes
-
-**Definition:** Systems of equations are sets of equations that are to be solved together. 
diff --git a/TargetEncoding.md b/TargetEncoding.md
@@ -1,22 +0,0 @@
-# Target Encoding
-
-ML CH2
-
-## Notes
-
-**Definition:** Target encoding is the process of mapping some feature to a representative value that is calculated. 
-
-This is different than [[LabelEncoding.md]] as label encoding uses an arbitrary mapping instead of a representative one. 
-
-A simple way to do this would be to find the mean target value for each feature label (group by) and then map the feature to this mean. This is simple, but it is imperfect, especially when there is not a lot of data for a specific label.
-
-Another way to do this is by using a weighted mean that takes into account the means of all other feature options as well. This is often done by finding the current option's mean, multiplying it by the number of occurrences of said option, then adding the overall mean multiplied by some [[Hyperparameter.md]] m. The final thing to do is to divide this value by the number of instances of this option added to m.
-
-Equation:
-
-$\frac{n* \text{option mean} + m* \text{overall mean}}{n+m}$
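-
-A worked example with made-up numbers: suppose an option appears $n = 2$ times with an option mean of 10, the overall mean is 4, and we choose $m = 3$.
-
-$\frac{2 * 10 + 3 * 4}{2 + 3} = \frac{32}{5} = 6.4$
-
-The encoded value is pulled from 10 toward the overall mean because the option has few occurrences. As $n$ grows, the result approaches the option mean.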
-
-
-## Issues
-
-The main issue with this approach is overfitting. When setting a parameter based on the target there is a higher likelihood that you will overfit the training data. 
diff --git a/Task.md b/Task.md
@@ -1,11 +0,0 @@
-# Task
-
-Ch 0
-
-## Notes
-
-**Definition:** A task is a function from I to O where I is the set of all valid inputs and O is the set of all valid outputs.
-
-In this definition we are using the formal definition of a function from math.
-
-This does not define how to go from input to output just the expected outputs.
diff --git a/Tautology.md b/Tautology.md
@@ -1,9 +0,0 @@
-# Tautology
-
-1.3.1
-
-## Notes
-
-**Definition:** A tautology is a statement that is always true.
-
-An example of a tautology is $p \vee \neg p$.
diff --git a/TemporalDifferenceLearning.md b/TemporalDifferenceLearning.md
@@ -1,9 +0,0 @@
-# Temporal Difference Learning
-
-L4
-
-## Notes
-
-**Definition:** Temporal difference learning is a reinforcement learning process where we update the value estimate of a given state by using the reward received and the discounted value estimate of the next state.
-
-This is different than Monte Carlo (MC) because it does not require us to finish the episode. Instead, we can rely upon the estimates of other states to calculate our expected return.
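-
-One common form of this update is TD(0). The notes above do not give a formula, so the symbols here are assumptions: $V$ is the value estimate, $s$ the current state, $s'$ the next state, $r$ the reward, $\alpha$ the learning rate, and $\gamma$ the discount factor.
-
-$V(s) \leftarrow V(s) + \alpha [r + \gamma V(s') - V(s)]$
-
-For example, with $V(s) = 0$, $V(s') = 1$, $r = 1$, $\alpha = 0.5$, and $\gamma = 0.9$, the new estimate is $0 + 0.5(1 + 0.9 - 0) = 0.95$.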
diff --git a/Tensor.md b/Tensor.md
@@ -1,9 +0,0 @@
-# Tensor
-
-ML P626
-
-## Notes
-
-**Definition:** A tensor is a multidimensional array of any dimensionality. 
-
-Tensors can be 0-dim (scalars), 1-dim (vectors), 2-dim (matrices), and higher dimensions as well.
diff --git a/Texture.md b/Texture.md
@@ -1,13 +0,0 @@
-# Texture
-
-CS 331 W11 Lecture 2
-
-## Notes
-
-**Definition:** The texture of an object is its surface and how it looks.
-
-This is implemented in Unity via the [[MeshRenderer.md]].
-
-Game engines implement [[Baking.md]] to hardcode this texture at the cost of accuracy when changing perspective and lighting. 
-
-See [[TextureMaps.md]] for more information about object texture rendering.
diff --git a/TextureMaps.md b/TextureMaps.md
@@ -1,9 +0,0 @@
-# Texture Maps
-
-CS 331 W11 / 2
-
-## Notes:
-
-**Definition:** Texture maps are used to control the look of the [[Texture.md]] associated with an object. Texture maps attempt to simulate real world 3d surfaces without the cost of computing many meshes. 
-
-
diff --git a/TheoryOfComputation.md b/TheoryOfComputation.md
@@ -1,18 +0,0 @@
-# Theory of Computation 
-
-## Links By Resource
-
-### Math 421 (Course Resources)
-
-- AutomataTheory
-    - [FiniteStateAutomata](FiniteStateAutomata.md)
-        - [DeterministicFiniteAutomata](DeterministicFiniteAutomata.md)
-        - [RegularExpressions](RegularExpressions.md)
-        - [RegularLanguages](RegularLanguages.md)
-        - [NonDeterministicFiniteAutomata](NonDeterministicFiniteAutomata.md)
-        - ContextFreeGrammars
-        - Alphabet
-    - PushdownAutomata
-    - TuringMachines
-- ComputabilityTheory
-- ComplexityTheory
diff --git a/TimeComplexity.md b/TimeComplexity.md
@@ -1,7 +0,0 @@
-# Time Complexity
-
-Ch 2
-
-## Notes
-
-**Definition:** Let A be an algorithm. The worst case, best case, or average case time complexity of A is the function f: N->N where f(n) is the max, min, or average number of instructions executed by the algorithm over all inputs of size n bytes.
diff --git a/TotalProbabilityTheroem.md b/TotalProbabilityTheroem.md
@@ -1,11 +0,0 @@
-# Total Probability Theorem 
-
-L2
-
-## Notes
-
-**Definition:** Total probability theorem states that the probability of some event is equal to the summed probability of each possible way for the event to occur.
-
-This is often done in the inverse, where we have a bunch of conditions and want to find the probability of a given event occurring.
-
-**Reverses order of conditionals**
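-
-Stated as a formula, assuming the events $A_1, ..., A_n$ are mutually exclusive and cover every possibility:
-
-$P(B) = \sum_{i=1}^{n} P(B|A_i)P(A_i)$
-
-With made-up numbers $P(A_1) = 0.3$, $P(A_2) = 0.7$, $P(B|A_1) = 0.5$, and $P(B|A_2) = 0.2$, we get $P(B) = 0.5 * 0.3 + 0.2 * 0.7 = 0.29$.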
diff --git a/Tractable.md b/Tractable.md
@@ -1,9 +0,0 @@
-# Tractable
-
-U 2.3 
-
-## Notes
-
-**Definition:** A tractable problem is a problem that can be solved in polynomial time (reasonable amount of time).
-
-See also [[Intractable.md]].
diff --git a/TransTheoreticalModel.md b/TransTheoreticalModel.md
@@ -1,25 +0,0 @@
-# Trans-Theoretical Model of Behaviour Change
-
-W2
-
-## Notes
-
-**Definition:** This is a model that describes the process of enacting behavior changes.
-
-Stages:
-
-1. PreContemplation
-	- Not ready to make changes
-	- Might be down or defensive
-2. Contemplation
-	- Getting ready
-	- Intent to engage in next few months (up to 2 years)
-	- Know positives but might avoid action
-3. Preparation
-	- Ready to engage in action within 30 days
-	- Begin to take steps toward the new behavior
-4. Action
-	- Doing the behavior
-	- Need to keep working hard to keep consistency
-5. Maintenance
-	- Behavior has been changed
diff --git a/TransferLearning.md b/TransferLearning.md
@@ -1,9 +0,0 @@
-# Transfer Learning
-
-ML CH1
-
-## Notes
-
-**Definition:** Transfer learning is the process of transferring knowledge from one task to another. 
-
-An example of this is training a model to reconstruct images of pets using self-supervised learning. Using this, we can then turn the model into a classification model based on labelled data for different types of pets.
diff --git a/Transform.md b/Transform.md
@@ -1,23 +0,0 @@
-# Transform
-
-CS 331 W11 L2
-
-## Notes
-
-**Definition:** This gives an object's position, rotation and scale in 3d space. 
-
-In Unity, we have a left handed coordinate system as opposed to the standard right handed one. In this system, the y axis is up and down instead of the expected z axis. 
-
-Additionally, each game object has its own local coordinate system that moves with the object. In this way, positive Z is forward, positive X is right, and positive Y is up. 
-
-The datatype used to represent position in 3d space is [[Vector3.md]]. Each x,y,z component is of datatype float. 
-
-See [[Quaternions.md]] for rotations.
-
-The object is `transform` while the datatype is `Transform`. 
-
-Transforms have the following public members:
-
-1. position (Vector3)
-2. rotation (Quaternion)
-3. localScale (Vector3)
diff --git a/Transformations.md b/Transformations.md
@@ -1,9 +0,0 @@
-# Transformations
-
-Khan
-
-## Notes
-
-**Definition:** Transformations are functions that take an input vector and output another vector.
-
-See [[LinearTransformation.md]] for a specific type.
diff --git a/Transitive.md b/Transitive.md
@@ -1,7 +0,0 @@
-# Transitive
-
-Ch 9.1
-
-## Notes
-
-**Definition:** A transitive relation holds the transitive property namely that if xRy and yRz then xRz for all x,y,z.
diff --git a/TransitiveClosure.md b/TransitiveClosure.md
@@ -1,9 +0,0 @@
-# Transitive Closure
-
-Ch 9.4
-
-## Notes
-
-**Definition:** The transitive closure of a relation is the smallest transitive relation that contains it, so that any two elements with a path from one to the other are directly connected.
-
-This can be thought of as fully connecting any connected components.
diff --git a/Translate.md b/Translate.md
@@ -1,11 +0,0 @@
-# Translate
-
-CS331 W12 L2
-
-## Notes
-
-This is a method of Unity's Transform class that moves the GameObject by the distance specified with respect to the local coordinate system. 
-
-See [[Rotate.md]] for similar function for rotating based on local rotation. 
-
-Also see [[LocalScale.md]] for changing the local scale. 
diff --git a/Transpose.md b/Transpose.md
@@ -1,103 +0,0 @@
-# Transpose
-
-ML P627
-
-## Notes
-
-**Definition:** The transpose of a matrix is the matrix flipped over the diagonal by switching the rows and columns. 
-
-2 4 1    2 3 4
-3 7 2 -> 4 7 6
-4 6 3    1 2 3
-
-As you can see, the first value remains and across the top we have the first column.
-
-Additionally, the transpose of a vector is possible and will go from n x 1 to 1 x n.
-
-Example:
-
-	 [1 8 0]
-A  = [7 6 4]
-	 [2 1 6]
-	
-	  [1 7 2] 
-A^T = [8 6 1]
-	  [0 4 6]
-
-
-B = [1 2]
-	[3 4]
-
-B^T = [1 3]
-	  [2 4]
-
-C = [1 0 -1]
-	[2 7 -5]
-	[4 -3 2]
-	[-1 3 0]
-
-C^T = [ 1  2  4  -1]
-	  [ 0  7 -3   3]
-	  [-1 -5  2   0]
-
-Also, note that (C^T)^T = C.
-
-#### Determinant
-
-The determinant of a matrix's transpose is the same as the determinant prior to the transpose. |A| = |A^T|
-
-Example:
-
-B = [1 2]
-    [3 4]
-
-|B| = 4 - 6 = -2
-
-B^T = [1 3]
-	  [2 4]
-
-|B^T| = 4 - 6 = -2
-
-#### Product of transpose
-
-(AB)^T = B^T A^T
-
-Note: This extends to any number of matrices, with the order reversed each time, for example (ABC)^T = C^T B^T A^T.
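-
-A small check of the rule, reusing B from the determinant example with a made-up A:
-
-A = [0 1]
-    [1 0]
-
-AB = [3 4]
-     [1 2]
-
-(AB)^T = [3 1]
-         [4 2]
-
-B^T A^T = [1 3] [0 1] = [3 1]
-          [2 4] [1 0]   [4 2]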
-
-#### Sum of transpose
-
-Where C = A + B
-
-C^T = (A+B)^T = A^T + B^T
-
-This follows from matrix addition being element-wise, so it does not matter whether we add or transpose first.
-
-#### Inverse
-
-AA^-1 = I_n
-
-(AA^-1)^T = I_n^T = I_n = (A^-1)^T A^T
-
-Thus (A^-1)^T is the inverse of A^T. 
-
-Note: the transpose of the identity matrix is still itself as the diagonal does not change.
-
-#### Vector
-
-The transpose of a vector is a 1xn matrix where the original vector can be thought of as a nx1 matrix.
-
-A = [a_1]
-	[a_2]
-	[a_3]
-
-A^T = [a_1 a_2 a_3]
-
-#### Column Space
-
-The transpose of a matrix has a column space equivalent to the row space of the original matrix.
-
-C(A^T) = Row space(A)
-
-#### Null Space
-
-The null space of the transpose is called the left nullspace.
diff --git a/Tree.md b/Tree.md
@@ -1,26 +0,0 @@
-# Tree
-
-Abstract Math and CS202
-
-## Notes 
-
-**Definition:** Trees are connected graphs without cycles. 
-
-There is no implication about split numbers or anything of the sort, but something interesting is that in all cases it must be true that the number of edges is one less than the number of vertices. This can be proved through [[StrongInduction.md]].
-
-
-**Root:** This is a node that has no parents
-
-**Parent:** The node above a given node
-
-**Child:** A node below a given node
-
-**Leaf:** These are nodes without children
-
-**Subtree:** A subtree is a section of a tree that is based upon a new root node that cuts off everything above it.
-
-See [[LinkedLists.md]] as linked lists (when non-cyclic) are a form of tree.
-
-Also see [[BinaryTree.md]] for a specific tree type.
-
-Note: Graphs with 0 or 1 nodes are both trees because all of their nodes are (trivially) connected and there are no cycles.
diff --git a/TreeDiagram.md b/TreeDiagram.md
@@ -1,9 +0,0 @@
-# Tree Diagram
-
-Ch 6.1
-
-## Notes
-
-**Definition:** A tree diagram is a diagram that shows all possible choices (outcomes) along with their branching.
-
-Think of 2^n where we split into 2 paths n times as a horizontal diagram.
diff --git a/Triangulation.md b/Triangulation.md
@@ -1,9 +0,0 @@
-# Triangulation
-
-CS 331 W11 L2
-
-## Notes:
-
-**Definition:** To break a surface up into triangles.
-
-This is often used to create [[Mesh.md]] for [[SurfaceRepresentation.md]]. Triangulation represents a 3d object using 2d surfaces as the union of triangles. This is how we create "3d" objects. 
diff --git a/Trichotomy.md b/Trichotomy.md
@@ -1,17 +0,0 @@
-# Trichotomy
-
-CLRS 3.2
-
-## Notes
-
-**Definition:** Trichotomy is a property of real numbers such that for any two real numbers a and b exactly one of the following must be true:
-
-1. a < b
-2. b < a
-3. a = b
-
-More generally trichotomy is a term to express three way classification.
-
-#### Asymptotic Notation
-
-The reason I created this note is that asymptotic notations can be compared, for example O(n) < O(n^2), but they do not satisfy trichotomy, as a function's runtime may oscillate, which does not allow for a proper comparison.
diff --git a/TripleProductExpansion.md b/TripleProductExpansion.md
@@ -1,9 +0,0 @@
-# Triple Product Expansion 
-
-Khan
-
-## Notes
-
-**Definition:** The triple product expansion states the combined cross product of three vectors a x (b x c) = b(a dot c) - c(a dot b)
-
-This is also known as Lagrange's formula. 
diff --git a/TruePositiveRate.md b/TruePositiveRate.md
@@ -1,13 +0,0 @@
-# True Positive Rate (TPR) also known as recall and sensitivity
-
-ML CH3
-
-## Notes
-
-**Definition:** This is the ratio of positive instances that are correctly classified.
-
-As such, we have the following equation:
-
-recall = TP / (TP + FN)
-
-This takes the number of true positives and divides by the number of all actually positive samples (TP + FN).
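-
-For example, with made-up counts of 8 true positives and 2 false negatives:
-
-recall = 8 / (8 + 2) = 0.8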
diff --git a/TruthSet.md b/TruthSet.md
@@ -1,7 +0,0 @@
-# Truth Set
-
-U 2.1.2
-
-## Notes
-
-**Definition:** The truth set of a function P(x) is the set of all elements of the domain such that P(x) is true.
diff --git a/Tuple.md b/Tuple.md
@@ -1,11 +0,0 @@
-# Tuple 
-
-Ch 1
-
-## Notes
-
-**Definition:** A tuple is an ordered list of elements (like a set but ordered). 
-
-A tuple in 2d is an ordered pair. 
-
-Often tuples are used to represent points in space.
diff --git a/TwosComplement.md b/TwosComplement.md
@@ -1,17 +0,0 @@
-# Two's Complement
-
-SS
-
-## Notes
-
-**Definition:** Two's complement is an implementation of negative numbers where a leading one and flipped bits are used to represent negative numbers.
-
-To do this calculation in either direction flip all bits and add 1. 
-
-As such we have 1 and -1 as follows:
-
-1 : 00000001
-
--1 : 11111111
-
-This solves the problem of having a negative zero and decreases the computational overhead compared to using [[OnesComplement.md]].
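-
-A worked example of the flip and add 1 rule in 8 bits, going from 5 to -5 and back:
-
-5 : 00000101
-
-flip : 11111010
-
-add 1 : 11111011 (this is -5)
-
-flip : 00000100
-
-add 1 : 00000101 (back to 5)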
diff --git a/UVMaps.md b/UVMaps.md
@@ -1,13 +0,0 @@
-# UV Maps
-
-CG W13 L1
-
-## Notes
-
-**Definition:** A UV map is a function that takes a mesh and returns an image. This describes how to "color in" the mesh.
-
-UV maps take in a mesh and return the unwrapped mesh. 
-
-The relation between the faces of an unwrapped mesh is called the layout. 
-
-(Not ultra violet maps... :))
diff --git a/UnaryOperations.md b/UnaryOperations.md
@@ -1,9 +0,0 @@
-# Unary Operations
-
-SS
-
-## Notes
-
-**Definition:** Unary operations are operations that only take one input.
-
-These operations include increment, decrement, square root, etc.
diff --git a/Underfitting.md b/Underfitting.md
@@ -1,9 +0,0 @@
-# Underfitting
-
-ML CH1
-
-## Notes
-
-**Definition:** Using a model that is too simple to learn the underlying structure of data.
-
-See [[Overfitting.md]] for the inverse of this.
diff --git a/Undersmoothing.md b/Undersmoothing.md
@@ -1,7 +0,0 @@
-# Undersmoothing
-
-Stats D3
-
-## Notes
-
-**Definition:** Undersmoothing is when the bandwidth selected for the kernel of a KDE is too small, which causes the estimate to overfit the dataset.
diff --git a/Unicode.md b/Unicode.md
@@ -1,7 +0,0 @@
-# Unicode
-
-W2
-
-## Notes
-
-**Definition:** Unicode is a character encoding standard that assigns a code point to almost all characters across languages. Encodings such as UTF-8 and UTF-16 then store each code point in one or more bytes.
diff --git a/UniquePointers.md b/UniquePointers.md
@@ -1,40 +0,0 @@
-# Unique Pointers
-
-**Source:** [CPP References](https://en.cppreference.com/w/cpp/memory/unique_ptr)
-
-**Chapter:** N/A
-
-## Notes
-
-**Definition:** A unique pointer in C++ is a pointer that cannot be copied (only moved) and that automatically deallocates the associated memory once it goes out of scope.
-
-The value of this is that unlike shared pointers, it does not have any overhead beyond normal pointers. Additionally, once it goes out of scope, the memory is managed automatically. This is useful because we can then return things from a method and not have to worry about deallocation of the memory afterwards. An example of this is shown below:
-
-```cpp
-
-#include <memory>
-#include <iostream>
-
-std::unique_ptr<int[]> genPtrToArray(){
-    // `auto` could replace the explicit type on the left side.
-	std::unique_ptr<int[]> unique = std::make_unique<int[]>(42);
-	for(int i = 0; i < 42; ++i){
-		unique[i] = i;
-	}
-	return unique;
-}
-
-int main(){
-	std::cout << "Testing unique pointers."<< std::endl;
-
-	std::unique_ptr<int[]> ptr = genPtrToArray();
-	for(int i = 0 ; i < 42; ++i){
-		std::cout << ptr[i] << " ";
-	}
-	std::cout << std::endl;
-	return 0;
-}
-
-```
-
-In the above code, we never call delete even though make_unique allocates the array on the heap inside the function. By running this code we can verify that the int array is not deallocated when it is returned from the function, because we are still able to print out the numbers 0-41 in main. The memory is released automatically when ptr goes out of scope at the end of main.
diff --git a/UnitVector.md b/UnitVector.md
@@ -1,36 +0,0 @@
-# Unit Vector (Normalized Vector)
-
-Khan
-
-## Notes
-
-**Definition:** A unit vector is any vector with length of 1. 
-
-It is true that ||u|| = 1 when u is a unit vector, in all cases.
-
-Additionally, it is common practice to add a hat to unit vectors. As such, a hatted vector implies length of 1.
-
-Commonly we use ihat, jhat, and khat in 3d space where they are each orthogonal and denote some interesting information like right, forward, and up either relative to some object or in global space.
-
-## Construction
-
-To construct a unit vector from any other vector (with same direction) do the following:
-
-1. Find length of v (sqrt(x_1^2 + x_2^2 + ... + x_n^2))
-2. Multiply each component by one over the length of v
-
-Example (sorry for formatting but not worth my time):
-
-[4]
-[7] = u
-[9]
-
-||u|| = sqrt(16 + 49 + 81)
-= sqrt(146)
-= 12.083
-
-[4]					[.331]
-[7] x (1/12.083) =  [.579]
-[9]					[.745]
-
- = $\hat u$ 
diff --git a/Unity.md b/Unity.md
@@ -1,43 +0,0 @@
-# Unity 
-
-Unity is a popular game engine, no duh. 
-
-## Notes
-
-### General Stuff
-
-**Unity Hub:** Used to manage projects and create projects. This can also be used to select IDE versions
-
-**Aggregation:** Unity uses aggregation by grouping assets together (nesting)
-
-### Important Conceptual Ideas
-
-- [Texture](Texture.md)
-- [TextureMaps](TextureMaps.md)
-- [Baking](Baking.md)
-- [SurfaceRepresentation](SurfaceRepresentation.md)
-- [Mesh](Mesh.md)
-- [Transform](Transform.md)
-- [Quaternions](Quaternions.md)
-
-### Components
-
-- [MeshFilter](MeshFilter.md)
-- [MeshRenderer](MeshRenderer.md)
-- [Script](Script.md)
-- [Animation](Animation.md)
-- [AnimationController](AnimationController.md)
-
-### Other Unity Specifics
-
-- [Asset](Asset.md)
-- [GameObject](GameObject.md)
-- [MonoBehaviour](MonoBehaviour.md)
-- [Lighting](Lighting.md)
-- [GameLoop](GameLoop.md)
-- [Vector3](Vector3.md)
-- [Input](Input.md)
-- [Movement](Movement.md)
-- [Translate](Translate.md)
-- [Rotate](Rotate.md)
-- [LocalScale](LocalScale.md)
diff --git a/UniversalSet.md b/UniversalSet.md
@@ -1,7 +0,0 @@
-# Universal Set
-
-L1
-
-## Notes
-
-**Definition:** The universal set, denoted by either $U$ or $\Omega$, is the set of all objects that are of interest in a particular context.
diff --git a/Universe.md b/Universe.md
@@ -1,13 +0,0 @@
-# Universe
-
-U 1.4.1
-
-## Notes
-
-**Definition:** The universe in math is the set of all objects that bear consideration. 
-
-Often we state the universe as the variable U.
-
-This is also sometimes called the domain, universe of discourse, or the domain of discourse.
-
-See also [[UniversalSet.md]] for the same concept. I created this note because the term bears remembering and I forgot what I called the universal set in the domain of stats and probability. 
diff --git a/Unsolvable.md b/Unsolvable.md
@@ -1,9 +0,0 @@
-# Unsolvable
-
-U 2.3
-
-## Notes
-
-**Definition:** Unsolvable problems are problems that can't be solved by any algorithm, no matter how much time is allowed (not even exponential time).
-
-A well known example of an unsolvable problem is the halting problem.
diff --git a/UnstableGradients.md b/UnstableGradients.md
@@ -1,17 +0,0 @@
-# Unstable Gradients
-
-ML 550
-
-## Notes
-
-**Definition:** Unstable gradients are the idea that different layers of a neural network can learn at widely different rates.
-
-This often manifests as [[ExplodingGradients.md]] or [[VanishingGradients.md]]
-
-This was a reason that deep neural networks were mostly abandoned in the early 2000s until there were revisions to model architecture. It was found that the initialization scheme of a normal weight distribution about 0 with a std deviation of 1 and the use of sigmoid activation functions caused this issue. The sigmoid function was the main cause, as it backpropagates gradients that are generally very small.
-
-To resolve this issue we need to ensure the variance of inputs and outputs are roughly equal. This can be done through a different initialization strategy called He initialization, which is used with ReLU.
-
-There is also another solution using LeCun initialization with a SELU activation function.
-
-The final common approach, used with softmax activation, is to use the Glorot initialization method.
diff --git a/UnsupervisedLearning.md b/UnsupervisedLearning.md
@@ -1,13 +0,0 @@
-# Unsupervised Learning
-
-ML L1
-
-## Notes
-
-**Definition:** Given a dataset with no labels, find some structure in the underlying data. 
-
-[[ClusteringAlgorithms.md]] are often created using unsupervised learning.
-
-Another example of unsupervised learning is the cocktail party problem: given multiple microphones in a noisy room, how do you separate out individual voices?
-
-See [[UnsupervisedPretraining.md]] for information about unsupervised training followed by supervised training.
diff --git a/UnsupervisedPretraining.md b/UnsupervisedPretraining.md
@@ -1,11 +0,0 @@
-# Unsupervised Pretraining
-
-ML P576
-
-## Notes
-
-**Definition:** Unsupervised pretraining is the process of pretraining a model on unlabeled data and then adding layers on top of the model using labelled data to get predictions.
-
-This is often used because unlabeled data is often abundant, but labeled data is expensive.
-
-We can do this with GANs as well as [[Autoencoder.md]]. With autoencoders we train the autoencoder to compress the data and then reuse the lower layers of this autoencoder as the lower layers for a neural network. This is useful because autoencoders are good at finding representations of the data without the need for labeled data.
diff --git a/UtilityFunction.md b/UtilityFunction.md
@@ -1,7 +0,0 @@
-# Utility Function
-
-Ch 1
-
-## Notes
-
-**Definition:** A utility function is a function from E -> R where E is the set of events, R is the set of real numbers, and the mapping describes how good the event is.
diff --git a/VLIW.md b/VLIW.md
diff --git a/VacuousProof.md b/VacuousProof.md
@@ -1,7 +0,0 @@
-# Vacuous Proof
-
-U 1.7
-
-## Notes
-
-**Definition:** A vacuous proof proves a statement of the form "if p then q" by showing that p is always false, so there is no need to evaluate q.
diff --git a/ValueFunction.md b/ValueFunction.md
@@ -1,11 +0,0 @@
-# Value Function
-
-RL Ch 1
-
-## Notes
-
-**Definition:** The value function describes the overall expected reward for an agent.
-
-This includes a gamma term (discount factor) which is between 1 and 0 with 0 meaning future rewards don't mean anything and 1 meaning future rewards are equally as important as short term rewards.
-
-When evaluating this function we raise gamma to the power of the term number (how many steps in the future the reward is), which makes the weights a geometric sequence.
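-
-Written out, the discounted sum looks like this (the symbols $G$ for the return and $r_t$ for the reward $t$ steps in the future are assumptions, not notation from the source):
-
-$G = \sum_{t=0}^{\infty} \gamma^t r_t = r_0 + \gamma r_1 + \gamma^2 r_2 + ...$
-
-For example, with $\gamma = 0.5$ and three rewards of 1 followed by nothing, $G = 1 + 0.5 + 0.25 = 1.75$.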
diff --git a/VandermondesIdentity.md b/VandermondesIdentity.md
@@ -1,9 +0,0 @@
-# Vandermonde's Identity
-
-Ch 6.4
-
-## Notes
-
-**Definition:** Vandermonde's identity is an identity that describes n+m choose k as a sum over all ways to split the selection between the two groups: 0 from one and k from the other, 1 from one and k-1 from the other, and so on.
-
-$\binom{n+m}{k} = \sum^k_{i=0} \binom{n}{i} \binom{m}{k-i}$
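-
-A quick check with $n = 2$, $m = 2$, $k = 2$:
-
-$\binom{4}{2} = \binom{2}{0}\binom{2}{2} + \binom{2}{1}\binom{2}{1} + \binom{2}{2}\binom{2}{0} = 1 + 4 + 1 = 6$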
diff --git a/VanishingGradients.md b/VanishingGradients.md
@@ -1,15 +0,0 @@
-# Vanishing Gradients
-
-ML 550
-
-## Notes
-
-**Definition:** Vanishing gradients is a neural network problem where lower levels (earlier hidden layers) have such small gradients that gradient steps make tiny changes and the model never converges upon a good solution.
-
-This is a very common problem as most of the time gradients get smaller and smaller. As such, this problem is much more common than [[ExplodingGradients.md]], which primarily happens with RNNs.
-
-### Solutions
-
-Use ReLU and better weight initialization (not gaussian distribution with std deviation of 1).
-
-See [[UnstableGradients.md]] for more.
diff --git a/Variables.md b/Variables.md
@@ -1,7 +0,0 @@
-# Variables
-
-Khan
-
-## Notes
-
-**Definition:** Variables are characteristics that can in some way be measured, counted, or categorized.  
diff --git a/VariadicOperations.md b/VariadicOperations.md
@@ -1,9 +0,0 @@
-# Variadic Operations
-
-SS
-
-## Notes
-
-**Definition:** Variadic operations are operations that can take a varying number of inputs.
-
-Some examples of these include sum, min, and max, which would take in arbitrary length arrays in certain languages.
diff --git a/Variance.md b/Variance.md
@@ -1,29 +0,0 @@
-# Variance
-
-Stats D2
-
-## Notes (Stats)
-
-**Definition:** The variance of samples is the average squared difference between each value and the mean. 
-
-This can be shown as follows for X:
-
-$\text{var}(X) = \sum_x (x - E[X])^2 \, p_X(x)$
-
-For this it is important to understand that the weight $p_X(x)$ multiplies the squared difference; it does not go inside the square.
-
-As shown above, we find the difference between each value and the mean, square it so it is non-negative, weight it by its probability, and then sum. When every value is equally likely, each weight is 1 over the number of values, so this is just the average of the squared differences.
-
-If we take the square root of the variance we then have the [[StandardDeviation.md]]
-
-Put another way, the standard deviation equals sqrt(var(X)); the square root undoes the squaring built into the variance.
-
-Important: When the values have units, the variance is in units^2. Standard deviation is often better in this regard because it is in plain units.
-
-## Notes (ML)
-
-**Definition:** Variance is error caused by an oversensitive model (sensitive to noise/outliers in the training data).
-
-These models are likely to overfit training data.
-
-Variance can be thought of as a model's susceptibility to changing drastically when the training data changes. This is what is tested for when doing cross validation.
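A short sketch of the stats definition from the note above, computing $\sum_x (x - E[X])^2 p_X(x)$ directly. It assumes the distribution is given as a dict from value to probability; the fair six-sided die is just an example input.

```python
import math


def variance(pmf):
    """pmf maps each value x to its probability p_X(x)."""
    mean = sum(x * p for x, p in pmf.items())
    # The weight p multiplies the squared difference (it stays outside the square).
    return sum((x - mean) ** 2 * p for x, p in pmf.items())


die = {x: 1 / 6 for x in range(1, 7)}  # fair die: every value equally likely
var = variance(die)
std = math.sqrt(var)                   # standard deviation, back in plain units
print(var, std)
```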
diff --git a/Vector.md b/Vector.md
@@ -1,53 +0,0 @@
-# Vector (C++)
-
-## Notes
-
-**Definition:** Vectors in C++ are dynamically allocated arrays whose elements live on the heap instead of the stack.
-
-Vectors are generally preferred to plain arrays because they manage their own memory, can be resized, and don't need a size that is known at compile time.
-
-Here is some code that illustrates the properties of them:
-
-```cpp
-#include <iostream>
-#include <vector>
-
-// Returned by value; the vector is moved (or the copy elided), so nothing leaks.
-auto makeVec(){
-	std::vector<int> vec;
-	for(int i = 0 ; i < 1000; ++i){
-		vec.push_back(i);
-	}
-	return vec;
-}
-
-// Takes a pointer, so the writes are visible to the caller.
-void sideEffectTest(std::vector<int>* vec){
-	for(int i = 0 ; i < 1000; ++i){
-		vec->at(i) = 1;
-	}
-}
-
-int main(){
-	// Each iteration's vector is freed when it goes out of scope.
-	for(int i = 0 ; i < 1000; ++i){
-		auto vec = makeVec();
-		for(int j = 0 ; j < 1000; ++j){
-			//std::cout << vec.at(j) << std::endl;
-		}
-	}
-
-	// Runs until interrupted; every value printed is 1 because of sideEffectTest.
-	while(true){
-		auto vec = makeVec();
-		sideEffectTest(&vec);
-		for(int i = 0 ; i < 1000; ++i){
-			std::cout << vec.at(i) << std::endl;
-		}
-	}
-
-	return 0;
-}
-```
-
-As we can see, vectors can be returned and passed around by value (default) and when this is done side effects don't impact the original vector. If we pass by reference either with a pointer or by specifying the input parameter as std::vector<int>& (this is the preferred way), we can then make changes to the input vector while still having memory managed for us automatically, as shown in the final while(true) loop.
diff --git a/Vector3.md b/Vector3.md
@@ -1,9 +0,0 @@
-# Vector 3
-
-CS 331 W12 L3
-
-## Notes
-
-**Definition:** The Vector3 class in Unity is used to represent x, y, and z coordinates in a single object. This object stores each axis value as a float.
-
-See [[Movement.md]] for how to use Vector3s to move. 
diff --git a/VectorMatrixMultipication.md b/VectorMatrixMultipication.md
@@ -1,30 +0,0 @@
-# Vector Matrix Multiplication
-
-Khan
-
-## Notes
-
-**Definition:** Vector matrix multiplication can be performed by taking a linear combination of the matrix's columns: scale the first column by the first (top) entry of the vector, the second column by the second entry, and so on, then add the results.
-
-As described above:
-
-```
-
-[2 1] [c] = [2c + 1d] = cv + dw
-[4 2] [d]   [4c + 2d]
-
-
-```
-
-Where the vector [2,4] is v and [1,2] is w.
-
-
-Let's do another one.
-
-```
-
-[2 4 8] [8] = [16 + 36 + 32] = [ 84]
-[8 9 0] [9]   [64 + 81 + 0 ]   [145]
-[7 8 1] [4]   [56 + 72 + 4 ]   [132]
-
-```
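A minimal Python sketch of the column-combination view, reusing the numbers from the second worked example in the note above. The function name `matvec_by_columns` is made up, and plain nested lists stand in for a matrix type.

```python
def matvec_by_columns(matrix, vector):
    """Multiply by scaling each column by the matching vector entry and summing."""
    rows = len(matrix)
    result = [0] * rows
    for j, scale in enumerate(vector):      # j-th column, scaled by vector[j]
        for i in range(rows):
            result[i] += scale * matrix[i][j]
    return result


A = [[2, 4, 8],
     [8, 9, 0],
     [7, 8, 1]]
x = [8, 9, 4]
print(matvec_by_columns(A, x))  # [84, 145, 132]
```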
diff --git a/VectorSpace.md b/VectorSpace.md
@@ -1,30 +0,0 @@
-# Vector Space 
-
-**Source:** Linear Algebra Done Right
-
-**Chapter:** 1
-
-## Notes
-
-**Definition:** A vector space is a set that is closed under vector addition and scalar multiplication.
-
-Along with this, the following must be true:
-
-1. Commutative, a + b = b + a
-2. Associative, (ab)v = a(bv) and (u + v) + w = u + (v + w)
-3. Additive Identity, a + 0 = a
-4. Additive Inverse, a + (-a) = 0
-5. Multiplicative Identity, 1a = a
-6. Distributive, a(u + v) = au + av and (a + b)u = au + bu
-
-## Related Information
-
-When defining a vector space we define it as a set $V$ along with an addition and scalar multiplication on $V$ that satisfies the prior properties.
-
-We define addition and scalar multiplication as functions.
-
-The addition function can be:
-a : V × V -> V : a(n,m) = n+m for all n,m in V.
-
-The multiplication function can be:
-m : V × F -> V : m(v,f) = vf for all v in V and f in F.
diff --git a/Vertex.md b/Vertex.md
@@ -1,7 +0,0 @@
-# Vertex
-
-CG W13 L1
-
-## Notes
-
-**Definition:** A vertex is a point in 3D space.
diff --git a/VigenereCipher.md b/VigenereCipher.md
@@ -1,7 +0,0 @@
-# Vigenere Cipher
-
-U 2.4
-
-## Notes
-
-**Definition:** The Vigenere cipher is a polyalphabetic encryption scheme where we specify a key and then shift each character in the original message by the number represented by the key character at the current position. We cycle through the key as we go, so no single shift value does all of the encrypting, unlike a Caesar cipher.
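A sketch of the scheme from the note above in Python. Several details here are assumptions, not something the note specifies: key letters map A=0 through Z=25, non-letters pass through unchanged, and the key position only advances on letters. The function name `vigenere` is made up.

```python
def vigenere(message, key, decrypt=False):
    """Shift each ASCII letter by the matching key letter (A=0 ... Z=25)."""
    if not key or not (key.isascii() and key.isalpha()):
        raise ValueError("key must be a non-empty string of ASCII letters")
    out = []
    k = 0  # position in the key; advances only when a letter is processed
    for ch in message:
        if ch.isascii() and ch.isalpha():
            shift = ord(key[k % len(key)].upper()) - ord("A")
            if decrypt:
                shift = -shift
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            k += 1
        else:
            out.append(ch)  # spaces, digits, punctuation are left as-is
    return "".join(out)


secret = vigenere("ATTACKATDAWN", "LEMON")
print(secret)
print(vigenere(secret, "LEMON", decrypt=True))
```

With a one-letter key every character gets the same shift, which is exactly a Caesar cipher.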
diff --git a/VisualizationAlgorithm.md b/VisualizationAlgorithm.md
@@ -1,7 +0,0 @@
-# Visualization Algorithms
-
-ML Ch1
-
-## Notes
-
-**Definition:** Visualization algorithms are [[UnsupervisedLearning.md]] algorithms that output 2D or 3D representations of your data. 
diff --git a/VonNeumannModel.md b/VonNeumannModel.md
@@ -1,21 +0,0 @@
-# Von Neumann Model
-
-Computer Architecture L2
-
-## Notes
-
-**Definition:** Control signals are used to create a distinction between data and instructions in memory, but they are both saved together. Additionally, instructions are completed sequentially, i.e., finish one, fetch the next, compute, etc.
-
-This is our broad model for computing and computer architecture. Additionally, there is a single bus for memory (I would think having more could cause concurrency issues), a program control unit (control signals), and an arithmetic unit (cpu).
-
-Sequential instruction processing is ensured using a program counter that states what is being processed currently. 
-
-Alternatives listed in [[ForwardThoughts.md]] 
-
----
-
-Two key properties:
-
-Stored Program
-
-Sequential Instruction Processing
diff --git a/VotingClassifiers.md b/VotingClassifiers.md
@@ -1,13 +0,0 @@
-# Voting Classifiers
-
-ML D4
-
-## Notes
-
-**Definition:** Voting classifiers are ensembles of classification models that use each of their outputs to predict the final output.
-
-Assume you are using an SVM classifier, a random forest, and logistic regression. The output of each is computed, and whichever class gets the most votes becomes the final output.
-
-This process aggregates the outputs of the individual models into one output. Majority voting is called hard voting where the most popular output is chosen. 
-
-The alternative to hard voting is soft voting which takes the average of probability outputs from each model to make a determination.
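A small sketch contrasting hard and soft voting as described in the note above. The three probability dicts are invented outputs for a single sample, not results from real models, and the function names are made up.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the class label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """probabilities: one dict per model mapping class label -> probability."""
    classes = probabilities[0].keys()
    averaged = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(averaged, key=averaged.get)


# Hypothetical outputs from three models for one sample.
probs = [
    {"cat": 0.55, "dog": 0.45},
    {"cat": 0.60, "dog": 0.40},
    {"cat": 0.05, "dog": 0.95},
]
labels = [max(p, key=p.get) for p in probs]  # each model's single best guess

print(hard_vote(labels))  # two of three models say "cat"
print(soft_vote(probs))   # averaged probabilities favour "dog"
```

The two schemes can disagree: one very confident model outweighs two lukewarm ones under soft voting.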
diff --git a/Walk.md b/Walk.md
@@ -1,7 +0,0 @@
-# Walk
-
-Ch 4
-
-## Notes
-
-**Definition:** A walk is a sequence of adjacent nodes where each node can appear multiple times.
diff --git a/WeakAI.md b/WeakAI.md
@@ -1,7 +0,0 @@
-# Weak AI
-
-Superintelligence - Bostrom
-
-## Notes
-
-**Definition:** Weak AI is an AI system that has very narrow intelligence (think chess bot).
diff --git a/Weight.md b/Weight.md
@@ -1,9 +0,0 @@
-# Weight (ANNs)
-
-ML D6
-
-## Notes
-
-**Definition:** Weights in ANNs are numerical values that represent the strength of connections between neurons and biases.
-
-The connection strengths are called kernels and the sum of these + biases is the total number of trainable parameters (weights).
diff --git a/WeightedGraph.md b/WeightedGraph.md
@@ -1,7 +0,0 @@
-# Weighted Graph
-
-Ch 4
-
-## Notes
-
-**Definition:** A weighted graph is a graph where we maintain a list of weights for edges to represent the cost of traversal.
diff --git a/WellDefined.md b/WellDefined.md
@@ -1,7 +0,0 @@
-# Well Defined
-
-1.3.2
-
-## Notes
-
-**Definition:** For an object to be well defined it must be unambiguous.
diff --git a/WellOrdered.md b/WellOrdered.md
@@ -1,13 +0,0 @@
-# Well Ordered
-
-Abstract Math Chapter 10
-
-## Notes
-
-**Definition:** A well-ordered set is an ordered set in which every non-empty subset has a smallest element.
-
-This is important because it is the basis for [[Induction.md]]: without it, there would be no way to prove that a base case together with $S_n\implies S_{n+1}$ means the statement is true for all values in the set.
-
-A few examples of well-ordered sets are $\N$, any subset of $\N$, the set {0,2,4,5646}, and infinitely many others.
-
-Some examples of sets that are not well ordered under the usual ordering include $\R$, $\Z$, and $\mathbb{Q}$, since none of them has a smallest element.
diff --git a/WideAndDeepNN.md b/WideAndDeepNN.md
@@ -1,11 +0,0 @@
-# Wide and Deep Neural Network
-
-ML D6
-
-## Notes
-
-**Definition:** Wide and deep neural networks are a model architecture where some or all inputs are connected directly to outputs while also having a path through the neural network through hidden layers.
-
-By using a wide and deep neural network we don't worry about muddying simple relationships through the long path of a neural network as some values will be automatically factored into the outputs. 
-
-When doing this you use a concatenation layer prior to the output layer to combine both the long and short paths together.
diff --git a/Word.md b/Word.md
@@ -1,7 +0,0 @@
-# Word
-
-W1
-
-## Notes
-
-**Definition:** A word is the number of bits a CPU processes at once; this is typically 32 or 64.
diff --git a/ZeroExtension.md b/ZeroExtension.md
@@ -1,9 +0,0 @@
-# Zero Extension
-
-W1
-
-## Notes
-
-**Definition:** Zero extension is the process of extending an unsigned integer to take up more bits but still maintain the same value.
-
-101 -> 00000101
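A tiny Python sketch of the example in the note above, treating the unsigned integer as a string of bits. The function name `zero_extend` is made up for illustration.

```python
def zero_extend(bits, width):
    """Pad an unsigned bit string with leading zeroes up to the given width."""
    if len(bits) > width:
        raise ValueError("value does not fit in the requested width")
    return bits.rjust(width, "0")


extended = zero_extend("101", 8)
print(extended)                              # 00000101
print(int("101", 2), int(extended, 2))       # the value is unchanged
```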
diff --git a/ZeroOneMatrix.md b/ZeroOneMatrix.md
@@ -1,9 +0,0 @@
-# Zero-one Matrix
-
-Ch 9.3
-
-## Notes
-
-**Definition:** A zero-one matrix is a boolean matrix where each entry is either 0 or 1.
-
-When multiplying boolean matrices we can either do ordinary matrix multiplication and treat anything non-zero as one, or we can use AND in place of multiplication and OR in place of addition (more in line with the philosophy of zero-one matrices).
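A sketch of the two approaches from the note above, using nested lists as matrices. Both function names and the example matrices are made up; for 0/1 inputs the two approaches should give the same result.

```python
def boolean_product(a, b):
    """Entry (i, j) is the OR over k of (a[i][k] AND b[k][j])."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [int(any(a[i][k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(rows)
    ]


def product_then_threshold(a, b):
    """Ordinary matrix multiplication, then treat anything non-zero as 1."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [int(sum(a[i][k] * b[k][j] for k in range(inner)) != 0) for j in range(cols)]
        for i in range(rows)
    ]


A = [[1, 0], [0, 1], [1, 0]]
B = [[1, 1, 0], [0, 1, 1]]
print(boolean_product(A, B))
print(product_then_threshold(A, B))
```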
diff --git a/index.md b/index.md
@@ -1,30 +0,0 @@
-# Index
-
-This is the index for my main note classifications. I will maintain this as a home page.
-
-## Formal Schooling
-
-- [CS202](CS202.md) 
-- [CS331](CS331.md) 
-- [BIOL115](BIOL115.md) 
-- [HHP102](HHP102.md) 
-- [Math310](Math310.md) 
-- [Algorithms](Algorithms.md) 
-- [DiscreteMath](DiscreteMath.md) 
-- [Assembly](Assembly.md) 
-- [ComputerSecurity](ComputerSecurity.md) 
-- [TheoryOfComputation](TheoryOfComputation.md)
-
-## Other Focuses
-
-- [ComputerArchitecture](ComputerArchitecture.md) 
-- [MachineLearning](MachineLearning.md) 
-- [AISafety](AISafety.md) 
-- [StatisticsAndProbability](StatisticsAndProbability.md) 
-- [LinuxStuff](LinuxStuff.md) 
-- [LinearAlgebra](LinearAlgebra.md) 
-- [Calculus](Calculus.md) 
-- [Physics](Physics.md) 
-- [ReinforcementLearning](ReinforcementLearning.md) 
-- [DeepLearning](DeepLearning.md) 
-- [CPP](CPP.md)
diff --git a/rsync.md b/rsync.md
@@ -1,19 +0,0 @@
-# Rsync
-
-Notes on backups with rsync
-
-## Notes
-
-Rsync is the best way to back up a folder to another folder. This is especially useful when mounting another drive and then setting up a backup system to back up a folder to that drive.
-
-Usage: rsync -av --delete /home /srv/backup
-
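-The above command uses archive mode (-a) to preserve permissions, -v for verbose output (optional), and --delete to delete any files in the destination that don't exist in the source. We then specify the folder to sync and the location to sync it to. Note that a trailing slash on the source matters: /home/ copies the contents of /home into the destination, while /home creates a home directory inside it.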
-
-I have this aliased to backup in my bashrc with the following line:
-
-```bash
-alias backup='sudo rsync -av --delete /home/ /srv/backup'
-```
-
-It might make sense to remove the path stuff, but meh, I don't plan on moving stuff around. Realistically, I might also want to create a systemd service to do this, but that seems like too much work for it to be worthwhile.
diff --git a/sed.md b/sed.md
diff --git a/usubstitution.md b/usubstitution.md
@@ -1,7 +0,0 @@
-# U-Substitution
-
-Unit 2
-
-## Notes
-
-**Definition:** U-substitution is an integration technique whereby we attempt to reverse the chain rule by finding u and du in an integral, substituting, and then evaluating.
diff --git a/work/deep-learning/PolarCoordinatesConversion.py b/work/deep-learning/PolarCoordinatesConversion.py
@@ -0,0 +1,24 @@
+# convert from polar coordinates to cartesian and vice versa
+import math
+
+def polarToCart(r, theta):
+    print("Polar Coordinates: \t" + str(r) + " " + str(theta))
+    x = math.cos(theta) * r
+    y = math.sin(theta) * r
+    print("Cartesian Coordinates: \t" + str(x) + " " + str(y))
+    return (x,y)
+
+def cartToPolar(x,y):
+    print("Cartesian Coordinates: \t" + str(x) + " " + str(y))
+    r = math.sqrt(x**2 + y**2)
+    theta = math.atan2(y, x)  # handles every quadrant and r == 0, unlike asin(y/r)
+    print("Polar Coordinates: \t" + str(r) + " " + str(theta))
+    return (r, theta)
+
+r = 2.3
+theta = 1.38
+print("POLAR TO CART:")
+x,y = polarToCart(r,theta)
+print()
+print("CART TO POLAR:")
+cartToPolar(x,y)
diff --git a/work/linear-algebra/01-20-2025.md b/work/linear-algebra/01-20-2025.md
@@ -0,0 +1,5 @@
+Nullspace:
+
+Given that the nullspace of a matrix A is the set of all vectors x for which Ax is the zero vector, it seems reasonable to conclude that whenever the columns of A are linearly independent (for a square matrix, whenever they span R^n), there is only one vector, the zero vector, in the nullspace. Additionally, if A has n columns but they only span an (n-1)-dimensional space, then the nullspace is a one-dimensional subspace. In the simplest case, where one column of A is all zeroes, it has a basis of <0, 0, ..., 1>, where the only 1 is in the coordinate for that column and all the rest are zeroes. We know this because the matrix does not account for that coordinate when doing multiplication, so varying it does not impact the final result, while everything else must be zero. In general the basis vector does not have to be one of these standard vectors; it is whatever combination of the columns cancels out.
+
+Sometimes this is also referred to as the kernel.
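A small Python check of the nullspace idea from the note above, using plain lists. The three matrices are invented examples: one with an all-zero column, one with dependent columns, and the identity (independent columns). The function names are made up.

```python
def matvec(matrix, vector):
    """Ordinary matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def in_nullspace(matrix, vector):
    """True when the matrix sends the vector to the zero vector."""
    return all(value == 0 for value in matvec(matrix, vector))


zero_column = [[1, 0], [3, 0]]   # second column is ignored by the product
dependent = [[1, 2], [2, 4]]     # second column is twice the first
independent = [[1, 0], [0, 1]]   # columns are linearly independent

print(in_nullspace(zero_column, [0, 1]))    # True
print(in_nullspace(dependent, [2, -1]))     # True
print(in_nullspace(dependent, [0, 1]))      # False
print(in_nullspace(independent, [2, -1]))   # False
```

The `dependent` case is worth noticing: its nullspace is still one-dimensional, but the vector that spans it is <2, -1> rather than a vector with a single 1 in it.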
