Variance.md (1280B)
1 # Variance 2 3 Stats D2 4 5 ## Notes (Stats) 6 7 **Definition:** The variance of samples is the average squared difference between each value and the mean. 8 9 This can be shown as follows for X: 10 11 var$(X) = \sum_x(x-E[X])^2p_X(x)$ 12 13 For this it is paramount to understand that the multiplication by the weight goes outside of the squared area. 14 15 Shown above, find the difference between each value and the mean, square it to get a positive, and then sum the values. We then average it by multiplying by 1 over the cardinality of X. 16 17 If we take the square root of the variance we then have the [StandardDeviation](StandardDeviation.md) 18 19 Additionally, the std deviation, given our definition of variance, is equal to sqrt(var(X)) given that the variance of the random variable X is squared. 20 21 Important: When referring to values with units the variance will be units^2 hence standard deviation is often better in this regard because it is simply units. 22 23 ## Notes (ML) 24 25 **Definition:** Variance is error cause by an oversensitive model (sensitive to variance/outliers). 26 27 These models are likely to overfit training data. 28 29 Variance can be thought of as a models susceptibility to having vast differences based on training data differences. This is what is tested for when doing cross validation.