"Statistics is vectors " ~ Joshua + a lot of statisticians
Introduction to the Geometry of Statistics
Why is population variance defined as
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
But sample variance is defined as
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
The key difference lies in the definitions of $\mu$ and $\bar{x}$.
Population Mean
$\mu$ is the population mean, a fixed but possibly unknown constant.
If $X$ is a random variable with a probability distribution, then
$$\mu = \mathbb{E}[X]$$
$\mu$ is assumed to be known in the population context. #Expectation
Sample mean
In practice, however, we often don't know $\mu$, especially in real-world data. So we approximate it using the sample mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where $\bar{x}$ is a point estimate of the unknown $\mu$ based on a random sample.
Residuals
Residuals have an interesting property. A residual is expressed as
$$e_i = x_i - \bar{x}$$
Now, the sum of residuals is always $0$.
Since the sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, we have $\sum_{i=1}^{n} x_i = n\bar{x}$. Hence
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$
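As a quick numerical check (a NumPy sketch; the data values are made up for illustration):

```python
import numpy as np

# Arbitrary toy data
x = np.array([2.0, 4.0, 7.0, 11.0])

# Residuals: deviations from the sample mean
residuals = x - x.mean()

# Their sum is always zero (up to floating-point error)
print(residuals.sum())
```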
Suppose we have two sets of data; let's express each as its sample mean plus a vector of residuals. #Residuals
A degree of freedom is an independent value in your dataset that can vary freely when estimating a statistic.
Data can be split into two parts: the sample mean and the residuals.
Let the residuals form the vector $\mathbf{e} = (e_1, e_2, \dots, e_n)$; we know that the sum of the terms in $\mathbf{e}$ is $0$. Statistics#Residuals
This limits what the residual vector can be. Suppose $n = 2$ (2 data points).
We find that all possible values of $\mathbf{e}$ lie on the line $e_1 + e_2 = 0$. Thus its degrees of freedom is 1 (as it can only move along a 1D line).
Suppose $n = 3$.
The vector must lie on the 2D plane satisfying $e_1 + e_2 + e_3 = 0$. Thus the degrees of freedom of $\mathbf{e}$ is 2. Extrapolating this, $\mathbf{e}$ has $n - 1$ degrees of freedom.
Sample Variance
As such, when calculating the variance of the residuals, since the degrees of freedom of $\mathbf{e}$ is $n - 1$, we divide by $n - 1$ instead to find the average squared residual:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
The population mean $\mu$ is known before calculating the residuals; it is thus independent of the data $x_i$.
Thus the residual vector $\mathbf{e}$ can lie anywhere in the $n$-dimensional space, having $n$ degrees of freedom, as its sum is not confined to $0$.
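A quick NumPy sketch (made-up data) confirming that dividing the squared residual length by $n-1$ matches the standard sample variance:

```python
import numpy as np

x = np.array([2.0, 4.0, 7.0, 11.0])
n = len(x)
residuals = x - x.mean()

# Sample variance: squared length of the residual vector over n - 1 degrees of freedom
s2 = residuals @ residuals / (n - 1)

# np.var with ddof=1 uses the same n - 1 divisor
assert np.isclose(s2, np.var(x, ddof=1))
```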
Population Variance
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
We can also rewrite it as
$$\sigma^2 = \frac{\mathbf{e} \cdot \mathbf{e}}{N} = \frac{\|\mathbf{e}\|^2}{N}$$
where $\mathbf{e} = (x_1 - \mu, \dots, x_N - \mu)$.
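The vector form can be verified numerically (a sketch; the data values are arbitrary and treated as the whole population):

```python
import numpy as np

x = np.array([2.0, 4.0, 7.0, 11.0])  # treat these as the whole population
e = x - x.mean()                     # deviations from the population mean

# sigma^2 = ||e||^2 / N: squared length of the deviation vector over N
sigma2 = np.dot(e, e) / len(x)

# np.var defaults to the population formula (divides by N)
assert np.isclose(sigma2, np.var(x))
```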
Population Covariance
$$\operatorname{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$$
Why the covariance of two independent variables is 0
Recall the two residual vectors $\mathbf{e}_x$ and $\mathbf{e}_y$. We can see that $\operatorname{Cov}(X, Y) = \frac{\mathbf{e}_x \cdot \mathbf{e}_y}{N}$ based on the dot product rule.
Thus, if $X$ and $Y$ are independent, their residual vectors will not be similar (they do not point in a similar direction), so their dot product would be $0$.
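A simulation sketch (assumed setup: two independent standard normal samples) showing the dot-product covariance shrinking toward 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)  # independent draws
y = rng.normal(size=n)  # drawn independently of x

ex, ey = x - x.mean(), y - y.mean()

# Covariance as a dot product of the residual vectors
cov = ex @ ey / n
print(cov)  # close to 0, and it shrinks as n grows
```

Note that the converse does not hold in general: zero covariance does not imply independence.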
Standard Deviation
$$\sigma = \sqrt{\sigma^2} = \frac{\|\mathbf{e}\|}{\sqrt{N}}$$
Correlation Coefficient (a more elegant view)
The formula (Pearson correlation coefficient) is given as such:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
Look at this monstrosity. But in a sense this is a rather simple formula.
Let's look at the formula again. Notice how we can rewrite it as
$$r = \frac{\mathbf{e}_x \cdot \mathbf{e}_y}{\|\mathbf{e}_x\|\,\|\mathbf{e}_y\|} = \cos\theta$$
The dot product determines the similarity of two vectors. If the variables have a linear relationship, then $\mathbf{e}_x$ and $\mathbf{e}_y$ point in the same direction and the formula outputs $1$ (or $-1$ if they point in opposite directions). But if they are dissimilar (perpendicular), it outputs $0$. This explains why $r$ tells us the correlation between two variables: it is simply $\cos\theta$, the cosine of the angle between the residual vectors.
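The cosine view can be checked directly (a sketch with made-up, roughly linear data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # roughly y = 2x

ex, ey = x - x.mean(), y - y.mean()

# Pearson r as the cosine of the angle between the residual vectors
r = (ex @ ey) / (np.linalg.norm(ex) * np.linalg.norm(ey))

# Matches NumPy's built-in correlation coefficient
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
print(r)  # close to 1 for a near-linear relationship
```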
The coefficients of the best-fit polynomial for univariate functions can be expressed as
$$\mathbf{a} = (V^{\mathsf{T}} V)^{-1} V^{\mathsf{T}} \mathbf{y}$$
where $V$ is a Vandermonde matrix. #Vandermonde_Matrix
Interpolation
The coefficients of a univariate interpolating polynomial can simply be expressed as
$$\mathbf{a} = V^{-1}\mathbf{y}$$
since $V$ is a square matrix and has an inverse (only when $d = n - 1$), where $d$ is the degree of the interpolant and $n$ is the number of data points the interpolant is supposed to pass through.
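A NumPy sketch of the square case (made-up points that happen to lie on $y = x^2 + x + 1$, so $d = n - 1 = 2$):

```python
import numpy as np

# Three points determine a unique degree-2 interpolating polynomial
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 7.0])      # lies on y = 1 + x + x^2

V = np.vander(x, increasing=True)  # columns: 1, x, x^2
a = np.linalg.solve(V, y)          # a = V^{-1} y

print(a)  # [1. 1. 1.] -> 1 + x + x^2
```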
More generally, the best-fit coefficients are
$$\hat{\mathbf{a}} = \arg\min_{\mathbf{a}} \|V\mathbf{a} - \mathbf{y}\|^2$$
where $\arg\min$ outputs the argument that minimises its objective, here the squared residual. This shows that the best-fit function is obtained when we minimise the sum of the squared differences.
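In the overdetermined case (more points than coefficients) the normal equations give the same minimiser as NumPy's least-squares routine. A sketch with made-up, roughly linear data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])   # roughly y = x

V = np.vander(x, 2, increasing=True)  # columns: 1, x (degree-1 model)

# Normal equations: solve (V^T V) a = V^T y, i.e. a = (V^T V)^{-1} V^T y
a_normal = np.linalg.solve(V.T @ V, V.T @ y)

# Same minimiser of ||V a - y||^2 via NumPy's least-squares solver
a_lstsq, *_ = np.linalg.lstsq(V, y, rcond=None)

assert np.allclose(a_normal, a_lstsq)
```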