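Suppose we have a random variable $X$ following a distribution where $\mathbb{E}[X] = \mu$. We don't know $\mu$; instead we have multiple observations of $X$, like $x_1, x_2, \dots, x_n$.
Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{1} = (1, 1, \dots, 1)$,
and the mean is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
We want to find a value of $\hat{\mu}$ that minimises $\lVert \mathbf{x} - \hat{\mu}\mathbf{1} \rVert$, the length of the residual vector $\mathbf{x} - \hat{\mu}\mathbf{1}$ (in red). The residual is shortest when it is perpendicular to $\mathbf{1}$. Using Pythagoras' theorem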
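As a quick numerical check of the perpendicularity argument, here is a minimal sketch in plain Python. It assumes the quantity being chosen is a scalar `m` that scales the all-ones vector, and that the residual is the observation vector minus `m` times that vector. The sample values and the names `x`, `m`, `r`, `ones`, `dot` and `project_onto_ones` are made up for illustration and do not come from the text.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_ones(x):
    """Project x onto the all-ones direction.

    Returns (m, residual), where m * ones is the orthogonal projection
    of x and residual = x - m * ones.
    """
    n = len(x)
    m = sum(x) / n  # <x, 1> / <1, 1>, which is the sample mean
    residual = [xi - m for xi in x]
    return m, residual


# Hypothetical observations, chosen only for illustration.
x = [2.0, 4.0, 9.0]
ones = [1.0] * len(x)

m, r = project_onto_ones(x)

# The residual is perpendicular to the all-ones vector.
print("m =", m)
print("residual . ones =", dot(r, ones))

# Pythagoras: |x|^2 = |m * ones|^2 + |residual|^2
print("lhs =", dot(x, x))
print("rhs =", m * m * len(x) + dot(r, r))

# Any other scalar gives a longer residual.
c = 4.5
r_other = [xi - c for xi in x]
print("other residual is longer:", dot(r_other, r_other) > dot(r, r))
```

With these sample values the projection coefficient is the sample mean, and shifting it in either direction lengthens the residual, which is the geometric content of the argument above.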