Degree of freedom in statistical testing
In this Minitab Blog Post I found a clear explanation of the concept of degree of freedom (often abbreviated as ‘d.f.’) in the context of hat wearing (!), the one-sample t-test, the Chi-square test, and linear regression.
A key concept for understanding degree of freedom is the constraint. In mathematics, a constraint is a condition that the solution of a problem must satisfy. Say we want to test whether the average height of ten people equals 1.65m. We can use a one-sample t-test for this. The null hypothesis is that the average height is indeed 1.65m (plus or minus sampling error), and the alternative hypothesis is that the average is a different value.
So under the null hypothesis, the constraint is that the average height must be 1.65m, or equivalently that the sum of the heights of the ten people must be 16.5m. That means the heights of nine people can vary as they wish, but the last person’s height must take whatever value makes the mean (or the sum) match the value against which we wish to test. That is the constraint of our problem.
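The constraint can be made concrete with a small Python sketch. The function name `last_value_for_mean` and the randomly drawn heights are made up for illustration; the only thing taken from the example above is the target mean of 1.65m for ten people.

```python
import random

def last_value_for_mean(free_values, target_mean):
    """Return the one value that forces the full sample to have target_mean."""
    n = len(free_values) + 1  # the free values plus the constrained one
    return target_mean * n - sum(free_values)

random.seed(0)
# Nine heights that "vary as they wish" (arbitrary illustrative values, in metres).
free_heights = [round(random.uniform(1.50, 1.90), 2) for _ in range(9)]

# The tenth height is no longer free: the constraint determines it.
last_height = last_value_for_mean(free_heights, 1.65)

sample = free_heights + [last_height]
mean = sum(sample) / len(sample)
print(f"mean of all ten heights: {mean:.2f}")
```

Whatever the nine free heights are, the tenth is pinned down by the constraint, which is why only nine values are free to vary.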
What I find intriguing is that the constraint is in essence a human invention: we set the constraint because we care about a certain property. Therefore, the degree of freedom in fact reflects our interest and intention in the data (or in the unobservable variables that generate the data).
If this is clear, then the degree of freedom of a one-sample t-test is clear: it is $n-1$ for a sample of $n$ observations. Following the same logic, we have the following rules:
- The degree of freedom is $m+n-2$ for a two-sample t-test (with pooled variance) on two groups of sizes $m$ and $n$, respectively.
- A $2\times2$ table, used in a Chi-square test of independence, has a degree of freedom of 1, because the row sums and column sums are constraints. For a table of size $m\times n$, the degree of freedom is $(m-1)(n-1)$.
- The degree of freedom of a regression analysis is $n-p$, where $n$ is the number of data points to be regressed and $p$ is the number of parameters in the regression model.
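The rules above can be written down as a few one-line Python functions. This is only a sketch: the function names are invented here, and `fill_2x2` is a hypothetical helper showing why a $2\times2$ table with fixed row and column sums has a single free cell.

```python
def df_one_sample_t(n):
    """One-sample t-test on n observations."""
    return n - 1

def df_two_sample_t(m, n):
    """Two-sample t-test on groups of sizes m and n."""
    return m + n - 2

def df_chi_square_independence(rows, cols):
    """Chi-square test of independence on a rows x cols table."""
    return (rows - 1) * (cols - 1)

def df_regression(n, p):
    """Regression with n data points and p model parameters."""
    return n - p

def fill_2x2(top_left, row_sums, col_sums):
    """Fill a 2x2 table from one free cell plus the fixed margins."""
    a = top_left
    b = row_sums[0] - a  # rest of the first row
    c = col_sums[0] - a  # rest of the first column
    d = row_sums[1] - c  # last cell follows from the second row sum
    return [[a, b], [c, d]]

print(df_one_sample_t(10))               # 9
print(df_two_sample_t(12, 15))           # 25
print(df_chi_square_independence(2, 2))  # 1
print(df_regression(50, 3))              # 47
print(fill_2x2(10, (30, 20), (25, 25)))  # [[10, 20], [15, 5]]
```

Once the top-left cell is chosen, the margins determine the other three cells, which matches the single degree of freedom of the $2\times2$ table.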
More generally, in mathematics, degree of freedom can represent the number of dimensions of the domain of a random vector, or the number of free components, that is, the number of components we need to know in order to fully determine the vector. And in a dynamical system, degree of freedom specifies the number of independent ways the system can move without violating any constraint imposed on it.