Sxx is formally defined as the sum of squared deviations of each data point from the mean. It is a measure of total variability in the independent variable \(x\). Dividing Sxx by \(n-1\) yields the sample variance:
\[ s_x^2 = \frac{S_{xx}}{n-1} = \frac{\sum (x_i - \bar{x})^2}{n-1} \]
Thus, Sxx is the numerator of the variance formula. It captures the raw dispersion before scaling by degrees of freedom. A larger Sxx indicates greater spread of \(x\) values.
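As a quick check of this relationship, the sketch below computes Sxx directly from its definition for a small hypothetical dataset, divides by \(n-1\), and compares the result with Python's `statistics.variance`, which also uses the \(n-1\) denominator. (NumPy's `np.var(x, ddof=1)` computes the same quantity.)

```python
import statistics

# Small hypothetical dataset, chosen only for illustration
x = [4, 8, 6, 5, 3]
n = len(x)
mean = sum(x) / n

# Sxx: sum of squared deviations from the mean (the variance numerator)
s_xx = sum((xi - mean) ** 2 for xi in x)

# Scaling by the degrees of freedom, n - 1, gives the sample variance
sample_var = s_xx / (n - 1)

print(s_xx, sample_var)
print(statistics.variance(x))  # stdlib sample variance, also divides by n - 1
```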
\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
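where \(x_i\) is the \(i\)-th observation, \(\bar{x}\) is the sample mean, and \(n\) is the number of observations. Expanding the square gives the equivalent computational shortcut \(S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}\), which the following snippet uses: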
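```python
# Sample of x values
x = [4, 8, 6, 5, 3]
n = len(x)

# Shortcut: Sxx = sum(x_i^2) - (sum(x_i))^2 / n
sum_x = sum(x)
sum_x_sq = sum(xi**2 for xi in x)
Sxx = sum_x_sq - (sum_x**2) / n

# Sample variance divides by the degrees of freedom, n - 1
variance = Sxx / (n - 1)
print(f"Sxx = {Sxx}, Variance = {variance}")
```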
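The shortcut and the definition agree algebraically, but in floating point the shortcut subtracts two large, nearly equal numbers and can lose every significant digit when the values are large relative to their spread. The minimal sketch below compares the two on hypothetical data offset by \(10^9\); the helper names `sxx_two_pass` and `sxx_shortcut` are illustrative, not from any library.

```python
def sxx_two_pass(x):
    """Sxx from the definition: sum of squared deviations from the mean."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x)


def sxx_shortcut(x):
    """Sxx from the shortcut: sum of squares minus (sum of values)^2 / n."""
    n = len(x)
    return sum(xi ** 2 for xi in x) - sum(x) ** 2 / n


# Hypothetical data: a spread of a few units around a large offset (floats)
x = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(sxx_two_pass(x))  # 90.0, the exact value
print(sxx_shortcut(x))  # not 90.0: cancellation destroys the result
```

For small, well-scaled data like the other examples in this section the two versions agree to rounding error; the difference only matters when the mean is large compared with the spread.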
Consider the dataset \(x = [2, 4, 6, 8]\), with \(n = 4\).
The mean is \(\bar{x} = 5\), so
\(\sum (x_i - \bar{x})^2 = (2-5)^2 + (4-5)^2 + (6-5)^2 + (8-5)^2 = 9 + 1 + 1 + 9 = 20\).
Using the shortcut: \(\sum x_i^2 = 4 + 16 + 36 + 64 = 120\) and \(\frac{(\sum x_i)^2}{n} = \frac{20^2}{4} = 100\), so \(S_{xx} = 120 - 100 = 20\). The sample variance is \(20/3 \approx 6.67\).
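The worked example can be reproduced with exact rational arithmetic, which confirms that the definition and the shortcut give the same Sxx and that the variance is exactly \(20/3\). This sketch uses Python's standard `fractions.Fraction` to avoid any floating-point rounding.

```python
from fractions import Fraction

# Worked-example data from the text
x = [2, 4, 6, 8]
n = len(x)
mean = Fraction(sum(x), n)  # exactly 5

# Definition: sum of squared deviations from the mean
sxx_def = sum((xi - mean) ** 2 for xi in x)

# Shortcut: sum of squares minus (sum of values)^2 / n
sxx_short = sum(xi * xi for xi in x) - Fraction(sum(x) ** 2, n)

# Sample variance: Sxx / (n - 1)
variance = sxx_def / (n - 1)

print(sxx_def, sxx_short, variance)  # 20 20 20/3
```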
In simple linear regression \(y = \beta_0 + \beta_1 x + \epsilon\), Sxx appears in the denominator of the least-squares estimate of the slope \(\beta_1\):
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \]
where \(S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})\). The standard error of the slope is:
\[ SE(\hat{\beta}_1) = \sqrt{\frac{s_e^2}{S_{xx}}} \]
Here, \(s_e^2 = \frac{1}{n-2}\sum (y_i - \hat{y}_i)^2\) is the residual variance, the mean squared residual with \(n-2\) degrees of freedom. A larger \(S_{xx}\) reduces the standard error of the slope, improving the precision of the regression estimate. Intuitively, more spread in the predictor variable provides a stronger lever for estimating the relationship with the response variable.
The same results in the common alternative notation for simple linear regression, \(y = a + bx\):
\[ b = \frac{S_{xy}}{S_{xx}} \]
\[ S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \]
Also, the standard error of the slope uses Sxx:
\[ SE(b) = \sqrt{\frac{s_e^2}{S_{xx}}} \] where \(s_e^2\) is the residual variance.
Thus Sxx measures the spread of x – larger Sxx → smaller standard error → more precise slope estimate.