Worked Examples - Ch. 5

Scatterplots and the correlation coefficient

Example

Suppose you were studying the educational level of husbands and wives (measured in number of years of education). You have randomly selected 10 couples and have obtained the data in the following table:

By hand: The easiest formula for hand computation is
r=\frac{n\sum xy-(\sum x)\cdot (\sum y)}{\sqrt{\left( n\sum x^{2}-\left( \sum x\right) ^{2}\right) \cdot \left( n\sum y^{2}-\left( \sum y\right) ^{2}\right) }}
The correlation coefficient is
r=\frac{(10)(2656)-(164)(160)}{\sqrt{\left( (10)(2794)-(164)^{2}\right) \cdot \left( (10)(2608)-(160)^{2}\right) }}=0.45
The coefficient of determination, r^{2}, is (0.45)^{2}=0.20.
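
The hand computation can be checked with a short Python sketch. It uses only the summary sums quoted in the substitution above; the function name `correlation` and the variable names are illustrative choices, not part of the original example.

```python
from math import sqrt

# Summary sums taken from the substitution in the worked example (n = 10 couples).
n = 10
sum_x, sum_y = 164, 160      # sum of x, sum of y
sum_xy = 2656                # sum of x*y
sum_x2, sum_y2 = 2794, 2608  # sum of x^2, sum of y^2


def correlation(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Correlation coefficient r from the hand-computation formula."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
r_squared = r ** 2  # coefficient of determination
print(round(r, 2), round(r_squared, 2))
```

Rounded to two decimal places, the results agree with the hand values r = 0.45 and r^{2} = 0.20.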

Slope:

a=\frac{n\sum xy-\sum x\sum y}{n\sum x^{2}-(\sum x)^{2}}

y-intercept:

b=\frac{1}{n}\left( \sum y-a\sum x\right)

Regression line:

y=ax+b
The regression line for the above data:
a=\frac{n\sum xy-\sum x\sum y}{n\sum x^{2}-(\sum x)^{2}}=\frac{10\cdot 2656-164\cdot 160}{10\cdot 2794-(164)^{2}}=0.3065
b=\frac{1}{n}\left( \sum y-a\sum x\right) =\frac{1}{10}(160-0.3065\cdot 164)=10.97
y=0.3065x+10.97
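
As a check on the arithmetic, the slope and intercept formulas can be sketched in Python from the same summary sums. The helper name `fit_line` is an assumption made for illustration.

```python
# Summary sums taken from the worked example (n = 10 couples).
n = 10
sum_x, sum_y = 164, 160
sum_xy, sum_x2 = 2656, 2794


def fit_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the regression line y = a*x + b."""
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b = (sum_y - a * sum_x) / n                                   # y-intercept
    return a, b


a, b = fit_line(n, sum_x, sum_y, sum_xy, sum_x2)
print(f"y = {a:.4f}x + {b:.2f}")
```

The intercept is computed from the unrounded slope, so the last digit can differ slightly from a hand calculation that rounds a first; here both routes give 10.97 to two decimal places.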
We can use the regression line to make a prediction. For example, if the husband had 10 years of education (x=10), then we would predict that the wife had 0.3065\cdot 10+10.97=14.04\approx 14 years of education. The symbol
\widehat{y}
is commonly used for predicted values; here
\widehat{y}=14.
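
The prediction step amounts to evaluating the fitted line at a chosen x. A minimal sketch, using the rounded coefficients from the text (the function name `predict` is illustrative):

```python
# Rounded coefficients of the regression line from the worked example.
A = 0.3065   # slope
B = 10.97    # y-intercept


def predict(x):
    """Predicted wife's years of education (y-hat) for husband's years x."""
    return A * x + B


y_hat = predict(10)
print(round(y_hat, 2), round(y_hat))
```

The unrounded prediction is about 14.04, which the text reports as 14 years.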
Since r^{2}=0.20 for this example, the model explains only 20% of the variation in the wives' education levels. It would be risky to try to predict the wife's education on the basis of the husband's education.

Error Analysis

We can get a feel for the error by comparing the actual values y to the values \widehat{y} predicted by the regression line,
error=y-\widehat{y}
(i.e. the actual value minus the predicted value). Since the sum of the errors is always 0 for the least-squares regression line, \sum error is not helpful in measuring the error. Instead we use the sum of the squares of the errors:
SSE=\sum (y-\widehat{y})^{2}
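
The following Python sketch illustrates both points: the errors of a least-squares line sum to zero, while SSE does not. The four data points are made up purely for illustration and are not the couples' data; the helper names `fit_line` and `errors` are likewise assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


def errors(xs, ys, a, b):
    """Errors y - y_hat (actual minus predicted) for each data point."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Hypothetical toy data, NOT the data from the example above.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

a, b = fit_line(xs, ys)
errs = errors(xs, ys, a, b)
total_error = sum(errs)          # zero up to floating-point rounding
sse = sum(e ** 2 for e in errs)  # sum of squared errors
print(a, b, total_error, sse)
```

For this toy data the fitted line is y = 0.8x + 1.5, the errors are -0.3, -0.1, 1.1 and -0.7 (which cancel), and SSE is 1.8, up to floating-point rounding.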