## Worked Examples - Ch. 5

### Scatterplots and the correlation coefficient

#### Example

Suppose you were studying the educational level of husbands and wives
(measured in number of years of education). You have randomly selected
10 couples and have obtained the data in the following table:

By hand: The easiest formula for hand computation is
r=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{\left( n\sum x^{2}-(\sum x)^{2}\right) \left( n\sum y^{2}-(\sum y)^{2}\right) }}

The correlation coefficient is
r=\frac{(10)(2656)-(164)(160)}{\sqrt{\left( (10)(2794)-(164)^{2}\right) \cdot \left( (10)(2608)-(160)^{2}\right) }}=0.45
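
The arithmetic above can be checked with a short script. The following is a minimal sketch in Python that plugs the sums from the worked example into the hand-computation formula. The variable names (`sum_x`, `sum_xy`, and so on) are our own labels, not notation from the text.

```python
import math

# Summary values taken from the worked example above.
n = 10
sum_x, sum_y = 164, 160
sum_xy = 2656
sum_x2, sum_y2 = 2794, 2608

# Numerator and denominator of the hand-computation formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator

print(round(r, 2))     # 0.45
print(round(r**2, 2))  # 0.2
```

Working from the sums rather than the raw pairs mirrors the hand computation, so each intermediate value can be compared against the fraction above.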

The coefficient of determination, r^{2}, is (0.45)^{2}\approx 0.20.

#### Slope:

a=\frac{n\sum xy-\sum x\sum y}{n\sum x^{2}-(\sum x)^{2}}

#### y-intercept:

b=\frac{1}{n}\left( \sum y-a\sum x\right)

#### Regression line:

y=ax+b

The regression line for the above data:
a=\frac{n\sum xy-\sum x\sum y}{n\sum x^{2}-(\sum x)^{2}}=\frac{10\cdot 2656-164\cdot 160}{10\cdot 2794-(164)^{2}}=0.3065

b=\frac{1}{n}\left( \sum y-a\sum x\right) =\frac{1}{10}(160-0.3065\cdot 164)=10.97

y=0.3065x+10.97
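
As a check on the slope and intercept, here is a small Python sketch that evaluates the two formulas with the sums from this example. The variable names are illustrative choices of ours.

```python
# Summary values taken from the worked example above.
n = 10
sum_x, sum_y = 164, 160
sum_xy, sum_x2 = 2656, 2794

# Slope a and y-intercept b of the regression line y = ax + b.
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b = (sum_y - a * sum_x) / n

print(f"y = {a:.4f}x + {b:.2f}")  # y = 0.3065x + 10.97
```

Note that the intercept is computed from the unrounded slope. In this example, using the rounded value 0.3065 gives the same result to two decimal places.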

We can use the regression line to make a prediction. For example,
if the husband had 10 years of education (x=10), then we would
predict that the wife had 0.3065\cdot 10+10.97\approx 14 years of
education. The symbol \widehat{y} is commonly used for predicted
values; thus, here \widehat{y}\approx 14.
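
The prediction step can be wrapped in a small helper. This is only a sketch: the function name `predict` is hypothetical, and its default coefficients are the rounded values 0.3065 and 10.97 from the regression line above.

```python
def predict(x, a=0.3065, b=10.97):
    """Return the predicted value y-hat = a*x + b."""
    return a * x + b

# Husband with 10 years of education (x = 10).
y_hat = predict(10)
print(round(y_hat, 3))  # 14.035
print(round(y_hat))     # 14
```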

Since r^{2}=0.20 for this example, the model explains only
20% of the variation in the wives' education levels.
It would be risky to try to predict the wife's education on the basis
of the husband's education.

### Error Analysis

We can get a feel for the error by comparing the actual values y
to the values \widehat{y} predicted by the regression line,
error=y-\widehat{y}

(i.e., the **actual value** minus the predicted value).
Since the sum of the errors is always 0 for the least-squares
regression line, \sum error is not helpful in measuring the error.
Instead we use the sum of the squares of the errors:
SSE=\sum (y-\widehat{y})^{2}
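
To see both facts in action (the errors sum to zero, while SSE does not), here is a minimal Python sketch. The four data points are made up purely for illustration and are not the couples' data from the example. The helper name `fit_line` is also our own.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b, using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    b = (sum_y - a * sum_x) / n
    return a, b

# Made-up illustrative data; NOT the couples' data from the example.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

a, b = fit_line(xs, ys)

# error = actual value minus predicted value, for each point.
errors = [y - (a * x + b) for x, y in zip(xs, ys)]
sse = sum(e**2 for e in errors)

print(abs(round(sum(errors), 10)))  # 0.0 (up to floating-point rounding)
print(round(sse, 10))               # 0.2
```

For this toy data the fitted line is y = 1.4x + 0.5, so the individual errors are 0.1, -0.3, 0.3, and -0.1. They cancel in the plain sum, but their squares add up to 0.2.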