Suppose you are studying the educational level of husbands and wives (measured in years of education). You have randomly selected 10 couples and obtained the data in the following table:
Husband  Wife 
12  14 
16  16 
16  14 
18  16 
20  16 
17  18 
23  18 
14  12 
12  16 
16  20 
To help us judge the degree of linear relationship between the two variables, we need to compute the correlation coefficient.
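The correlation coefficient may be computed by hand, on a TI calculator, or in Excel. By hand, use the formula
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^{2} - (\sum x)^{2}][n\sum y^{2} - (\sum y)^{2}]}}
where n is the number of data pairs and
\sum x is the sum of the x column,
\sum y is the sum of the y column,
\sum x^{2} is the sum of the x^{2} column,
\sum y^{2} is the sum of the y^{2} column,
\sum xy is the sum of the x·y column.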
The following table can be helpful in computing the sums:
x (husband)  y (wife)  x^{2}  y^{2}  x·y  
12  14  12^{2}=144  14^{2}=196  12·14=168  
16  16  16^{2}=256  16^{2}=256  16·16=256  
16  14  16^{2}=256  14^{2}=196  16·14=224  
18  16  18^{2}=324  16^{2}=256  18·16=288  
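20  16  20^{2}=400  16^{2}=256  20·16=320
17  18  17^{2}=289  18^{2}=324  17·18=306
23  18  23^{2}=529  18^{2}=324  23·18=414
14  12  14^{2}=196  12^{2}=144  14·12=168
12  16  12^{2}=144  16^{2}=256  12·16=192
16  20  16^{2}=256  20^{2}=400  16·20=320
\sum x = 164  \sum y = 160  \sum x^{2} = 2794  \sum y^{2} = 2608  \sum xy = 2656

With n = 10 couples, the correlation coefficient is
r = \frac{10(2656) - (164)(160)}{\sqrt{[10(2794) - 164^{2}][10(2608) - 160^{2}]}} = \frac{320}{\sqrt{(1044)(480)}} \approx 0.45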
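The coefficient of determination is r^{2} = (0.45)^{2} ≈ 0.20.
(Note: The coefficient of determination gives the proportion of the variation in y that is explained by the linear model. Here, about 20% of the variation in the wives' education is explained by the husbands' education.)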
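As a cross-check on the hand computation, the following is a minimal Python sketch that builds the five column sums and applies the usual computing formula for r. The function name `correlation` and the list names `husband` and `wife` are illustrative choices, not something from the text.

```python
import math

# Years of education for the 10 couples in the data table.
husband = [12, 16, 16, 18, 20, 17, 23, 14, 12, 16]
wife = [14, 16, 14, 16, 16, 18, 18, 12, 16, 20]


def correlation(x, y):
    """Pearson r computed from the five column sums (hypothetical helper)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation(husband, wife)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")  # prints: r = 0.45, r^2 = 0.20
```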
The correlation coefficient can be classified as weak, moderate, or strong, and as positive or negative, using the following guide from your text:
r = 0  no linear relationship
0 < r < 0.5 or -0.5 < r < 0  weak (or low) linear relationship
0.5 ≤ r < 0.8 or -0.8 < r ≤ -0.5  moderate linear relationship
0.8 ≤ r < 1 or -1 < r ≤ -0.8  strong (or high) linear relationship
r = 1 or r = -1  perfect (or exact) linear relationship
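The guide can also be written as a small lookup. The sketch below assumes the cutoffs at 0.5 and 0.8 apply symmetrically to negative values of r; the function name `classify` is hypothetical.

```python
def classify(r):
    """Describe r using the strength guide (cutoffs at 0.5 and 0.8)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "no linear relationship"
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear relationship"


# r = 0.45 is the value found for the couples data.
print(classify(0.45))  # prints: weak positive linear relationship
```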
Suppose that we have a husband with 10 years of education. We would like to predict the wife's education based on our data table.
We can do this by fitting a line through the data points. This line is called a regression line. We can use this line to make a prediction. The stronger the linear relationship, the more accurate the prediction is likely to be.
(Note: In research, a problem occurs when subjects drop out of a study. As a result, data sets may be incomplete. The regression line can be used to "fill in" the incomplete information.)
Using the TI calculator, we can fit several types of regression curves: linear, quadratic, cubic, quartic, logarithmic, exponential, and power. By hand, only linear regression is practical.
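The easiest formulas to use by hand are as follows:
\text{slope} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^{2} - (\sum x)^{2}}
\text{intercept} = \frac{\sum y - \text{slope} \cdot \sum x}{n}
The regression line for the above data:
\text{slope} = \frac{10(2656) - (164)(160)}{10(2794) - 164^{2}} = \frac{320}{1044} \approx 0.3065
\text{intercept} = \frac{160 - 0.3065(164)}{10} \approx 10.97
\hat{y} = 0.3065x + 10.97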
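The fit can be checked with a short Python sketch, assuming the usual least-squares formulas for the slope and intercept in terms of the column sums. The function name `fit_line` is a hypothetical helper, not something from the text.

```python
# Years of education for the 10 couples in the data table.
husband = [12, 16, 16, 18, 20, 17, 23, 14, 12, 16]
wife = [14, 16, 14, 16, 16, 18, 18, 12, 16, 20]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


slope, intercept = fit_line(husband, wife)
print(f"y-hat = {slope:.4f}x + {intercept:.2f}")  # prints: y-hat = 0.3065x + 10.97
```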
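We can use the regression line to make a prediction. For example, if the husband had 10 years of education (x = 10), then we would predict that the wife had 0.3065·10 + 10.97 ≈ 14 years of education. The symbol \hat{y} (read "y hat") denotes the value of y predicted by the regression line.
We can get a feel for the error by comparing the actual values, y, to the values predicted by the regression line, \hat{y}. The sum of the squared errors (SSE) adds up the squared differences:
SSE = \sum (y - \hat{y})^{2}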
The SSE for this data set is:
x  y  \hat{y}  y - \hat{y}  (y - \hat{y})^{2}
12  14  14.648  -0.648  0.4199
16  16  15.874  0.126  0.01588
16  14  15.874  -1.874  3.5119
18  16  16.487  -0.487  0.23717
20  16  17.100  -1.100  1.21
17  18  16.181  1.819  3.3088
23  18  18.020  -0.020  0.0004
14  12  15.261  -3.261  10.634
12  16  14.648  1.352  1.8279
16  20  15.874  4.126  17.024
SSE = 38.2
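The SSE can be reproduced with a few lines of Python. This sketch assumes the rounded coefficients 0.3065 and 10.97 quoted earlier and sums the squared differences between the actual and predicted values; `predict` and `sse` are hypothetical helper names.

```python
# Years of education for the 10 couples in the data table.
husband = [12, 16, 16, 18, 20, 17, 23, 14, 12, 16]
wife = [14, 16, 14, 16, 16, 18, 18, 12, 16, 20]

# Rounded coefficients of the regression line quoted in the text.
SLOPE = 0.3065
INTERCEPT = 10.97


def predict(x):
    """Predicted wife's years of education for a husband with x years."""
    return SLOPE * x + INTERCEPT


def sse(x, y):
    """Sum of squared differences between actual and predicted values."""
    return sum((actual - predict(v)) ** 2 for v, actual in zip(x, y))


print(f"SSE = {sse(husband, wife):.1f}")  # prints: SSE = 38.2
```

Because the coefficients are rounded, the individual squared errors can differ from a full-precision fit in the third decimal place, but the total still rounds to 38.2.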