Regression & Prediction

When you want to use correlation to make a prediction, you have to use regression.

Prediction is estimating the value of one variable from the value of another variable. The stronger the relationship between the two variables, the more accurate the prediction, so stronger correlations (closer to +1 or -1) produce better predictions.

When doing prediction, the independent variable is called the predictor and the dependent variable is called the criterion.

Regression
Regression can be expressed as a number or as the formula for a line. When expressed as a number (the standardized regression coefficient, β) with a single predictor, it is the same as the correlation coefficient; this is not true when you do multiple regression. When expressed as a line, it is called the Regression Line (or Line of Least Squares).

The Regression Line is the line that passes through the data in the scatterplot with the least total distance between the line and the data points. Specifically, it is the line that minimizes the sum of the squared vertical distances (residuals) between each data point and the line (hence, Line of Least Squares).
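The least-squares idea can be checked directly: compute the slope and intercept with the usual closed-form formulas, then nudge the line and confirm that every nudge makes the sum of squared vertical distances larger. This is an illustrative sketch with hypothetical data; `sum_squared_residuals` is a helper defined here, not a library function.

```python
from statistics import mean

# Hypothetical toy data (not from this page)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sum_squared_residuals(slope, intercept):
    """Sum of squared vertical distances between each point and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Closed-form least-squares slope and intercept
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

best = sum_squared_residuals(slope, intercept)
# Moving the line in any of these directions increases the total squared distance
for d_slope, d_int in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_squared_residuals(slope + d_slope, intercept + d_int) > best

print(f"{slope:.2f} {intercept:.2f} {best:.2f}")  # 0.60 2.20 2.40
```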

Standard Error of the Estimate
Notice that the Regression Line may not actually pass through any of the data points. That's the problem with prediction: it is never 100% accurate unless the correlation is perfect (r = +1 or -1). Instead, the line produces the predictions that are most likely to occur.

A prediction is like the mean of the criterion values we would expect for a given predictor value. The Standard Error of the Estimate tells us how far the actual values typically fall from that prediction, and therefore the range in which they are most likely to fall. Think of it as the standard deviation around the Regression Line.

 * For example, we may say that if you are 5 ft tall, a healthy weight for you is 120.5 lbs. But to allow for prediction error, we may instead say that if you are 5 ft tall, a healthy weight for you is between 104 and 137 lbs.

In other words, it describes how far 'off' the predictions are on average. When the actual data points are close to the Regression Line, the Standard Error of the Estimate is small. You want it to be small, because then more accurate predictions can be made.

The r squared value
The strength of the prediction made by the Regression Line is measured by the r squared value (r², or coefficient of determination). This value tells you the proportion of the variance in the criterion (dependent variable) that is accounted for by the predictor (independent variable). The bigger your r squared value, the stronger your prediction. With one predictor, the r squared value is simply the correlation coefficient squared.

Calculating the Regression Line
This is a four-step process:


 * (1) Calculate the correlation coefficient.


 * (2) Calculate the slope of the Regression Line: $b = r \, \frac{s_Y}{s_X}$, where $r$ is the correlation coefficient and $s_Y$ and $s_X$ are the standard deviations of Y and X.


 * (3) Calculate the y-intercept of the Regression Line: $a = \bar{Y} - b\bar{X}$, where $b$ is the slope from step (2) and $\bar{Y}$ and $\bar{X}$ are the means of Y and X.


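 * (4) Calculate the Regression Line equation: $\hat{Y} = a + bX$, where $a$ is the y-intercept, $b$ is the slope, and $\hat{Y}$ is the predicted value of the criterion for a given predictor value X.

Regression in SPSS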


 * Click 'Analyze', then 'Regression', then 'Linear'
 * In the left-hand box, highlight the column label for the Y values (the criterion), then click the arrow to move it into the 'Dependent' box
 * For one predictor variable, highlight the column label for the X values (the predictor) and click the arrow to move it into the 'Independent(s)' box
 * Click 'OK'
 * Your output should appear