Correlation


A measure used to describe the relationship between two variables. This description has two facets: magnitude and direction. A correlation can be positive or negative (direction). It can also be strong, weak, or absent (magnitude).

There are two assumptions of correlations: the data must have a linear relationship (rectilinearity), and the relationship must be homoscedastic.

It is important to note that a correlation only describes the relationship between the variables. It does not imply causation.

A correlation can be examined visually with a scatterplot or summarized numerically with a correlation coefficient.

Uses for Correlation
Correlations can be used for prediction, as a validity estimate or reliability measure, and for theory verification.

Validity Estimate
A correlation can determine whether a measure accurately reflects what it is supposed to measure, by comparing it with another variable it should be related to.


 * For example, the Scholastic Aptitude Test (SAT) is supposed to measure a student's ability to perform in college. Correlation can be used to see if there is a relationship between SAT score and college GPA.

Reliability Measure
A correlation can determine the consistency of a measure.


 * For example, the scores from the first time a person takes an IQ test should be the same as or similar to the scores from the second time. If not, the IQ test is not reliable, because it does not produce the same results consistently.

Theory Verification
After a theory has been developed claiming how variables are related, a correlation can be used to determine whether the data support the theory.


 * For example, imagine a theory stated that as people grow older, they become smarter. A correlation between age and intelligence scores can be used to test whether the data support this theory.

Concomitant or Correlational
Variables occur together; that is, they are related to each other.


 * For example, height and weight have a concomitant relationship. Higher values of one are typically associated with higher values of the other.

Orthogonal or Independent
Variables are not related to one another.


 * For example, shoe size and writing skills have an orthogonal relationship. A person's shoe size is typically not associated with their writing ability.

Causal
A change in one variable produces a change in the other; the variables are related because one causes the other. A correlation cannot prove a causal relationship; it can only support one.


 * For example, brushing your teeth decreases your chances of having cavities. This is considered causal because brushing your teeth (the cause) has to happen before the reduction in cavities (the effect).

Scatterplot
Also known as a scatter diagram or scattergram, this is a graph in which each pair of scores is plotted as a single point.

One variable is represented by the x-axis and the other variable is represented by the y-axis.

If the variables are related, the points will form a diagonal band or line, rising from left to right for a positive correlation and falling for a negative one.

If the variables are not related, the points will form a flat horizontal band, a thin vertical band, or a shapeless cloud.

Rectilinearity
Scatterplots can help you determine what type of correlation coefficient to use. Most correlations require the data to be rectilinear (following a straight line). Sometimes the data may be curvilinear (following a curved line).
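
As a small illustration of why rectilinearity matters, the Python sketch below uses hypothetical data in which y is exactly x squared, a perfect curvilinear relationship. The Pearson correlation coefficient measures only straight-line relationships, so it comes out as 0: the falling and rising halves of the curve cancel each other out. The helper name `pearson_r` is our own, not from any library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient from the definitional formula."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical curvilinear data: y is exactly x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

print(pearson_r(x, y))  # 0.0: a perfect curved relationship, but no linear one
```

A scatterplot of these points would show the U shape immediately, which is why plotting the data first is recommended.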

Homoscedasticity
Scatterplots are good for observing the homoscedasticity, or homogeneity of variance, of a relationship. Homoscedasticity describes how the data points are spread around the relationship. If there is homoscedasticity, the data points are spread evenly throughout the relationship. If there is a lack of homoscedasticity (the relationship is heteroscedastic), the data points have an uneven spread, for example fanning out as the values increase.

Homoscedasticity is necessary for an accurate correlation. If the relationship is not homoscedastic, you should not use a correlation, because a single coefficient cannot accurately describe a relationship whose spread changes across the range of scores.
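
Looking at the scatterplot is the usual check. As a rough numerical supplement (an informal heuristic of our own, not a formal test of homoscedasticity), the sketch below fits a straight line and compares how far the points fall from it in the lower and upper halves of the x values. The data and the function name `spread_ratio` are hypothetical.

```python
import math


def spread_ratio(x, y):
    """Informal heteroscedasticity check (a heuristic, not a formal test).

    Fits a least-squares line, then divides the root-mean-square residual
    for cases with above-average x by the same quantity for cases with
    below-average x. A ratio near 1 suggests an even spread.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    low = [r for a, r in zip(x, residuals) if a < mean_x]
    high = [r for a, r in zip(x, residuals) if a >= mean_x]
    return rms(high) / rms(low)


# Hypothetical data: each x appears twice, once above and once below the line y = 2x.
x = [v for v in range(1, 11) for _ in (1, -1)]
even = [2 * v + s for v in range(1, 11) for s in (1, -1)]         # always 1 unit from the line
fanning = [2 * v + s * v for v in range(1, 11) for s in (1, -1)]  # distance from the line grows with x

print(round(spread_ratio(x, even), 2))     # 1.0 (homoscedastic)
print(round(spread_ratio(x, fanning), 2))  # 2.45 (heteroscedastic)
```

The second data set fans out as x grows, so its ratio is well above 1; a scatterplot of it would show the same fan shape.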

Correlation Coefficient
A mathematical expression of the degree of linear relationship between two variables. This is the number used to describe a correlation.

The correlation coefficient ranges from -1 to +1. The closer the correlation is to -1 or +1, the stronger the relationship between the variables. A correlation of zero means there is no linear relationship between the variables (a curvilinear relationship may still exist).

The number indicates the strength (or magnitude) of the relationship and the sign (+, -) indicates the direction.

Calculating the correlation coefficient by hand is tedious, so a calculator or computer program is recommended, although you can still calculate it by hand if you wish.

There are many types of correlation coefficients. The type you use depends on your data.

Pearson's r
The most common correlation coefficient is the Pearson Product Moment Correlation Coefficient (or Pearson's r). This is the correlation coefficient used when your independent and dependent variables are measured on interval or ratio scales.
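
Pearson's r can be computed from its definitional formula: the sum of the cross-products of each score's deviation from its mean, divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch, using hypothetical scores and a helper name (`pearson_r`) of our own:

```python
import math


def pearson_r(x, y):
    """Pearson's r from the definitional formula:
    r = sum((X - mean_X) * (Y - mean_Y))
        / sqrt(sum((X - mean_X) ** 2) * sum((Y - mean_Y) ** 2))
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)                        # sum of squares for X
    syy = sum((b - mean_y) ** 2 for b in y)                        # sum of squares for Y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five people on two interval-scale measures.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8  (positive and fairly strong)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```

The size of each result gives the strength of the relationship and the sign gives its direction.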

Spearman's r
This is the correlation coefficient you use when your independent and dependent variables are measured on an ordinal scale. It can also be used if you transform interval or ratio variables into an ordinal scale (ranks).

It can also be used when a relationship is non-linear but consistently increasing or decreasing (monotonic), or when there are extreme outliers. In these situations, you transform your variables into ordinal scales by ranking the scores. Ranking makes a monotonic relationship appear linear and reduces the pull of extreme outliers. Then you can see if there is a relationship.
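
The ranking approach can be sketched in Python: rank each variable (tied scores share the average of the ranks they span), then compute Pearson's r on the ranks. The data are hypothetical and the function names (`pearson_r`, `average_ranks`, `spearman_r`) are our own.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank scores from 1 (smallest) upward; tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data with a curved but always-increasing (monotonic) relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(round(pearson_r(x, y), 2))        # 0.94: below 1 because the points curve
print(spearman_r(x, y))                 # 1.0: the ranks line up perfectly
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

Because y always increases as x increases, the ranks match exactly and Spearman's coefficient is 1, while Pearson's r on the raw scores is pulled below 1 by the curve.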

Point Biserial Correlation
In situations where one variable is interval or ratio and the other variable is dichotomous (nominal with only 2 categories), you use a Point Biserial Correlation.

There are two methods for calculating a Point Biserial Correlation. You can either use the special formula OR you can transform your dichotomous variable into 1's and 0's. After transforming them to numbers, you can just calculate a Pearson's r!
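
The two methods can be compared in a short Python sketch. The "special formula" below is assumed to be one common textbook form, r_pb = (M1 − M0) / s × √(pq), where M1 and M0 are the mean scores of the groups coded 1 and 0, s is the standard deviation of all the scores (dividing by n), and p and q are the proportions of cases in each group; the original text does not spell the formula out. The data and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(scores, groups):
    """Assumed textbook form: r_pb = (M1 - M0) / s * sqrt(p * q),
    with s the standard deviation of all scores dividing by n (not n - 1).
    """
    n = len(scores)
    ones = [s for s, g in zip(scores, groups) if g == 1]
    zeros = [s for s, g in zip(scores, groups) if g == 0]
    mean_all = sum(scores) / n
    s = math.sqrt(sum((v - mean_all) ** 2 for v in scores) / n)
    p = len(ones) / n  # proportion coded 1
    q = 1 - p          # proportion coded 0
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / s * math.sqrt(p * q)


# Hypothetical data: a dichotomous variable coded as 0 and 1,
# and an interval-scale score for each person.
groups = [0, 0, 0, 1, 1, 1]
scores = [2, 3, 4, 5, 6, 7]

print(round(point_biserial(scores, groups), 3))  # 0.878
print(round(pearson_r(groups, scores), 3))       # 0.878 (same value)
```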

Phi Coefficient
When both your independent and dependent variables are dichotomous (nominal with only 2 categories), use the Phi Coefficient.

This is usually used to describe the strength of the relationship after a 2 x 2 Chi Square test is significant.

It can be calculated from the Chi Square value with the formula, or by transforming both variables into 1's and 0's and then calculating Pearson's r. The 1's and 0's method gives the same size of coefficient but is less convenient than the Chi Square method when you already have the Chi Square value.
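
The Python sketch below (hypothetical counts, function names of our own) computes phi three ways for a 2 x 2 table: directly from the cell counts with the standard formula φ = (ad − bc) / √[(a + b)(c + d)(a + c)(b + d)], from the Chi Square value as √(χ² / N), and by coding both variables as 1's and 0's and calculating Pearson's r.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def phi_from_table(a, b, c, d):
    """Phi for a 2 x 2 table of counts laid out as:
                 Y = 1   Y = 0
        X = 1      a       b
        X = 0      c       d
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def chi_square(a, b, c, d):
    """Pearson's Chi Square for the same table (no continuity correction)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total


# Hypothetical counts.
a, b, c, d = 20, 10, 5, 15
n = a + b + c + d

phi = phi_from_table(a, b, c, d)
print(round(phi, 3))                                     # 0.408
print(round(math.sqrt(chi_square(a, b, c, d) / n), 3))  # 0.408 (size of phi only)

# The 1's and 0's method: one (x, y) pair per person in the table.
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d
print(round(pearson_r(x, y), 3))                         # 0.408
```

All three agree here. Note that √(χ² / N) gives only the size of phi, and it matches only the uncorrected Chi Square; the sign (direction) comes from the cell counts or from the 1's and 0's coding.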

Correlation in SPSS

 * 1) Click on 'Analyze' -> 'Correlate' -> 'Bivariate'
 * 2) Move both of your variables into the 'Variables' box using the arrow button
 * 3) Check the appropriate box for your type of correlation under 'Correlation Coefficients'
 * 4) Check either 'Two-tail' or 'One-tail' under 'Test of Significance'
 * 5) Click 'Options' and check 'Means' and 'Standard Deviations' under 'Statistics' (this information will be useful when you have to explain the correlation)
 * 6) Click 'Continue'
 * 7) Click 'OK'
 * 8) Your output should appear
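
The 'Test of Significance' step asks whether the correlation is reliably different from zero. For Pearson's r, the standard textbook test converts r into a t statistic, t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. A minimal Python sketch of that conversion follows; it stops at t because the exact p-value requires the t distribution, which SPSS or a t table supplies. The numbers are hypothetical, and this illustrates the usual test rather than describing SPSS's internal code.

```python
import math


def t_for_r(r, n):
    """t statistic for testing whether a Pearson's r differs from zero.
    Degrees of freedom are n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical result: r = 0.8 from 5 pairs of scores.
t = t_for_r(0.8, 5)
print(round(t, 2), "with", 5 - 2, "degrees of freedom")  # 2.31 with 3 degrees of freedom
```

With 3 degrees of freedom, the two-tailed critical value of t at the .05 level is about 3.18, so r = 0.8 from only five pairs would not be significant. Small samples need large correlations to reach significance.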

Factors That Affect the Correlation Size
There are a few things to consider when using a correlation.

Deflated Correlation
The size of your correlation can be "artificially deflated" (looks smaller than it really is) by the following factors:
 * 1) A curvilinear relationship
 * 2) The range of one or both variables is restricted - for example, compare age to years in school. Age can range from 0 to over 100 whereas years in school typically only ranges from 0 to 24
 * 3) The distribution of one or both variables is skewed - this can make the scatterplot heteroscedastic
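
The effect of a restricted range (item 2) can be shown with a small Python sketch. The hypothetical data are built so that the scatter of points around the line is the same everywhere; only the range of x changes. The helper `pearson_r` is our own.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: for each x from 0 to 20 there are two cases, one with
# y = x + 5 and one with y = x - 5, so the scatter around the line is even.
x_full = [v for v in range(0, 21) for _ in (1, -1)]
y_full = [v + 5 * s for v in range(0, 21) for s in (1, -1)]

# Restrict the range: keep only cases with x between 8 and 12.
kept = [(u, v) for u, v in zip(x_full, y_full) if 8 <= u <= 12]
x_cut = [u for u, _ in kept]
y_cut = [v for _, v in kept]

print(round(pearson_r(x_full, y_full), 2))  # 0.77 over the full range
print(round(pearson_r(x_cut, y_cut), 2))    # 0.27 over the restricted range
```

The underlying relationship is identical in both cases; restricting x keeps the same scatter around the line but removes most of the spread in x, so the coefficient shrinks.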

Inflated Correlation
The size of your correlation can be "artificially inflated" (looks larger than it really is) by the following factors:
 * 1) Sample contains subgroups with means that differ for both variables - for example, compare students' ratings of a U.S. President to the amount of financial aid they receive. Although your sample is all students, the students will have different political affiliations, so one subgroup could be the Democratic students and another could be the Republican students. If students of one political affiliation both rate the President higher and receive more financial aid than students of the other, the correlation across the whole sample will be inflated.
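 * 2) Sample consists of extreme groups - for example, compare the amount of coffee people drink to their stress levels. If your sample includes only people who drink no coffee and people who drink lots of coffee, with no moderate drinkers, the correlation will look larger than it would in a sample covering the full range.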
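
Both inflating factors can be sketched in Python with hypothetical data. In the first part, each subgroup on its own shows no relationship at all, but because one subgroup scores higher on both variables, pooling them produces a large correlation. In the second part, keeping only the cases at the two ends of x raises the coefficient compared with the full sample. The function names are our own.

```python
import itertools
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_of(pairs):
    """Pearson's r for a list of (x, y) pairs."""
    return pearson_r([u for u, _ in pairs], [v for _, v in pairs])


# 1) Subgroups. Within each group every combination of scores appears once,
# so the two variables are unrelated; group B is higher on both variables.
group_a = list(itertools.product([0, 1, 2], repeat=2))
group_b = [(u + 10, v + 10) for u, v in group_a]

print(round(r_of(group_a), 2))            # 0.0
print(round(r_of(group_b), 2))            # 0.0
print(round(r_of(group_a + group_b), 2))  # 0.97 once the groups are pooled

# 2) Extreme groups. Two y scores (x + 5 and x - 5) for each x from 0 to 20.
pairs = [(v, v + 5 * s) for v in range(0, 21) for s in (1, -1)]
extremes = [(u, v) for u, v in pairs if u <= 4 or u >= 16]

print(round(r_of(pairs), 2))     # 0.77 across the full range
print(round(r_of(extremes), 2))  # 0.85 with only the extreme cases
```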