Section numbers below refer to the CFA textbook.

2 Correlation Analysis

2.4 Limitations of Correlation Analysis

Linear vs Nonlinear Association

Correlation measures the linear association between two variables. Linear means “A straight-line relationship, as opposed to a relationship that cannot be graphed as a straight line.”

NB - Two variables can have a strong nonlinear relation and still have a very low correlation.

The following is an example of a graph with a STRONG NONLINEAR RELATIONSHIP

(Source: Juan Ignacio Guzmán, Takashi Nishiyama, John E. Tilton, "Trends in the intensity of copper use in Japan since 1960".)
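
The point about strong nonlinear relations and low correlation can be checked with a small sketch. The data here are made up for illustration (they are not the copper data): y is determined exactly by x, but along a U-shaped curve, and the helper name `pearson` is my own.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each series.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Illustrative data only: y = x squared, a perfect but nonlinear relation.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_nonlinear = pearson(xs, ys)
print(f"correlation between x and x squared: {r_nonlinear:.4f}")
```

Because the left and right halves of the curve cancel each other out, the correlation comes out at zero even though knowing x tells you y exactly.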

Outliers

Correlation may also be an unreliable measure when outliers are present in one or both of the series. In the post of December 2, 2010, the following scatter graph was presented, based on BHP data. The calculated correlation for this data set was 0.54.

Reviewing the graph, 2009 appears to be an outlier, as it is the only year in which Revenue and Earnings are not relatively close on the graph. The year 2009 is removed from the data set below to see the impact on the correlation coefficient. The prediction is that the correlation will increase.

Practical example – BHP revenue vs Earnings per ordinary share - Correlation coefficient WITHOUT outlier.

The year 2009, as a perceived outlier, has been removed from the data set. As the calculation below shows, the correlation increases significantly.

You can check your answers for other examples using the following web site: http://easycalculation.com/statistics/correlation.php

1. Data
Year	2010	2008	2007	2006
Revenue US $m	52798	59473	47473	39099
Earnings per ordinary share (diluted) (US cents)	227.8	274.8	228.9	172.4

2. Calculation
	Year	Revenue (US $m)	EPS (US cents)	Cross product	Squared deviations Revenue	Squared deviations EPS
	2010	52798	227.80	5,634.23	9,531,112.56	3.33
	2008	59473	274.80	476,641.86	95,301,525.06	2,383.88
	2007	47473	228.90	-6,545.42	5,007,525.06	8.56
	2006	39099	172.40	568,524.51	112,609,238.06	2,870.28
Average		49710.75	225.975

*Covariance*	Sum			1,044,255.17	222,449,401	5,266
	(N-1)			3
	Answer			348,085.0583

*Variance*	Sum Squared deviations				222,449,401	5,266
	(N-1)				3	3
	Answer				74,149,800	1,755

*Standard deviation*					8,611	41.9


*Correlation coefficient*	1. Covariance			348,085.058
	2. Standard deviation (Revenue) × Standard deviation (EPS)			360,775.26
	Answer (1/2)			0.964825
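
If you would rather check the arithmetic in code than on a web calculator, here is a minimal Python sketch of the same steps (covariance, variances, standard deviations, then their ratio). It uses only the four data points from the table above. The variable names are my own, and it assumes the sample (N-1) versions of covariance and variance, as in the table.

```python
import math

# Data from the table above, in the order 2010, 2008, 2007, 2006.
revenue = [52798, 59473, 47473, 39099]   # US $m
eps = [227.8, 274.8, 228.9, 172.4]       # US cents, diluted

n = len(revenue)
mean_rev = sum(revenue) / n
mean_eps = sum(eps) / n

# Step 1: sample covariance = sum of cross products / (N - 1).
cross = sum((r - mean_rev) * (e - mean_eps) for r, e in zip(revenue, eps))
covariance = cross / (n - 1)

# Step 2: sample variances and standard deviations.
var_rev = sum((r - mean_rev) ** 2 for r in revenue) / (n - 1)
var_eps = sum((e - mean_eps) ** 2 for e in eps) / (n - 1)
sd_rev = math.sqrt(var_rev)
sd_eps = math.sqrt(var_eps)

# Step 3: correlation = covariance / (product of standard deviations).
correlation = covariance / (sd_rev * sd_eps)

print(f"mean revenue = {mean_rev:.2f}, mean EPS = {mean_eps:.3f}")
print(f"covariance   = {covariance:.2f}")
print(f"sd revenue   = {sd_rev:.0f}, sd EPS = {sd_eps:.1f}")
print(f"correlation  = {correlation:.4f}")
```

The printed values should agree with the table to rounding.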

Conclusion

Determine whether a computed sample correlation changes greatly by removing a few outliers.

In this example, removing 2009 moved the correlation from a fairly weak 0.54 to an almost perfect 0.96!

But one must also use judgment to determine whether those outliers contain information about the two variables’ relationship (and should thus be included in the correlation analysis) or contain no information (and should thus be excluded).
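
One rough way to see how much a correlation depends on a single observation is to recompute it with each point left out in turn. The sketch below does this on hypothetical data (five points on a rising line plus one point far off it), not on the BHP figures. The function names `pearson` and `leave_one_out` are my own.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

def leave_one_out(xs, ys):
    """Return (dropped_index, correlation without that observation) pairs."""
    results = []
    for i in range(len(xs)):
        kept_x = xs[:i] + xs[i + 1:]
        kept_y = ys[:i] + ys[i + 1:]
        results.append((i, pearson(kept_x, kept_y)))
    return results

# Hypothetical data: the last point sits far from the line y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 1]

full = pearson(xs, ys)
loo = leave_one_out(xs, ys)

print(f"all points: r = {full:.3f}")
for i, r in loo:
    print(f"without point {i}: r = {r:.3f}")
```

A point whose removal shifts the correlation far more than any other is a candidate outlier. Whether to exclude it is still the judgment call described above, since the code cannot tell you whether the point carries real information.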

Correlation does not imply causation. You might just be lucky!

Even if two variables are highly correlated, one does not necessarily cause the other in the sense that certain values of one variable bring about the occurrence of certain values of the other. Furthermore, correlations can be spurious in the sense of misleadingly pointing towards associations between variables.