Mark Hulbert, writing for MarketWatch, has a nice article entitled “Three reasons to give up your search for stock market predictors.”
His main point is that returns in one year have nothing to do with those of other years. His example is the claim that years ending in “5” are supposedly better than average. But even if you look at the Dow, the index which has been around the longest, you have a sample of only 11 such years, which is not enough to be statistically conclusive.
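To get a feel for how little 11 data points can tell you, here is a rough simulation sketch. Everything in it is an assumption for illustration rather than anything from Hulbert's article: the function name, the 7% average return, the 18% standard deviation, the 5-point "edge" and the normal distribution are all made up. Every simulated year is drawn from the same distribution, so any advantage for years ending in “5” is pure luck.

```python
import random


def chance_of_lucky_fives(n_years=110, mean=0.07, stdev=0.18,
                          edge=0.05, trials=5000, seed=0):
    """Fraction of simulated histories in which years ending in 5 beat
    all other years by at least `edge`, even though every year is drawn
    from the same (assumed, hypothetical) normal distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        returns = [rng.gauss(mean, stdev) for _ in range(n_years)]
        # Positions 5, 15, 25, ... stand in for years ending in 5.
        # With 110 simulated years that gives exactly 11 of them.
        fives = returns[5::10]
        others = [r for i, r in enumerate(returns) if i % 10 != 5]
        gap = sum(fives) / len(fives) - sum(others) / len(others)
        if gap >= edge:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    share = chance_of_lucky_fives()
    print(f"Histories where the '5' years look special by luck: {share:.1%}")
```

Under these assumed numbers, a noticeable share of the simulated histories show the “5” years ahead by five percentage points or more, which is why a sample of 11 cannot settle the question.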
In the article he gives away the statistic which has had the highest correlation with the S&P 500:
After all, there are plenty of phenomena that have absolutely nothing to do with the stock market but that are nevertheless highly correlated with it. One of my favorite examples comes from David Leinweber, founder of the Center for Innovative Financial Technology at the Lawrence Berkeley National Laboratory. Several years ago, wanting to illustrate the perils of confusing correlation and causation, he searched through all the data on a United Nations CD-ROM to find the indicator with the most statistically significant correlation with the S&P 500 Index.
His discovery: butter production in Bangladesh.
There you have it. Butter production in Bangladesh.
It turns out that two other variables also had an r = .99 correlation: cheese production in the US, and the combined sheep population of Bangladesh and the US. Leinweber himself wrote in his 1995 paper:
The example in this paper is intended as a blatant example of totally bogus application of data mining in finance…If someone showed up in your office with a model relating stock prices to interest rates, GDP, trade, housing starts and the like, it might have statistics that looked as good as this nonsense, and it might make as much sense, even though it sounded much more plausible.
David Leinweber calls this technique, “torturing the data until it screams.”
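You can reproduce the flavor of this with nothing but random numbers. The sketch below is not Leinweber's method or data; the function names, the 15-point series length and the 1,000 candidates are assumptions chosen for illustration. It builds one random walk as a stand-in for an index, then searches a pile of unrelated random walks for the one that correlates best with it.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def random_walk(rng, length):
    """Cumulative sum of independent Gaussian steps."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def best_spurious_match(n_candidates=1000, length=15, seed=1):
    """Return (index, r) of the unrelated series most correlated with
    a hypothetical 'index' series. All series are independent noise."""
    rng = random.Random(seed)
    target = random_walk(rng, length)  # stand-in for the stock index
    best_i, best_r = -1, 0.0
    for i in range(n_candidates):
        r = pearson(target, random_walk(rng, length))
        if abs(r) > abs(best_r):
            best_i, best_r = i, r
    return best_i, best_r


if __name__ == "__main__":
    i, r = best_spurious_match()
    print(f"Candidate {i} 'predicts' the index with r = {r:.2f}")
```

None of the candidates has anything to do with the target, yet the winner of the search will typically show a very high correlation. The more series you try, the better your best “indicator” looks.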
Most investors recognize the ridiculousness of these statistics despite their being highly correlated. But they fail to recognize the fallacy of acting on indicators which are much less correlated.
Photo used here under Flickr Creative Commons.