Don't Underestimate Big Data

Apr 15 2014, 9:03pm CDT | by

The techniques of big data have been criticized recently in a post by Tim Harford, among others. It’s easy to imagine that the potential here has been exaggerated, and his core point about the importance of theory in empirical work is undeniable. But I think there is some overreach in the criticisms, and in particular one of the failure examples Tim gives shows the opposite of what he claims. He points to a flu forecast by Google that failed:

Four years after the original Nature paper was published, Nature News had sad tidings to convey: the latest flu outbreak had claimed an unexpected victim: Google Flu Trends. After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak but when the slow-and-steady data from the CDC arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.

What happened here is in fact one of the advantages of big data: the ability to reject models because the data update frequently. Google provided a forecast, and the forecast and the model underlying it were unequivocally debunked. What a novelty for the social sciences! Of course big data isn’t alone in having constantly updating forecasts that can be tested against reality, but the focus on forecasting and constantly updating data is one of the advantages it has over many other areas of the social sciences. Yes, Google made a model and it was falsified. But keep in mind the famous claim by John Ioannidis that most published research findings are false. Maybe it is most, or maybe it’s only half or a quarter. But with big data, if a large share of your findings are false, you find out sooner rather than later.

Indeed it is the forecast-driven nature of machine learning that makes it so appealing as a branch of empiricism. Maybe this is something you can’t appreciate if you haven’t actually spent time with a big dataset with lots of variables and seen how easy it is to pick the wrong model, or if you haven’t taken a look at the statistical guts of disagreeing research papers. But this focus on the right model being the one that makes the best out-of-sample predictions, together with the ability to run out-of-sample predictions constantly, just seems so much more tied to reality than most research has to be.

Another thing I think people miss is that there is too much focus on what a really long dataset does to, for example, the probability of spurious correlations. What makes machine learning interesting isn’t just dealing with long datasets, but with really wide ones. Consider the data that Netflix has on its customers. It isn’t remarkable because of how many people are in the dataset, but because of how much is known about each of them. Yes, it’s true what they say about spurious correlations, but if you do an out-of-sample prediction with a spurious correlation, the relationship will be falsified quickly no matter what your p-value.
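To make the point about spurious correlations concrete, here is a minimal sketch using nothing but simulated noise (no real dataset is involved, and the function names and sizes are illustrative assumptions). It builds a wide table of random features, picks the one that correlates best with a random target in the first half of the rows, then checks that same feature against the held-out second half.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spurious_demo(n_train=50, n_test=50, n_features=2000, seed=0):
    """Return (in-sample r, out-of-sample r) for the 'best' noise feature."""
    rng = random.Random(seed)
    n = n_train + n_test
    # Target and features are independent noise: any correlation is spurious.
    target = [rng.gauss(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]

    # "Discover" the feature that looks best on the training rows only.
    best = max(
        features,
        key=lambda f: abs(pearson(f[:n_train], target[:n_train])),
    )
    r_in = pearson(best[:n_train], target[:n_train])
    # Score the same feature on rows it was not selected on.
    r_out = pearson(best[n_train:], target[n_train:])
    return r_in, r_out


r_in, r_out = spurious_demo()
print(f"in-sample r = {r_in:.2f}, out-of-sample r = {r_out:.2f}")
```

With thousands of candidate columns, the winner tends to look impressive on the rows it was chosen from and unremarkable on the rows it was not. That is the sense in which a wide dataset invites spurious findings and a held-out check exposes them.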

In 2001, the statistician Leo Breiman wrote a paper titled “Statistical Modeling: The Two Cultures” in which he contrasted the data modeling culture and the algorithmic modeling culture that compete among statisticians. The data modeling culture used goodness-of-fit tests and residual examination to validate models, while the algorithmic modeling culture focused on out-of-sample predictions. He estimated that 98% of all statisticians fell into the former camp, and only 2% fell into the latter. I’m not sure what the numbers are today, and surely many fall into both camps, but the share of statisticians who are algorithmic folks, and the share of important problems they are working on, have gone up sharply from 2%.
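The difference between the two ways of validating a model can be shown with a toy comparison. This is an illustrative sketch, not anything taken from Breiman's paper: the data are simulated from an assumed linear relationship, and the two models (`fit_line` and `fit_memorizer`) are hypothetical stand-ins. One is an ordinary least-squares line. The other simply memorizes the training data and returns the outcome of the nearest training point.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """1-nearest-neighbour: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(rng, n):
    """Assumed truth: y = 2x plus unit-variance noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(1)
train_x, train_y = make_data(rng, 200)
test_x, test_y = make_data(rng, 200)

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

# In-sample error rewards the model that reproduces the training data.
in_sample = {"line": mse(line, train_x, train_y),
             "memorizer": mse(memo, train_x, train_y)}
# Out-of-sample error rewards the model that generalizes to new rows.
out_sample = {"line": mse(line, test_x, test_y),
              "memorizer": mse(memo, test_x, test_y)}

print("in-sample MSE:    ", in_sample)
print("out-of-sample MSE:", out_sample)
```

Judged purely by fit to the data it was trained on, the memorizer is unbeatable, since it reproduces every training point exactly. Judged by predictions on rows it has never seen, the ranking flips, because the memorizer has learned the noise along with the signal.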

It’s true that misleading p-values are a consequence of a lot of data, but this is a data modeling culture problem, not an algorithmic modeling culture problem.

Now don’t get me wrong: I am for the most part a p-value-checking, residual-examining, data modeling culture economist. But machine learning and big data are going to get more important, not less, and I think social scientists who don’t learn to at least think like the other culture are going to be left behind.

 
 

<a href="/latest_stories/all/all/30" rel="author">Forbes</a>