Don't Underestimate Big Data

Apr 15 2014, 9:03pm CDT | by

The techniques of big data have been criticized recently in a post by Tim Harford, among others. It’s easy to believe that the potential here has been exaggerated, and the core point about the importance of theory in much of empiricism is undeniable. But I think the criticisms overreach, and in particular one of the failure examples Tim gives shows the opposite of what he claims. He points to a Google flu forecast that failed:

Four years after the original Nature paper was published, Nature News had sad tidings to convey: the latest flu outbreak had claimed an unexpected victim: Google Flu Trends. After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak but when the slow-and-steady data from the CDC arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.

What happened here is in fact one of the advantages of big data: the ability to reject models because the data update frequently. Google provided a forecast, and the forecast and the model underlying it were unequivocally debunked. What a novelty for the social sciences! Of course, big data isn’t alone in having constantly updated forecasts that can be tested against reality, but its focus on forecasting and on constantly updating data is one of the advantages it has over many other areas of the social sciences. Yes, Google made a model and it was falsified. But keep in mind John Ioannidis’s famous claim that most published research findings are false. Maybe it is most, or maybe it’s only half or a quarter. But with big data, if a large share of your findings are false, you find out sooner rather than later.

Indeed, it is the forecast-driven nature of machine learning that makes it so appealing as a branch of empiricism. Maybe this is something you can’t appreciate if you haven’t spent time with a big dataset with lots of variables and seen how easy it is to pick the wrong model, or if you haven’t looked at the statistical guts of disagreeing research papers. But this focus on the right model being the one that makes the best out-of-sample predictions, and the ability to run out-of-sample predictions constantly, seem far more tied to reality than most research has to be.
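To make the point about picking the wrong model concrete, here is a minimal sketch on simulated data. The data-generating process (y = 2x plus Gaussian noise), the choice of k-nearest-neighbour models, and every function name below are illustrative assumptions, not anything from the original argument. Judged by in-sample error, the 1-nearest-neighbour model looks perfect, because each training point is its own nearest neighbour; judged on a held-out sample, a smoother model wins.

```python
import random


def make_data(n, rng):
    """Simulated (x, y) pairs: y = 2x plus standard Gaussian noise (an assumption)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return data


def knn_predict(train, x, k):
    """Average the y-values of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error over `data` of k-NN predictions fitted on `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


def compare_models(candidates=(1, 5, 25), n=200, seed=1):
    """Return {k: (in_sample_mse, holdout_mse)} for each candidate k."""
    rng = random.Random(seed)
    train, holdout = make_data(n, rng), make_data(n, rng)
    return {k: (mse(train, train, k), mse(train, holdout, k)) for k in candidates}


if __name__ == "__main__":
    results = compare_models()
    for k, (in_sample, held_out) in results.items():
        print(f"k={k:>2}  in-sample MSE={in_sample:.3f}  held-out MSE={held_out:.3f}")
    # Goodness of fit on the training data picks k=1; held-out error does not.
    print("chosen by in-sample fit:", min(results, key=lambda k: results[k][0]))
    print("chosen by held-out fit:", min(results, key=lambda k: results[k][1]))
```

The selection rule here is simply the lowest held-out error, the out-of-sample criterion described above in its crudest form; in practice one would use cross-validation or a rolling forecast window rather than a single split.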

Another thing I think people miss: there is too much focus on what a really long dataset (one with many observations) does to, for example, the probability of spurious correlations. What makes machine learning interesting isn’t just dealing with long datasets, but really wide ones, with many variables per observation. Consider the data Netflix has on its customers. This isn’t remarkable because of how many people are in the dataset, but because of how much Netflix knows about each of them. Yes, it’s true what they say about spurious correlations, but if you make an out-of-sample prediction with a spurious correlation, the relationship will be falsified quickly no matter what its p-value was.
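A small simulation shows how a spurious correlation mined from a wide dataset fails out of sample. In this sketch the target and all 2,000 candidate features are independent Gaussian noise, so any correlation found is spurious by construction; the sample sizes, the seed, and the helper names are hypothetical choices for illustration. Searching many features for the best in-sample correlation reliably turns up one that looks respectable, and recomputing it on fresh data sends it back toward zero.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spurious_feature_check(n_rows=200, n_features=2000, seed=0):
    """Pick the feature most correlated with y in-sample, then re-test it on new data.

    Every feature is independent noise (an assumption of this sketch), so the
    selected correlation is spurious. Returns (in_sample_r, out_of_sample_r).
    """
    rng = random.Random(seed)
    y_train = [rng.gauss(0, 1) for _ in range(n_rows)]
    y_test = [rng.gauss(0, 1) for _ in range(n_rows)]
    best_r, best_test_col = 0.0, None
    for _ in range(n_features):
        x_train = [rng.gauss(0, 1) for _ in range(n_rows)]
        x_test = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(x_train, y_train)
        if abs(r) > abs(best_r):
            # Keep only the fresh-data column of the current best feature.
            best_r, best_test_col = r, x_test
    return best_r, pearson(best_test_col, y_test)


if __name__ == "__main__":
    in_r, out_r = spurious_feature_check()
    print(f"in-sample r = {in_r:+.3f}, out-of-sample r = {out_r:+.3f}")
```

Note that it is the width of the data (the number of features searched), not its length, that manufactures the impressive in-sample correlation here.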

In 2001, the statistician Leo Breiman wrote a paper titled “Statistical Modeling: The Two Cultures”, in which he contrasted the two cultures that compete among statisticians: the data modeling culture and the algorithmic modeling culture. The data modeling culture validated models with goodness-of-fit tests and residual examination, while the algorithmic modeling culture focused on out-of-sample predictive accuracy. He estimated that 98% of all statisticians fell into the former camp and only 2% into the latter. I’m not sure what the numbers are today, and surely many fall into both camps, but the share of statisticians who are algorithmic folks, and the share of important problems they work on, has risen sharply from 2%.

It’s true that a lot of data produces misleading p-values, since with enough observations even a trivially small effect becomes statistically significant. But this is a data modeling culture problem, not an algorithmic modeling culture problem.
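The p-value point can be illustrated with a back-of-the-envelope calculation. This sketch assumes a two-sample z-test in which each group has a known standard deviation of 1 and the observed mean difference equals the true effect (simplifying assumptions); `two_sided_p_value` is a hypothetical helper. Holding the effect fixed at a negligible 0.01 standard deviations, the p-value falls toward zero purely because the sample grows.

```python
import math


def two_sided_p_value(effect_size, n_per_group):
    """Two-sided z-test p-value for a difference in means between two groups.

    Assumes each group has a known standard deviation of 1, so effect_size is
    the mean difference in standard-deviation units and the standard error of
    the difference is sqrt(2 / n_per_group). The observed difference is taken
    to equal the true effect.
    """
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_size / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        print(f"n per group = {n:>10,}  p = {two_sided_p_value(0.01, n):.3g}")
```

The effect is identical in every row; only the sample size changes, which is why a small p-value on a big dataset says little about whether the effect is large enough to matter.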

Now, don’t get me wrong: I am for the most part a p-value-checking, residual-examining, data modeling culture economist. But machine learning and big data are going to get more important, not less, and I think social scientists who don’t learn at least to think like the other culture are going to be left behind.

 
 

Forbes

 
