Don't Underestimate Big Data

Apr 15 2014, 9:03pm CDT | by

The techniques of big data have been criticized recently, in a post by Tim Harford among others. It’s quite easy to imagine that the potential here has been exaggerated, and a core point about the importance of theory in a lot of empiricism is undeniable. But I think there is some overreach in the criticisms, and in particular one of the failure examples Tim gives shows the opposite of what he claims. He points to a flu forecast by Google that failed:

Four years after the original Nature paper was published, Nature News had sad tidings to convey: the latest flu outbreak had claimed an unexpected victim: Google Flu Trends. After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak but when the slow-and-steady data from the CDC arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.

What happened here is in fact one of the advantages of big data: the ability to reject models because the data update frequently. Google provided a forecast, and both the forecast and the model underlying it were unequivocally debunked. What a novelty for the social sciences! Of course big data isn’t alone in having constantly updating forecasts that can be tested against reality, but the focus on forecasting and constantly updating data is one of the advantages it has over many other areas of the social sciences. Yes, Google made a model and it was falsified. But keep in mind the famous claim by John Ioannidis that most published research findings are false. Maybe it is most, or maybe it’s only half or a quarter. But with big data, if a large percent of your findings are false, you find out sooner rather than later.

Indeed, it is the forecast-driven nature of machine learning that makes it so appealing as a branch of empiricism. Maybe this is something you can’t appreciate if you haven’t actually spent time with a big dataset with lots of variables and seen how easy it is to pick the wrong model, or if you haven’t taken a look at the statistical guts of disagreeing research papers. But this focus on the right model being the one that makes the best out-of-sample predictions, and the ability to be constantly running out-of-sample predictions, just seems so much more tied to reality than most research has to be.

Another thing I think people miss is that there is too much focus on what a really long dataset does to, for example, the probability of spurious correlations. What makes machine learning interesting isn’t just dealing with long datasets, but with really wide datasets. Consider the data that Netflix has on its customers. This isn’t remarkable because of how many people are in the dataset, but because of how much you know about them. Yes, it’s true what they say about spurious correlations, but if you do an out-of-sample prediction with a spurious correlation, the relationship will be falsified quickly no matter what your p-value.
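To make the spurious-correlation point concrete, here is a minimal sketch using only synthetic noise. The function names, the dataset shape (60 rows, 2,000 columns) and the seed are my own illustrative assumptions, not anything from Netflix or Google. It searches a wide table of random features for the one most correlated with a random target on the first half of the rows, then checks that same feature on the held-out half.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spurious_demo(n_rows=60, n_features=2000, seed=0):
    """Hypothetical illustration: every column is independent noise,
    so any correlation found here is spurious by construction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    features = [[rng.gauss(0, 1) for _ in range(n_rows)]
                for _ in range(n_features)]

    half = n_rows // 2
    # "Discover" the best-looking feature using only the first half.
    best = max(features,
               key=lambda f: abs(pearson(f[:half], target[:half])))

    in_sample = pearson(best[:half], target[:half])
    out_of_sample = pearson(best[half:], target[half:])
    return in_sample, out_of_sample


if __name__ == "__main__":
    r_in, r_out = spurious_demo()
    print(f"in-sample r = {r_in:+.2f}, out-of-sample r = {r_out:+.2f}")
```

With thousands of candidate columns, the winning in-sample correlation should look impressive, while the same column on fresh rows should typically fall back toward zero. That collapse is the quick falsification described above.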

In 2001, the statistician Leo Breiman wrote a paper titled “Statistical Modeling: The Two Cultures” in which he contrasted the data modeling culture and the algorithmic modeling culture that compete among statisticians. The data modeling culture used goodness-of-fit tests and residual examination to validate models, while the algorithmic modeling culture focused on out-of-sample predictions. He estimated that 98% of all statisticians fell into the former camp, and only 2% fell into the latter. I’m not sure what the numbers are today, and surely many fall into both camps, but the percent of statisticians who are algorithmic folks and the percent of important problems they are working on have gone up sharply from 2%.
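The difference between the two validation habits can be sketched with a toy comparison. Everything here is an assumption for illustration: the synthetic data (a straight line plus noise), the two candidate models, and the helper names. A "memorizer" that returns the outcome of the nearest training point fits its own training data perfectly, so a pure goodness-of-fit check would prefer it; a held-out sample tells a different story.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def mse(preds, ys):
    """Mean squared error of predictions against observed values."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def compare_models(n=200, seed=1):
    """Return (in-sample MSE, out-of-sample MSE) for each toy model."""
    rng = random.Random(seed)

    def draw(k):
        # Assumed data-generating process: y = 2x + unit-variance noise.
        xs = [rng.uniform(0, 1) for _ in range(k)]
        ys = [2 * x + rng.gauss(0, 1) for x in xs]
        return xs, ys

    train_x, train_y = draw(n)
    test_x, test_y = draw(n)

    slope, intercept = fit_line(train_x, train_y)

    def line(x):
        return slope * x + intercept

    def memorizer(x):
        # 1-nearest-neighbour: copy the outcome of the closest training point.
        nearest = min(range(n), key=lambda j: abs(train_x[j] - x))
        return train_y[nearest]

    results = {}
    for name, model in (("line", line), ("memorizer", memorizer)):
        results[name] = (
            mse([model(x) for x in train_x], train_y),
            mse([model(x) for x in test_x], test_y),
        )
    return results


if __name__ == "__main__":
    for name, (fit_err, pred_err) in compare_models().items():
        print(f"{name:10s} in-sample MSE = {fit_err:.2f}, "
              f"out-of-sample MSE = {pred_err:.2f}")
```

The memorizer has zero in-sample error because each training point is its own nearest neighbour, yet it should do worse than the plain line on new data. Ranking models by the second number rather than the first is, roughly, the algorithmic culture's habit.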

It’s true that misleading p-values are a consequence of a lot of data, but this is a data modeling culture problem, not an algorithmic modeling culture problem.
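One way to see why this is a data modeling culture problem is the sketch below. It rests on my own assumptions: a synthetic effect of 0.04 standard deviations, 100,000 rows, and hypothetical function names. With that much data a tiny effect produces an enormous t-statistic, while the improvement in held-out prediction error stays negligible.

```python
import random


def tiny_effect_demo(n=100_000, slope=0.04, seed=2):
    """Fit y = a + b*x on half the rows; report the t-statistic for b and
    the fractional drop in held-out squared error versus predicting the mean."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]

    half = n // 2
    train_x, train_y = xs[:half], ys[:half]
    test_x, test_y = xs[half:], ys[half:]

    mean_x = sum(train_x) / half
    mean_y = sum(train_y) / half
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(train_x, train_y))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Standard error of the slope from the residual variance.
    rss = sum((y - a - b * x) ** 2 for x, y in zip(train_x, train_y))
    std_err = (rss / (half - 2) / sxx) ** 0.5
    t_stat = b / std_err

    # Out-of-sample check: how much does the model beat the training mean?
    mse_model = sum((y - a - b * x) ** 2
                    for x, y in zip(test_x, test_y)) / len(test_y)
    mse_mean = sum((y - mean_y) ** 2 for y in test_y) / len(test_y)
    gain = 1 - mse_model / mse_mean
    return t_stat, gain


if __name__ == "__main__":
    t_stat, gain = tiny_effect_demo()
    print(f"t-statistic = {t_stat:.1f}, "
          f"held-out error reduction = {gain:.2%}")
```

A t-statistic above roughly 2 is conventionally significant at the 5% level, so a p-value checker would be thrilled with this result. Someone scoring the model on held-out predictions would see that it barely helps, and would not be misled.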

Now don’t get me wrong, I am for the most part a p-value-checking, residual-examining, data modeling culture economist. But machine learning and big data are going to get more important, not less, and I think social scientists who don’t learn to at least think like the other culture are going to be left behind.

 
 

Forbes