Don't Underestimate Big Data

Apr 15 2014, 9:03pm CDT | by

Filed under: news

The techniques of big data have been criticized recently in a post by Tim Harford, among others. It’s quite easy to imagine the potential here has been exaggerated, and a core point about the importance of theory in a lot of empiricism is undeniable. But I think there is some overreach in the criticisms, and in particular one of the failure examples Tim gives shows the opposite of what he claims. He points to a flu forecast by Google that failed:

Four years after the original Nature paper was published, Nature News had sad tidings to convey: the latest flu outbreak had claimed an unexpected victim: Google Flu Trends. After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak but when the slow-and-steady data from the CDC arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.

What happened here is in fact one of the advantages of big data: the ability to reject models because the data update frequently. Google provided a forecast, and the forecast and the model underlying it were unequivocally debunked. What a novelty for the social sciences! Of course big data isn’t alone in having constantly updating forecasts that can be tested against reality, but the focus on forecasting and constantly updating data is one of the advantages it has over many other areas of the social sciences. Yes, Google made a model and it was falsified. But keep in mind the famous claim by John Ioannidis that most published research findings are false. Maybe it is most, or maybe it’s only half or a quarter. But with big data, if a large percentage of your findings are false, you find out sooner rather than later.

Indeed, it is the forecast-driven nature of machine learning that makes it so appealing as a branch of empiricism. Maybe this is something you can’t appreciate if you haven’t actually spent time with a big dataset with lots of variables and seen how easy it can be to pick the wrong model, or if you haven’t taken a look at the statistical guts of disagreeing research papers. But this focus on the right model being the one that makes the best out-of-sample predictions, and the ability to run out-of-sample predictions constantly, just seems so much more tied to reality than most research has to be.
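To make the point about picking the wrong model concrete, here is a minimal sketch using only the Python standard library. Everything in it is an illustrative assumption rather than anything from Google or Harford: the simulated data (a straight line plus noise), the two candidate models, and all function names. One model memorizes the training data and so fits it perfectly; the other is a plain least-squares line. Judged by in-sample fit the memorizer looks better, but judged by out-of-sample error the line should win.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys):
    """A 'memorizing' model: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(rng, n):
    """Hypothetical data: y = 2x plus standard normal noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def compare_models(seed=0, n=200):
    """Return {model name: (in-sample MSE, out-of-sample MSE)}."""
    rng = random.Random(seed)
    x_train, y_train = make_data(rng, n)
    x_test, y_test = make_data(rng, n)  # fresh data the models never saw
    results = {}
    for name, fit in (("line", fit_line), ("nearest", fit_nearest)):
        model = fit(x_train, y_train)
        results[name] = (mse(model, x_train, y_train),
                         mse(model, x_test, y_test))
    return results


if __name__ == "__main__":
    for name, (in_err, out_err) in compare_models().items():
        print(f"{name:8s} in-sample MSE={in_err:.3f}  out-of-sample MSE={out_err:.3f}")
```

The memorizing model has zero error on the data it was fit to, which is exactly why in-sample fit alone is a poor guide here; only the held-out comparison reveals which model tracks reality.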

Another thing that I think people miss: there is too much focus on what a really long dataset does to, for example, the probability of spurious correlations. What makes machine learning interesting isn’t just dealing with long datasets, but with really wide datasets. Consider the data that Netflix has on its customers. This isn’t remarkable because of how many people are in the dataset, but because of how much you know about them. Yes, it’s true what they say about spurious correlations, but if you do an out-of-sample prediction with a spurious correlation, the relationship will be falsified quickly no matter what your p-value.
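A small simulation can illustrate how an out-of-sample check exposes a spurious correlation in a wide dataset. This is only a sketch under stated assumptions: the data are pure random noise (not Netflix data or any real dataset), and the sizes, names, and seed are hypothetical choices. With thousands of unrelated variables and few rows, some variable will look strongly correlated with the target by chance; on fresh data that apparent relationship should mostly vanish.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spurious_demo(n_rows=50, n_features=2000, seed=0):
    """Pick the noise feature most correlated with a noise target in the
    training sample, then measure the same feature on a fresh sample.

    Returns (in-sample correlation, out-of-sample correlation).
    """
    rng = random.Random(seed)

    def noise(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Target and features are independent by construction, so any
    # correlation found below is spurious.
    y_train, y_test = noise(n_rows), noise(n_rows)
    train_cols = [noise(n_rows) for _ in range(n_features)]
    test_cols = [noise(n_rows) for _ in range(n_features)]

    # "Data mining" step: search the wide dataset for the best-looking feature.
    scores = [abs(pearson(col, y_train)) for col in train_cols]
    best = max(range(n_features), key=scores.__getitem__)

    return pearson(train_cols[best], y_train), pearson(test_cols[best], y_test)


if __name__ == "__main__":
    r_in, r_out = spurious_demo()
    print(f"in-sample r = {r_in:+.2f}, out-of-sample r = {r_out:+.2f}")
```

The in-sample correlation would pass a naive significance test on its own, which is the sense in which the p-value misleads; the held-out sample is what falsifies it.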

In 2001, the statistician Leo Breiman wrote a paper titled “Statistical Modeling: The Two Cultures” in which he contrasted the data modeling culture and the algorithmic modeling culture that compete among statisticians. The data modeling culture used goodness-of-fit tests and residual examination to validate models, while the algorithmic modeling culture focused on out-of-sample predictions. He estimated that 98% of all statisticians fell into the former camp, and only 2% fell into the latter. I’m not sure what the numbers are today, and surely many fall into both camps, but the percentage of statisticians who are algorithmic folks, and the percentage of important problems they are working on, has gone up sharply from 2%.

It’s true that misleading p-values are a consequence of a lot of data, but this is a data modeling culture problem, not an algorithmic modeling culture problem.

Now don’t get me wrong, I am for the most part a p-value-checking, residual-examining, data-modeling-culture economist. But machine learning and big data are going to get more important, not less, and I think social scientists who don’t learn to at least think like the other culture are going to be left behind.

 
Forbes