Big data, bad analytics

Big data is not about the data, it's about the analytics, according to Harvard University professor Gary King -- and, boy, are there some really bad analytics out there. One of his favorite recent examples concerns a big data project that set out to use Twitter feeds and other social media to predict the U.S. unemployment rate.

The researchers devised a category of words that pertained to unemployment, including "jobs," "unemployment" and "classifieds." They culled tweets and other social media posts that contained these words, then looked for correlations between the total number of words per month in this category and the monthly unemployment rate. This is known as sentiment analysis by word count, and it's a common analytics approach, King said.
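
The approach described above can be sketched in a few lines of Python. This is only a minimal illustration, not the researchers' actual code: the keyword set uses just the three words quoted in the article, the sample posts are made up, and the unemployment figures are invented placeholders rather than real statistics.

```python
import re
from collections import Counter
from math import sqrt

# Assumed keyword category: only the three example words quoted in the article.
CATEGORY = {"jobs", "unemployment", "classifieds"}

def monthly_category_counts(posts):
    """Count category words per month; posts is an iterable of (month, text)."""
    counts = Counter()
    for month, text in posts:
        tokens = re.findall(r"[a-z]+", text.lower())  # case is discarded here
        counts[month] += sum(1 for t in tokens if t in CATEGORY)
    return counts

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up posts, for illustration only.
posts = [
    ("2011-08", "Lost my job, checking the classifieds for jobs"),
    ("2011-08", "unemployment office again"),
    ("2011-09", "No jobs anywhere"),
    ("2011-10", "new jobs posted, classifieds are full, unemployment line is long"),
]
counts = monthly_category_counts(posts)

months = sorted(counts)
word_totals = [counts[m] for m in months]
placeholder_rates = [9.1, 9.0, 8.9]  # invented values, NOT real unemployment data
print(months, word_totals, pearson(word_totals, placeholder_rates))
```

Note that the tokenizer lowercases everything and matches on the bare word, so it has no way to tell what a word means in context.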

Money was raised. The work crept along until, all of a sudden, there was a tremendous spike in the number of tweets containing the type of words that fell into this category. Maybe the researchers were really onto something. More money poured into the project. "What they hadn't noticed was Steve Jobs died," said King, the Albert J. Weatherhead III University Professor and director of the Institute for Quantitative Social Science at Harvard. Of course, tweets with "Jobs" spiked for a completely different reason.
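
To see why the spike fooled a word counter, consider a minimal sketch of keyword matching. The function name, keyword set and both sample tweets are hypothetical, chosen only to illustrate the failure King describes.

```python
import re

# Assumed keyword category: the three example words quoted in the article.
CATEGORY = {"jobs", "unemployment", "classifieds"}

def count_category_words(text):
    """Count category words in a text, ignoring case (as a naive counter would)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for t in tokens if t in CATEGORY)

# Made-up tweets, for illustration only.
labor_tweet = "Still no jobs in the classifieds this week"
tribute_tweet = "RIP Steve Jobs, thank you for everything"

print(count_category_words(labor_tweet))    # 2
print(count_category_words(tribute_tweet))  # 1
```

The tribute tweet scores as an unemployment signal even though it has nothing to do with the labor market, so a flood of such tweets inflates the monthly total.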

King, whose research focuses on developing and applying empirical methods to social science research, said such errors happen "all the time" in sentiment analysis by word count and other "off the shelf" analytics programs. That's because these approaches tend to conflate humans with systems that respond in completely predictable ways. That's bad analytics. "We're pretty good at being humans but pretty lame at being computers."
