Do yourself a favor: read The New Yorker’s James Surowiecki on MIT’s Billion Prices Project: an effort to track inflation in real time, using prices posted on internet shopping sites.  For timely numbers, the BPP sure beats the old way of gathering inflation data: sending out mystery shoppers to write down the prices at stores, and forwarding the data back to the home office to be sorted, collated, and given the official seal of approval.  With the internet, you cut out the middle-men and -women, and save a bunch of time in the process.

Flu Correlations

Surowiecki’s piece got me thinking about bigger possibilities for “unintentional crowd-sourcing.”  Can we use people’s internet behaviors to glean other useful information about the world?  Coincidentally, on the very same day I started thinking about this, the Google Correlate project launched.  But before you click that last link, be warned:  you could lose a whole bunch of your day.

The Correlate tool takes virtually any data series—a set of numbers you upload, the trends for particular internet searches, and even a random line you draw—and finds out which other Google search trends correspond with it.  The proof of concept for the system was tracking the flu:  Google found that searches for “flu symptoms” or “flu treatments” matched very closely with CDC’s data on real-world flu trends.  So now, CDC can use internet search trends to help track flu outbreaks as they develop.  Nifty!
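The basic idea can be sketched in a few lines of Python. This is only an illustration: I don't know what Google actually runs under the hood, so the use of a plain Pearson correlation is my assumption, the function names `pearson` and `rank_matches` are hypothetical, and every number below is made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no defined correlation; treat it as no match
    return cov / sqrt(var_x * var_y)

def rank_matches(target, candidates, top=3):
    """Return up to `top` (name, r) pairs, best correlation with `target` first."""
    scored = [(pearson(target, series), name) for name, series in candidates.items()]
    scored.sort(reverse=True)
    return [(name, round(r, 3)) for r, name in scored[:top]]

# Made-up weekly numbers, for illustration only (not CDC or Google data).
flu_cases = [10, 12, 30, 55, 40, 20]
search_trends = {
    "flu symptoms": [11, 14, 28, 60, 38, 22],
    "beach vacation": [50, 45, 20, 10, 25, 40],
    "tax forms": [5, 5, 6, 5, 6, 5],
}

ranking = rank_matches(flu_cases, search_trends)
print(ranking)
```

With these invented numbers, “flu symptoms” comes out on top because its weekly ups and downs track the case counts, while “beach vacation” lands last because it moves in the opposite direction.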

I couldn’t stop playing with Correlate, not so much for the obvious connections it finds as for the comically ridiculous ones.  I uploaded inflation-adjusted movie box office receipts by week, and found a close correlation with internet searches for “cinema show times.”  Fair enough: if you want to know if it’s going to be a big weekend at the box office, check whether people are searching the internet for a good time to see a movie!  I also uploaded data on obesity by state, and found a spatial correlation with searches for “signs of high blood pressure” and “low sodium diet,” which makes sense, I suppose.  If you’ve got obesity-related health issues, you might have to worry about a heart-friendly diet.

But obesity rates were an even closer fit with searches for “top rap songs,” and were surprisingly closely correlated with a particular song:  Dorrough’s “Ice Cream Paint Job.”  Huh?  What does being heavy have to do with hip-hop?  Probably nothing: rapper Dorrough just happens to be hot in the Dakotas and the upper Midwest as well as the South.

Which brings me to the obvious: Correlate does not prove causation.  Some things vary in tandem for no good reason at all.  And if you have enough data series, you can almost always find a close match for any trend.  So while some correlations are important, you have to be careful.  In my mind, you just can’t read too much into the fact that searches for “weather” are such a close match for “compound miter saws.”
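The point about having enough data series is easy to check with a toy simulation. The sketch below generates one random “trend” and a pool of equally random, unrelated series, then reports the best correlation found as the pool grows. Everything here is an assumption made for illustration: the series are pure Gaussian noise, the lengths and pool sizes are arbitrary, and `best_match_by_pool_size` is a hypothetical helper, not anything Correlate exposes.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat series: no defined correlation
    return cov / sqrt(var_x * var_y)

def best_match_by_pool_size(length=10, pool_sizes=(5, 50, 2000), seed=0):
    """Best correlation with a random target among the first n random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(length)]
    # Every candidate is independent noise, so any strong match is pure chance.
    pool = [[rng.gauss(0, 1) for _ in range(length)]
            for _ in range(max(pool_sizes))]
    scores = [pearson(target, series) for series in pool]
    return {n: max(scores[:n]) for n in pool_sizes}

results = best_match_by_pool_size()
for n, r in results.items():
    print(f"best of {n:>4} random series: r = {r:.2f}")
```

Because each larger pool contains the smaller ones, the best match can only stay the same or improve as more series are searched, even though none of them has anything to do with the target. A tool that searches a huge number of real search trends is in the same position, which is one way to end up with weather and compound miter saws.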