Monday, August 26, 2013

Nate Silver

I am a fan of data analysis, big data or small.  Of course, small data sets are generally more likely to be misleading, especially if the data are not collected in a way that represents the population well.  In many studies, the population being studied is not well specified, or not specified at all.  During my grad school years, I worked in the University of Maryland School of Education Statistics lab.  It consisted of about 30 desks, each with an up-to-date calculator on it.  The calculators were the size of large typewriters and weighed about twice as much.  One of the most meticulous and yet worst data collections we ever ran into in that lab was when two Georgetown medical students came in with a big scroll.  These two had "sacrificed" one lab rat a day from a batch for 30 days.  They hoped that we could help them find something interesting and important in the resulting data, about 20 measurements taken on each animal.  We told them (politely, of course) to get lost.  We had had too much experience with fishing trips, in which no intelligence, model or theory governed the data collection.  Such trips are just an exercise in key-pressing and going around in circles.

Nate Silver is the data analyst who correctly predicted the presidential winner in 49 of the 50 states in 2008 and in all 50 in 2012.  His book "The Signal and the Noise" (2012) discusses predictions, some good and many bad.  The worst ones related to the 2007-2009 economic downturn, since missing the seriousness of the situation and its likelihood caused plenty of pain and disruption, and not just in this country.  Silver is a good writer and writes for a general, intelligent audience about predictions of all sorts and how easy it is to slant the data to show what you want it to show.

One of the best books on the subject is still the 1954 book "How to Lie with Statistics" by Darrell Huff.  Another good discussion of how ardent people in the midst of this effort or that can be led astray by their hopes, passions and colleagues is in "The Information Diet" by Clay A. Johnson, who describes working on the Howard Dean bid for the presidency in 2004.  The campaign workers very, very much wanted their guy to win, and Johnson describes how they regularly churned positive and negative information about the campaign so that it looked like support for their hopes, regardless of its "objective" value.

A book that Silver and others mention is "Moneyball", which describes how the Oakland A's applied objective data to recruit lower-cost athletes for their team in a way that made a big difference and depended less on hunches and rules of thumb.  If you are interested in this subject, two more names can lead to impressive insights: Paul Meehl and Robyn Dawes.  Meehl had been interested for years in the subject of his book "Clinical vs. Statistical Prediction".  Among other things, he showed that statistical equations did a better job than a panel of expert judges in deciding which applicants for parole would do well and which would wind up committing more crime.  Dawes has a number of papers and books that show that human intuition is not as good as data analysis at many sorts of prediction.

