Tuesday, January 1, 2013

The Signal and the Noise (a review)


Nate Silver's famously accurate political forecasts on his blog FiveThirtyEight gave me a little emotional stability going into the 2012 presidential elections. He hits almost every election right, and I liked how his predictions didn't vacillate with every new event.  And my favorite candidate had a 91% chance, the morning of.  So I read his book.  

It turns out his method is simple.  Silver, who now works for the New York Times, averages many political polls after weighting each on the basis of the pollsters' past successes, the sample size, and when the poll was taken. He delivers the result in terms of probabilities, which allow for things unknown and imperfections in the model. He's smart, and startlingly accurate -- so when I heard he had written a book on probability I ordered it immediately. I ordered myself three copies by accident, such was my enthusiasm.
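To make the weighting idea concrete, here is a minimal sketch of a weighted poll average. It is only an illustration of the three factors named above, not Silver's actual formula: the `Poll` fields, the 0-to-1 `pollster_rating`, the square-root treatment of sample size, and the 14-day half-life for recency are all assumptions made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    share: float            # candidate's share in this poll, in percent
    sample_size: int        # number of respondents
    days_old: int           # days since the poll was taken
    pollster_rating: float  # hypothetical 0..1 score for past accuracy


def poll_weight(poll: Poll, half_life_days: float = 14.0) -> float:
    """Combine the three factors into one weight (assumed form, for illustration)."""
    # A poll loses half its weight every `half_life_days` days.
    recency = 0.5 ** (poll.days_old / half_life_days)
    # Sampling error shrinks roughly with the square root of the sample size.
    return poll.pollster_rating * math.sqrt(poll.sample_size) * recency


def weighted_average(polls: list[Poll]) -> float:
    """Weighted mean of the candidate's share across all polls."""
    weights = [poll_weight(p) for p in polls]
    return sum(w * p.share for w, p in zip(weights, polls)) / sum(weights)


# Made-up polls: the fresh one counts twice as much as the two-week-old one.
example_polls = [
    Poll(share=52.0, sample_size=1000, days_old=0, pollster_rating=1.0),
    Poll(share=48.0, sample_size=1000, days_old=14, pollster_rating=1.0),
]
print(round(weighted_average(example_polls), 2))
```

With these invented numbers the average lands closer to the newer poll than a plain mean would, which is the whole point of weighting.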
Bayesian probability – the heart of the book – is a simple statistical technique that corrects for shortcuts in our thinking. It was one of the lessons in Kahneman's extraordinary Thinking, Fast and Slow. Bayes's method combines a prior expectation with estimated probabilities of seeing the evidence if the event is occurring and if it is not. The method requires constant revision as new data come forward. I like it so much I've installed a Bayes app on my smartphone.
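For readers who want to see the arithmetic, a minimal sketch of a single Bayesian update follows. The function name and the numbers are made up for illustration; the formula itself is just Bayes' theorem for a yes/no hypothesis.

```python
def bayes_update(prior: float,
                 p_evidence_if_true: float,
                 p_evidence_if_false: float) -> float:
    """Return the posterior probability of a hypothesis after seeing one piece of evidence.

    prior               -- probability of the hypothesis before the evidence
    p_evidence_if_true  -- chance of seeing the evidence if the hypothesis is true
    p_evidence_if_false -- chance of seeing the evidence if the hypothesis is false
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator


# Invented numbers: a 4% prior, and evidence ten times likelier if the hypothesis is true.
posterior = bayes_update(prior=0.04, p_evidence_if_true=0.50, p_evidence_if_false=0.05)
print(round(posterior, 2))  # about 0.29

# "Constant revision": yesterday's posterior becomes today's prior.
revised = bayes_update(prior=posterior, p_evidence_if_true=0.50, p_evidence_if_false=0.05)
print(revised > posterior)
```

Strong evidence moves a small prior a long way, but not to certainty, and each new piece of evidence starts from wherever the last one left off.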
Silver is a serious gambler, we learn, and gamblers must be especially honest in their forecasts. They put their money where their mouth is.  Silver’s own games have been mostly poker and baseball and he devotes a full chapter to each of these. You wouldn’t know it from the cute titles (such as “How to Drown in Three Feet of Water”), but each chapter is sharply focused on a particular subject: 1. mortgage collapse, 2. politics, 3. baseball, 4. weather, 5. earthquakes, 6. GDP, 7. epidemics, 8. Bayesian probability itself, 9. chess, 10. poker, 11. stocks, 12. climate change, and 13. terrorism. Each of these, it turns out, is quite an interesting challenge, as they vary in their complexity, the amount of data available, the forces for bias, and the quality of the models used in forecasting.
I've drawn several sweeping conclusions. Good forecasting requires a lot of reliable data and a clear understanding of the baseline; a causal model is not essential, but it will help sort the signal from the noise. The forecast will benefit from multiple viewpoints (though not weighted equally), and from both quantitative and qualitative methods. It must be constantly reassessed as new data come forward. Computers are excellent for processing information, but if you put enough data into them they will detect patterns and relationships that don't really exist. Usually human judgment should figure in -- in part, and interestingly, because people have good vision.
People are irrational in predictable ways. Recent, local, and highly publicized events appear more important than they actually are. We tend to select information to support prior biases – especially when there is a lot of information available (and so this is more of a problem in the information age). We easily attribute actual mistakes to bad luck. We are overconfident of our opinions and wedded to them. We can easily discount the possibility of extreme events.  And we are inclined to detect meaningful patterns where there are none. The author speculates, as would I, that there are evolutionary reasons for many of these.
There are some pretty impressive successes in forecasting, though. Chess is now a computer's game because all the useful skills and combinations of profitable moves can be programmed. Poker is fairly easily forecasted, because it’s relatively easy to calculate probabilities, and the human element is fairly small. Silver himself played for many years, ending only when on-line gaming rules drove out the poorer players -- the "fish," he calls them; one day he realized he was one.
Baseball also has a fixed rulebook and scads of data, but with a more complex human element which is more difficult to predict.  In these cases, Silver says, a wider net for information is better – interviews, profiles ... in other words, an old-fashioned talent scout.  More impressive still are the weather forecasts, now quite accurate eight days in advance. These also incorporate a great deal of data, and they benefit from the immutable laws of physics, and from elaborate models which have been refined and corrected by a steady stream of feedback data. Interestingly, commercial services offer 15-day forecasts even though anything beyond eight days is less accurate than the baseline (climatological data alone, which is no forecast at all).  Apparently people sometimes prefer to feel informed, even with misinformation. The commercial forecasts, which are based on the government's models, are also jiggered.  The Weather Channel rarely predicts a 50% chance of rain, because their viewers are more comfortable with 60% or 40% -- and the chance of precipitation is routinely overstated (people enjoy unexpected sunshine, and they really hate bad surprises!).  Newscasters and political pundits are terrible forecasters -- they are often biased (towards sensationalism, or a particular party) and there is no cost to them for being wrong.
If weather has become easy to forecast, climate is not. There is far less corrective feedback data, as climate changes so slowly, and so the models are inferior.  They're also more complicated as they must incorporate the greenhouse effect, and quite a lot more chemistry.  Add to this the strong political bias, because the causes, effects, and costs fall differently around the world. Many players are biased by self-interest, others by pure contrariness, and because the science is incomplete some simply get confused and deny climate change altogether. However, there is wide agreement that CO2 and other greenhouse gases increase warming, human activities are increasing these gases, and the world is slowly warming (~1.5°C/century), at least partly as a result of human activity. These details are debated: how fast it will happen, how temperatures and precipitation will change in specific regions, economic impacts, and how effective we might be in slowing or reversing it.  But if you ever hear someone claim that there is no evidence that the global climate is changing, that scientists are merely promoting a political agenda, or that we don't know enough about climate to predict anything with confidence ... that is not signal. That's noise.
There are things even more difficult to predict than climate: earthquakes, largely because the most relevant data are deeply buried and hard to gather. And terrorism – for much the same reasons. Thankfully (I suppose) both of these seem to have a power law distribution, such that when frequency and magnitude are plotted on two logarithmic axes, the events form a straight line. This is useful for predicting even the frequency of events larger than any before.  So we know how likely, but we don't know when.  Or where.
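The straight-line trick can be sketched in a few lines. This is only an illustration with invented data, not figures from the book: fit a line through the logarithms of event size and frequency, then extend it past the largest event on record. The function names and the sample numbers are hypothetical.

```python
import math


def fit_power_law(sizes: list[float], frequencies: list[float]) -> tuple[float, float]:
    """Least-squares line through (log10 size, log10 frequency); returns (slope, intercept)."""
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_frequency(size: float, slope: float, intercept: float) -> float:
    """Read the expected frequency for a given size off the fitted line."""
    return 10 ** (intercept + slope * math.log10(size))


# Invented data: each tenfold jump in size is ten times rarer.
sizes = [1.0, 10.0, 100.0, 1000.0]
per_year = [100.0, 10.0, 1.0, 0.1]

slope, intercept = fit_power_law(sizes, per_year)
# Extrapolate to an event ten times larger than any in the data.
print(predict_frequency(10_000.0, slope, intercept))
```

With this made-up data the line says an event of size 10,000 should turn up roughly once a century. That is a rate, not a date, which is exactly the limitation noted above.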
That’s enough review – I've skipped a lot of my highlights and I highly recommend reading the book through. At times Silver gets drawn into irrelevant details such as building up a story which only ends in being summarily denied an interview.   He likes baseball a whole lot more than I do.  But mostly the book is rock-hard practical.
Other quibbles -- the word data is plural; a datum is, but the data are. And there were a few typos (though when I find one in a book of this quality it's like uncovering a nice little fossil).  It's not short – 454 pages, followed by 80 pages of notes and references – but I wanted more.  There should be chapters on the job market, on real estate, marriage, religious predictions, the efficacy of legislation such as gun control and marijuana legalization, etc.  These are even more important than baseball, or poker.  I do think we need another book.
Mr. Silver?