I’ve been reading and greatly enjoying Nate Silver’s book, The Signal and the Noise: Why So Many Predictions Fail—and Some Don’t. I’d recommend the book based on the introduction and first chapter alone. (And, no, that’s not because that’s all I’ve read so far. It’s because they’re that good.) If you’re the sort who skips introductions, I strongly suggest you become a new sort and read this one. It’s a wonderful essay about the dangers of too much information and the need to make sense of it. Silver makes the point that, historically, when we’ve been faced with more information than we can handle, we have tended to pick and choose which ‘facts’ we wish to believe. Sounds like a presidential debate, no?
Another thing to like about the book is the argument it makes against the Wired Magazine view that Big Data means the end of scientific theory. Chapter by chapter, Silver describes the very important role that theory and modeling play in making (successful) predictions. In fact, a theme of the book is that prediction is a human endeavor, despite the attention data scientists pay to automated algorithmic procedures. “Before we can demand more of our data, we need to demand more of ourselves.” In other words, the Data Deluge requires us to find useful information, not just any old information. (Which is where we educators come in!)
The first chapter makes a strong argument that the financial crisis was, to a great extent, a failure to understand the fundamentals of statistical modeling, in particular a failure to realize that models are not the thing they model. Models are shaped by data but run on assumptions, and when the assumptions are wrong, the predictions fail. Chillingly, Silver points out that recoveries from financial crises tend to be much, much slower than recoveries from economic crises and, in fact, some economies never recover.
Other chapters talk about baseball, weather, earthquakes, poker and more. I particularly enjoyed the weather chapter because, well, who doesn’t enjoy talking about the weather? For me, perhaps because we are in the midst of an election season, it also raised questions about the role of the U.S. federal government in supporting the economy. Weather prediction plays a big role in our economic infrastructure, even though many people are dismissive of our ability to predict the weather. So it was interesting to learn that government agencies actually predict the weather better than private forecasting firms (such as The Weather Channel), and much better than local TV news stations. In fact, as Silver explains, the marketplace rewards poor predictions, at least when it comes to rain: commercial forecasters overstate the chance of rain, because viewers are more upset by an unexpected downpour than by an unexpectedly sunny day. For me, this underlines the importance of a ‘neutral’ party.
As I consider how to prepare students for the Deluge, I think teaching prediction should take priority over teaching inference. Inference is important, but it is a specialized skill, and so is not needed by all. Prediction, on the other hand, is inherently important, and has been for millennia. Yes, prediction is a type of inference, but prediction and inference are not the same thing. As Silver points out, estimating a candidate’s support for president is different from predicting whether or not the candidate will win. (Which leads me to propose a new slogan: “Prediction: Inference for Tomorrow!” Or “Prediction: Inference for Procrastinators!”)
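To make the distinction concrete for students, here is a minimal Python sketch. The numbers are hypothetical (a poll estimating 52% support with a standard error of 2 percentage points), and the sketch assumes our uncertainty about the candidate’s true support is approximately normal and that winning simply means getting more than 50%. Real election forecasts account for much more, such as systematic polling error and shifts in opinion before election day.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(support_estimate: float, standard_error: float,
                    threshold: float = 0.5) -> float:
    """Probability that true support exceeds the threshold.

    Assumes (a simplification) that uncertainty about true support is
    normal, centered on the poll estimate, with the given standard error.
    """
    z = (support_estimate - threshold) / standard_error
    return normal_cdf(z)


# Hypothetical poll: 52% estimated support, standard error of 2 points.
estimate = 0.52
se = 0.02

# Inference: what share of voters support the candidate?
print(f"Estimated support: {estimate:.0%}")
# Prediction: how likely is the candidate to win?
print(f"Chance of winning: {win_probability(estimate, se):.0%}")
```

The same hypothetical poll yields two different answers: an estimate of support (about 52%) and a probability of winning (about 84% under these assumptions).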
Much of this may be beyond the realm of introductory statistics, since some of the predictive models Silver describes are complex. But the basics are important for intro stats students. All students should understand what a statistical model is and what it is not. Equally importantly, they should understand how to evaluate a model. And I don’t mean that they should learn about r-squared (or only about r-squared). They should learn about the philosophy of measuring model performance. In other words, intro stats students should understand why many predictions fail, but some don’t, and how to tell the difference.
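One way to show students what measuring model performance means beyond r-squared is to score probabilistic forecasts against what actually happened. The sketch below uses the Brier score (the mean squared difference between forecast probabilities and 0/1 outcomes), a standard scoring rule for probability forecasts such as the chance of rain. It is just one possible classroom choice, not something taken from the book. The ten days of outcomes and both forecasters are made up for illustration; Forecaster B is simply Forecaster A with 20 percentage points added to every rain probability, a crude stand-in for a forecaster who leans toward predicting rain.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; 0 means every forecast was a confident, correct call.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical record of ten days: 1 = it rained, 0 = it stayed dry.
rained = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Hypothetical Forecaster A: probabilities that track the outcomes reasonably well.
honest = [0.7, 0.2, 0.1, 0.6, 0.2, 0.1, 0.3, 0.8, 0.2, 0.1]

# Hypothetical Forecaster B: the same forecasts shifted toward rain, capped at 1.
wet_biased = [min(p + 0.2, 1.0) for p in honest]

print(f"Forecaster A: {brier_score(honest, rained):.3f}")
print(f"Forecaster B: {brier_score(wet_biased, rained):.3f}")
```

With these hypothetical numbers, Forecaster A scores 0.053 and Forecaster B scores 0.105, so the forecaster who leans toward rain does worse overall, even though B looks better on the three rainy days. Calibration plots and out-of-sample prediction error are other ways to make the same point.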
So let’s talk specifics. Post your comments on how you teach your students about prediction and modeling.