I wrote a slightly different version of this post on Facebook the day before the election, in order to argue that it was irrational to buy the claims made by some pundits, such as Sam Wang from the Princeton Election Consortium, that Clinton had a 99% chance of winning. I decided to post a revised version here, because even though the election is over, I think it still explains a lot about how polls, and the models that purport to predict the result of elections from them, actually work. The issue requires that we talk about statistics, but don’t worry if you don’t like math: I will try to explain things in layman’s terms and I hope you will learn things people often don’t understand. For instance, you will know what a poll’s margin of error actually is, and understand why 99% of sentences about polls that contain the expression “margin of error” are false.
I will only talk about election models that rely heavily on polls, because I think only that kind of model is reliable. There is another kind of election model, which relies on what people call the fundamentals (such as the unemployment rate, the rate of economic growth, etc.), but I don’t think such models are reliable and I plan to write a short post to explain why. (Of course, you can design a model that relies on both polls and fundamentals, but I will ignore that possibility to keep the discussion focused and simple.) To be more specific, I will describe Wang’s basic model, because he makes everything public so anyone can look under the hood and see how it works, but also because if I created a model of my own I would basically do the same thing. This model calculates a probability distribution over the possible outcomes in the electoral college, in terms of number of electoral votes. As we shall see, however, one needs to make a number of non-trivial assumptions to compute that probability distribution and that’s where Wang and a lot of other people made important mistakes.
The procedure used by Wang is really simple and, in my opinion, is basically what anyone sensible would do. The first step is to look at the poll average in every state and use that to calculate a probability that Clinton is going to win in that state. Actually, since the probabilities that Clinton is going to win individual states are correlated, you can’t really compute the probabilities in question for each state separately, but I will come back to this shortly so let’s ignore that for the moment. (Wang also uses the poll median instead of the poll average to get around the problem of outliers, which may or may not be a good idea, but I’m going to ignore that.) Once you have a probability that Clinton is going to win for each state, you can use those probabilities to compute, for every possible combination of outcomes state-by-state, the probability that this exact combination occurs. Then, in order to get a probability that Clinton is going to win the election, you just have to sum the probabilities of all the combinations of outcomes state-by-state that would result in at least 270 votes for her in the electoral college. This, in a nutshell, is what Wang’s model does. (It doesn’t exactly do that, because the number of possible combinations of outcomes state-by-state is so enormous that it would take years to perform the computation. But what it does is equivalent, except that it’s much faster.) If the probability distribution calculated by that model were accurate, it would be straightforward to infer the probability that Clinton would win, but as we shall see things are more complicated.
Now, if you did that and used that probability distribution to predict the outcome of the election, given what the polls in each state said just before the election, you would have predicted that Clinton was overwhelmingly likely to win and indeed that’s what most people said at the time. However, as I will explain shortly, you can’t directly infer that from the probability distribution in question, because as I already noted a lot of non-trivial assumptions went into the calculation. In particular, in order to compute that probability distribution, one has to make a lot of assumptions about non-sampling polling error and it’s at this stage that Wang made several mistakes which proved catastrophic. That is why, in the weeks before the election, he kept going around saying that “it [was] totally over” and that he would “eat a bug” if Trump got more than 240 electoral votes. Well, we know how that turned out, don’t we…
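To make the "equivalent but much faster" remark concrete, here is a minimal Python sketch of one way to get the distribution of electoral votes without listing every combination: add the states one at a time and update the distribution as you go. The five states, their probabilities and the 200 "safe" electoral votes are made up for illustration, the states are treated as independent (which, as noted above, is not realistic), and this is not a reproduction of Wang's actual code.

```python
# Hypothetical toy map: (electoral votes, assumed probability that Clinton wins).
# These numbers are made up for illustration; they are not Wang's inputs.
STATES = {
    "A": (29, 0.60),
    "B": (20, 0.75),
    "C": (18, 0.45),
    "D": (16, 0.70),
    "E": (10, 0.80),
}
SAFE_CLINTON_EV = 200   # assumed votes from states treated as certain
TOTAL_NEEDED = 270


def ev_distribution(states, base_ev=0):
    """Return {electoral votes: probability}, assuming independent states."""
    dist = {base_ev: 1.0}
    for ev, p_win in states.values():
        new_dist = {}
        for votes, prob in dist.items():
            # Clinton wins this state: add its electoral votes.
            new_dist[votes + ev] = new_dist.get(votes + ev, 0.0) + prob * p_win
            # Clinton loses this state: her total is unchanged.
            new_dist[votes] = new_dist.get(votes, 0.0) + prob * (1 - p_win)
        dist = new_dist
    return dist


def win_probability(dist, needed=TOTAL_NEEDED):
    """Sum the probabilities of every total that reaches the threshold."""
    return sum(prob for votes, prob in dist.items() if votes >= needed)


dist = ev_distribution(STATES, SAFE_CLINTON_EV)
p_clinton = win_probability(dist)
print(f"P(Clinton >= {TOTAL_NEEDED}) = {p_clinton:.3f}")
```

The work grows with the number of states times the number of distinct electoral vote totals, instead of doubling with every state added, which is why this kind of shortcut gives the same answer as the brute-force sum in a fraction of the time.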
If the probabilities that Wang’s model computes for each state were right, you could have used the resulting probability distribution of the outcomes in the electoral college to straightforwardly derive the probability that Clinton was going to win, which is just the probability that she gets at least 270 votes in the electoral college. But you need to assume that the probabilities for each individual state are correct and, as it turned out, they were not. In order to understand why, we need to look at how polls were used to calculate those probabilities. The problem is that, while it’s relatively straightforward to account for one type of polling error when calculating those probabilities, it’s a lot trickier to account for another type of polling error. The type of polling error that is relatively straightforward to account for is sampling error, so let me explain quickly what that is first. This will clear up a lot of misunderstandings that most people have about the “margin of error” that journalists talk about when they report the results of polls.
Suppose that you have a big urn full of balls, some of which are blue and the others red. Say it contains tens of thousands of balls, so it wouldn’t be practical to look at all of them. You want to know what proportion of the balls in that gigantic urn are blue and what proportion of them are red. So what you do is randomly pick a sample of balls from the urn and look at the balls in that sample to see how many are blue and how many are red. It’s very important that you truly randomly pick them, which means that every ball in the urn must have the same probability of being picked. (Actually, this assumption can be relaxed, but let’s not worry about that.) That’s basically what a poll is, except that instead of looking at balls, pollsters are asking people who they’re going to vote for.
So you randomly pick 100 balls in the urn and, when you look at them, you see that 55 of them are blue and 45 are red. Now, you can’t infer that 55% of the balls in the urn are blue and that 45% are red, because even if you picked the balls randomly, the proportions in the sample are probably somewhat different from the proportions in the urn just because of chance. But you can use probability theory to calculate a confidence interval, typically a 95% confidence interval, which is what gives rise to the margin of error everyone is talking about all the time.
A confidence interval for the proportion of balls in the urn that are blue is an interval around the proportion in the sample such that, if you do a “poll” of the urn an arbitrarily large number of times and use the same method to calculate the interval every time, the actual proportion of balls in the urn that are blue will be in that interval 95% of the time. For instance, suppose that in the sample I was talking about earlier (where 55% of the balls are blue and 45% are red), the margin of error is +/- 5%. It does not mean that the actual proportion of balls that are blue in the urn has a 95% probability of being in the interval [50%, 60%]. What it means is that, if you do a “poll” of the urn by picking 100 balls an arbitrarily large number of times and calculate a confidence interval every time, the actual proportion of balls that are blue in the urn will be in the interval you calculate 95% of the time. But the interval is going to be different, possibly by quite a lot, every time you do a “poll” of the urn.
So, if you do a poll with a margin of error of 3% in Michigan and 52% of the respondents tell you that they’re going to vote for Clinton and 48% tell you that they’re going to vote for Trump, it doesn’t mean that Clinton has a 95% probability of winning that state. It just means that, if you did a poll using the same methodology an arbitrarily large number of times, the actual proportion of people who are going to vote for Clinton in Michigan would be in the confidence interval you calculate 95% of the time, but that interval would be different every time. Moreover, it also doesn’t mean, as journalists say all the time, that Clinton’s margin in that poll is “outside the margin of error”. It is not. According to that poll, the lower bound of the confidence interval for the proportion of people who are going to be voting for Clinton in Michigan is 49%, while the upper bound of the confidence interval for the proportion of people who are going to vote for Trump is 51%, so the two intervals overlap. Of course, even if the sample is truly random, the actual proportion of people who are going to vote for either of them could be much higher or much lower. This is just one poll and, even if the sample was truly random, chance could have messed things up. In fact, it almost certainly did, at least to some extent.
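If you want to see that definition in action, here is a small Python simulation, offered as a sketch rather than anything rigorous. It "polls" an urn 10,000 times, computes a 95% interval each time with the usual normal approximation, and counts how often the interval contains the true proportion. The true share of blue balls (52%) is an assumption I picked for the simulation, and the urn is treated as large enough that drawing one ball doesn't change the odds for the next. Note that with only 100 balls the normal approximation gives a margin closer to +/- 10 points than to the round +/- 5 used above for illustration.

```python
import math
import random

TRUE_BLUE_SHARE = 0.52   # assumed true share of blue balls (unknown in practice)
SAMPLE_SIZE = 100
N_POLLS = 10_000
Z_95 = 1.96              # normal quantile for a 95% interval


def poll_interval(rng):
    """Draw one random sample and return its 95% confidence interval."""
    blue = sum(rng.random() < TRUE_BLUE_SHARE for _ in range(SAMPLE_SIZE))
    p_hat = blue / SAMPLE_SIZE
    margin = Z_95 * math.sqrt(p_hat * (1 - p_hat) / SAMPLE_SIZE)
    return p_hat - margin, p_hat + margin


rng = random.Random(0)   # fixed seed so the run is repeatable
intervals = [poll_interval(rng) for _ in range(N_POLLS)]
covered = sum(low <= TRUE_BLUE_SHARE <= high for low, high in intervals)
coverage = covered / N_POLLS

print(f"share of intervals containing the true value: {coverage:.3f}")
print("first three intervals:", [(round(a, 3), round(b, 3)) for a, b in intervals[:3]])
```

The share printed on the first line should come out near 95% (a bit under, because this simple approximation is slightly optimistic), while the individual intervals on the second line differ from one "poll" to the next. The 95% is a property of the procedure, not of any single interval.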
Okay, now that I have explained what sampling error is, let’s go back to how one could compute a probability that Clinton is going to win in each state if sampling error were the only type of polling error. Suppose that, in every state, the various methodologies used by different pollsters result in the same estimator of the margin between Trump and Clinton with the same distribution and no bias. To simplify a bit, that’s what would happen if every pollster used the same methodology and, when you use that methodology, the sample is truly random and there is no measurement error, i.e. every person who is going to vote in the state has the same probability of ending up in the sample and, if they tell you that they’re going to vote for X, they really are going to vote for X. (This definition of measurement error is somewhat unusual, for it implies that if a respondent honestly replied that he was going to vote for Clinton but changed his mind between the time he was polled and the election, this counts as measurement error. It’s probably not how most people would define measurement error, but it makes the discussion simpler, so I don’t think it’s a problem.) If that is true, you can use a theorem called the “central limit theorem” (look it up if you’ve never heard of it, it’s a truly mind-blowing result) and a theorem proven by a guy called William Gosset (but who for some reason published the proof under the pseudonym “Student”) to prove that, to a good approximation (provided there are enough polls), the poll average, once standardized, obeys a probability distribution called “Student’s t-distribution”, after the dude in question. Finally, using that distribution, you can compute the probability that Clinton is going to win based on whatever the poll average happens to be.
So far, so good, but this is where things start to get more complicated. The problem is that the assumptions on which the calculation I just explained rests are completely unrealistic. Random sampling error is not the only kind of polling error. We know that the methodologies used by pollsters do not result in an unbiased estimator of the margin between Clinton and Trump, because their samples are not truly random and because there is measurement error. Polling is a really tricky business and pollsters have to make all sorts of assumptions to get a result from their data. In particular, they have to make assumptions about what kind of people are going to turn out to vote to construct a sample or weight it appropriately, which also means that they have to make assumptions about what characteristics are relevant to predict turnout. That’s just one reason why, if you give the same data to different pollsters, they will probably all end up with a different result. (See this article in the New York Times about this.) Given how many non-trivial assumptions pollsters have to make, a lot of things can go wrong. Indeed, if you look at the history of polling, things go wrong very often. For instance, in 2012, the national polls were off by 3 points (they underestimated Obama), which is more than Clinton’s lead according to the national RCP average on the eve of the 2016 election and was actually further off the mark than the national polls in 2016.
First, even if there were no measurement error, since the samples used by pollsters are not truly random, the probability you compute by using the central limit theorem and Gosset’s theorem about the Student’s t-distribution would still not be accurate. There is no reason to think that the samples used by various pollsters are biased in such a way that, when you look at the average, the biases cancel out. It may very well be that, on the contrary, they reinforce each other. In particular, that’s exactly what you should expect if pollsters make similar mistakes when they make assumptions about who is going to show up to vote, which is likely. For instance, if most of them overestimated the rate at which black people were going to turn out to vote, it would have biased the polls in favor of Clinton. Since Obama was not running, it was clear that black turnout was going to decrease relative to 2012 (even though many people were under the illusion that blacks would turn out en masse to stop Trump); the only question was by how much. Since black people overwhelmingly vote Democrat, a small mistake in the assumptions pollsters make about black turnout could skew their results in a significant way. As it turned out, pollsters probably overestimated black turnout in constructing their samples, at least in some key states. Furthermore, since the mistakes that pollsters make when they construct their samples are probably similar in different states, the polling errors that result from those mistaken assumptions in various states are presumably correlated to some extent, which should be taken into account when calculating the probabilities that a candidate is going to win in each individual state. But there is no obvious way to figure out exactly what assumptions we should make about how exactly they are correlated.
Moreover, we know that there is measurement error, which makes the whole enterprise of predicting who is going to win the election by using polls even trickier. For instance, even before the election, many people had raised the possibility of a so-called reverse Bradley effect. The idea is that some respondents interviewed by pollsters who said they were not going to vote for Trump actually were going to vote for him but were ashamed to admit it. I read some very bad arguments against the existence of a reverse Bradley effect before the election, but this post is already long and I don’t have time to explain why these arguments were terrible, so I’ll just leave it at that. In a way, the existence of a reverse Bradley effect was trivial; the important question was how big it was going to be. In fact, even after the election, it’s still very difficult to answer that question. Furthermore, insofar as sources of measurement error are similar in different states, it means that non-sampling polling errors in various states are probably correlated. For instance, if people who intend to vote for Trump are loath to admit it to pollsters in Ohio, it’s probably also true of people who intend to vote for Trump in Pennsylvania. Similarly, if there is some bad news for Clinton right before the election, it will presumably make the polls taken before that event unreliable in every state, as people all over the country suddenly change their mind. Again, it’s very difficult to know how exactly they are correlated, yet it’s absolutely essential to make the right assumptions about that if we’re going to use polls to predict the outcome of the election.
What this means is that if, when you calculate the probability that Clinton is going to win in each individual state, you don’t take into account the fact that pollsters use biased samples and the possibility of measurement error, the number you get will not give you a good sense of how likely Clinton is to win, because the computation rests on assumptions that we know to be false. Even if you didn’t follow everything I explained above, you can convince yourself of that by considering the fact that, in pretty much any election, ignoring those facts would result in a prediction that one candidate or the other has a probability of winning of more than 95%. But nobody thinks that, in any election, we can have that kind of certainty, which shows that a probability computed in that way does not really tell you how likely Clinton was to win. Of course, it could be that our intuition is just wrong and that in fact most elections are not as uncertain as most people think, but in this case we know that our intuition is not misleading.
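To see why the correlation matters so much, here is a toy Python simulation. Everything in it is an assumption chosen for illustration: three hypothetical states where Clinton leads in the polls by 2 to 3 points, a total polling error with a standard deviation of 3 points in each state, and a `correlation` parameter that decides how much of that error is shared across states. The total size of the error in each state is the same in both runs, so the only thing that changes is whether the states err together.

```python
import math
import random

# Hypothetical poll leads for Clinton (in points) in three states.
POLL_LEADS = [2.0, 2.5, 3.0]
TOTAL_ERROR_SD = 3.0     # assumed size of the polling error in each state
N_SIMS = 50_000


def p_lose_all(correlation, seed=0):
    """Estimate the probability that Clinton loses all three states."""
    rng = random.Random(seed)
    # Split the error into a part shared by every state and a state-specific
    # part, keeping the total variance in each state constant.
    shared_sd = TOTAL_ERROR_SD * math.sqrt(correlation)
    own_sd = TOTAL_ERROR_SD * math.sqrt(1 - correlation)
    sweeps = 0
    for _ in range(N_SIMS):
        shared = rng.gauss(0.0, shared_sd)   # same draw for all states
        margins = [lead + shared + rng.gauss(0.0, own_sd) for lead in POLL_LEADS]
        if all(m < 0 for m in margins):
            sweeps += 1
    return sweeps / N_SIMS


p_indep = p_lose_all(correlation=0.0)
p_corr = p_lose_all(correlation=0.8)
print(f"P(lose all three), independent errors: {p_indep:.4f}")
print(f"P(lose all three), correlated errors:  {p_corr:.4f}")
```

With independent errors, losing all three states requires three separate pieces of bad luck, so it is rare. With strongly correlated errors, one bad draw of the shared error is enough to sink all three at once, and the same event becomes many times more likely even though no individual state looks any shakier than before.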
I think it will help to work through a concrete example to convince you that, by using that method to calculate the probability that Clinton was going to win in each state, one would have been overestimating how certain one can be of the outcome. Suppose that, in a state, we have 5 polls that give Clinton a margin of 0.2, 0.9, 0.5, 0.3 and 0.8 (average = 0.54). In that case, according to that method, the probability that Clinton was going to win that state would have been more than 99%! (You can get a very high probability of winning for Clinton even if the poll average of her margin is very small provided that the variance is also very small.) Now, if you had seen those polls just before the election, I’m sure you would have thought that Clinton was more likely to win (as you should have), but I doubt you would have thought that the probability she was going to win was more than 99%. And, while our intuitions are often wrong, this particular intuition is not. It would only be rational to make that inference if you had good reasons to assume that the polls are not biased. However, not only do you have no reason to think that, but on the contrary you have every reason to think that they are!
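For readers who want to check the number, here is a minimal Python sketch of that calculation using only the standard library. It assumes the probability in question is the chance that the true margin is above zero when the standardized poll average follows a t-distribution with 4 degrees of freedom (5 polls minus 1). The helper names `t_pdf` and `t_cdf` are mine, and the integration is a plain Simpson's rule rather than anything taken from Wang's code.

```python
import math
import statistics

margins = [0.2, 0.9, 0.5, 0.3, 0.8]   # Clinton's lead in each poll, from the text


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=10_000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


mean = statistics.mean(margins)
sem = statistics.stdev(margins) / math.sqrt(len(margins))  # standard error of the mean
t_stat = mean / sem
p_win = t_cdf(t_stat, df=len(margins) - 1)
print(f"mean={mean:.2f}, sem={sem:.3f}, t={t_stat:.2f}, P(win)={p_win:.4f}")
```

The polls agree with each other so closely that the standard error is tiny compared with the average lead, and that alone is what pushes the probability above 99%. Nothing in the calculation knows or cares whether all five polls share the same bias.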
The reason is that, when you use that kind of method, it’s so easy for the computation to result in a very high probability of winning for one of the candidates that, no matter the election, this method is probably going to predict that one of the candidates is going to win with a probability of more than 95%. But it would have been irrational to have a credence that Clinton was going to win equal to the probability calculated by that method. So, no matter what the probability that Clinton was going to win was exactly (assuming it’s even meaningful to talk of a precise probability that she was going to win), it was significantly less than 99%. How much less? Fuck if I know! Trying to correct the bias in the polls is probably a terrible idea, because you’re just going to inject your own biases in the process and we don’t have enough information about what assumptions pollsters make exactly. Indeed, while I have only mentioned examples of biases in favor of Clinton above, you could also have come up with plenty of reasons to think that the polls were biased against Clinton before the election. This means that, for all we knew before the election given what the polls were saying, she could also have won in a landslide. But she didn’t and, if so many pundits didn’t see that coming, it’s because they didn’t take seriously the possibility of systematic polling error and were extremely complacent.
What you should do is somehow take into account the uncertainty due to the possibility that the polls might be wrong because of systematic polling error in the model. That’s apparently what Nate Silver did, which is why his estimate of the probability that Clinton was going to win was lower than Wang’s. And he was right to do so, though I have no idea if he did that in a sensible way, because his model isn’t public. (Silver has gotten a lot of shit because, even though Clinton lost, his model said that she had a 65% chance of winning. But I think this criticism is confused, because given the evidence on the eve of the election, Silver’s prediction strikes me as perfectly reasonable. Indeed, given the evidence, I think it would have been irrational to predict that Trump was going to win. It’s just that it was also irrational to think that Clinton had a 99% chance of winning. If you throw a die which you know to be fair and predict with a very high confidence that it’s going to land on 6, you’re being irrational even if the die just happens to land on 6. Scott Alexander made a similar point on his blog right before the election.) To be clear, Wang also did that, but he clearly didn’t take the possibility of systematic error as seriously as he should have.
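Here is a rough Python sketch of what "taking systematic error into account" can look like, using the five hypothetical polls from the example earlier in the post. The idea is to treat the possible bias shared by all the polls as one more source of uncertainty and add it to the sampling uncertainty. The 2-point standard deviation for the systematic error is an assumption I picked for illustration, not an estimate, and for simplicity the sketch uses a normal distribution instead of a t-distribution. I have no idea whether this resembles what Silver or Wang actually do.

```python
import math
from statistics import NormalDist, mean, stdev

margins = [0.2, 0.9, 0.5, 0.3, 0.8]   # the five hypothetical polls used earlier
SYSTEMATIC_SD = 2.0                   # assumed, in points; not an estimate

avg = mean(margins)
sampling_sd = stdev(margins) / math.sqrt(len(margins))


def p_win(systematic_sd):
    """P(true margin > 0) when sampling and systematic error are combined."""
    # Independent error sources add in variance, not in standard deviation.
    total_sd = math.sqrt(sampling_sd ** 2 + systematic_sd ** 2)
    return 1 - NormalDist(mu=avg, sigma=total_sd).cdf(0.0)


p_naive = p_win(0.0)
p_hedged = p_win(SYSTEMATIC_SD)
print(f"sampling error only:        {p_naive:.4f}")
print(f"with systematic error too:  {p_hedged:.4f}")
```

The same polls that looked like a near-certainty turn into something only modestly better than a coin flip once you admit they could all be off by a couple of points in the same direction. The answer also moves a lot with the value you pick for the systematic error, which is exactly why the choice of that assumption matters so much.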
People had pointed out that possibility to him before the election, but he dismissed it by saying that he was doing enough to take it into account. It’s likely that he was somewhat blinded by his own political bias. It’s also likely that, had he not so confidently asserted that Trump had no chance of winning throughout the campaign, his website would have been a lot less popular. (To be clear, I’m not suggesting that Wang consciously tweaked his model to give the answer that his readers wanted to hear, but perhaps the fact that his readers wanted to hear that unconsciously made him less sensitive to criticism as he also got a lot of support from people who found his prediction comforting. I know that a lot of my friends, who were freaking out about Trump, were using PEC as a way to reassure themselves.) In particular, he vastly underestimated the correlation between polling errors in different states, as he admitted after the election.
The bottom line is that, because so many non-trivial assumptions enter into the computation of a prediction based on polls and the probability computed by the model is extremely sensitive to what assumptions you make exactly, it was irrational to have the kind of confidence that Clinton was going to win exhibited by people like Wang. (In fact, it arguably doesn’t make sense to talk of the probability that Clinton was going to win, because there were several reasonable ways of tweaking the basic model to take into account the uncertainty due to polling error, which probably would have yielded significantly different results, but there was no way of adjudicating between them.) The rational thing to say before the election was that Clinton was probably going to win, but you shouldn’t have bet your house on it, because there was also a real chance that she was going to lose and, as we know, it’s exactly what happened…
EDIT: Since I first published it a month ago, I made several changes to this post, because the original version had been written hastily and contained some misleading passages.
ANOTHER EDIT: If you found this post interesting, you should also check this article by Andrew Gelman and Julia Azari. There is a back and forth between them and other people in the same issue of that journal which is also interesting.