Polling for Soup: Using Polls to Predict Iowa (Democratic Caucus)

Political analyst Logan Vidal returns with a contribution from his Polling for Soup blog ahead of the 2016 Iowa Caucus.

Earlier this week I wrote a post modeling the 2016 general election using historical data to predict Democratic vote share. I then plugged in guesses for projected GDP growth and President Obama’s net approval rating to see how they affected vote share. Today, I’ve taken the last seven months of Iowa polls and run simulations based entirely on the polling data, ignoring endorsements, campaign spending, and ‘committed’ superdelegate tallies.

The idea behind these simulations is to get a sense of just how closely polls (which measure support at a specific moment in time) translate into actual voter support, which is a different animal at a caucus than at the ballot box.

A priori, I’d expect a weaker relationship between the Iowa polls and the Iowa results, because Iowa holds a caucus, than between the New Hampshire polls and its primary (2008 on the Democratic side being a glaring exception). These simulations are also a useful way of modeling the undecideds across thousands of trials. Because they estimate a probability of victory rather than vote share, they’ll serve as a nice test of how predictive polls are in Iowa.

How do we do this?

There is a lot of publicly available R code for running simulations, and I found a nice electoral vote projector on R-bloggers that uses exit polling data, which I modified for this project. The first thing I have to do is settle on a weighting scheme. FiveThirtyEight produces a series of pollster rankings based on polls conducted since 2006. They score pollsters on error, bias, and predictiveness, and keep track of whether or not they call cell phones. As great as these rankings are, there are no guidelines for how to weight polls by source, and FiveThirtyEight provides no replication code for any of its models (to my knowledge; if they do have replication code, please let me know). Releasing it would be a great help to researchers, and I think it would get more people to care about the projects they’re working on.

Luckily, weights only have to be right relative to one another: their relative magnitude matters more than precise measurement. I’m not going to weight polls by source, because I have absolutely no theoretical basis for doing so, but I will weight polls by proximity to the election. Polls in January carry twice the weight of polls from July through October, and polls in November and December carry 1.5 times the weight of those early polls. I make July 1st my starting point because it neatly covers the second half of last year, but I’ll admit it is an arbitrary cutpoint.
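The weighting rule above fits in a small function. This is a minimal Python sketch, not the R code used for this post; `poll_weight` and `normalize` are hypothetical names, and it assumes each poll is dated by a single field date and that the caucus is on February 1, 2016.

```python
from datetime import date

# Hypothetical helper: relative weight for a poll by its field date.
# ASSUMPTION: polls before the July 1, 2015 cutpoint or on/after the
# February 1, 2016 caucus are excluded (weight 0).
def poll_weight(field_date: date) -> float:
    if not (date(2015, 7, 1) <= field_date < date(2016, 2, 1)):
        return 0.0
    if field_date.month == 1:
        return 2.0   # January 2016: twice the early weight
    if field_date.month in (11, 12):
        return 1.5   # November-December 2015
    return 1.0       # July-October 2015: baseline weight


def normalize(weights):
    """Scale weights to sum to 1; the ratios between them are unchanged."""
    total = sum(weights)
    return [w / total for w in weights]
```

Because only relative weights matter, `normalize([2.0, 1.0, 1.0])` gives `[0.5, 0.25, 0.25]`: the January poll still counts twice as much as each early poll.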

The simulation is relatively simple. For each of the 58 polls, R takes the difference between Hillary’s and Bernie’s support and replicates the data (multiplied by each poll’s weight) to generate a range of expected win probabilities for our reference candidate, in this case Hillary Clinton. I then take this distribution and replicate it 500,000 times to create a plottable distribution and generate a 95% confidence interval. The results of this simulation are shown in the two figures below. Special thanks to FlowingData for making these figures pretty.
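The post doesn’t spell out exactly how the replicated margins become a win probability, so what follows is only one possible reading, written in Python rather than the original R: a weighted bootstrap in which each simulation resamples the polls with replacement (weights acting as sampling odds) and records the share of resampled polls that Clinton leads. `simulate_win_probability` and the sample margins are hypothetical; they are not the 58 real Iowa polls.

```python
import itertools
import random
import statistics


def simulate_win_probability(margins, weights, n_sims=500_000, seed=2016):
    """Weighted bootstrap over poll margins (Clinton minus Sanders, in points).

    ASSUMPTION: the "win probability" for one simulation is the share of
    resampled polls in which Clinton leads; ties count as not leading.
    Returns the median share and a 95% percentile interval.
    """
    n = len(margins)
    # Cumulative weights let random.choices treat the weights as sampling odds.
    cum_weights = list(itertools.accumulate(weights))
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    shares = []
    for _ in range(n_sims):
        # Resample n polls with replacement, favoring heavier-weighted polls.
        draw = rng.choices(margins, cum_weights=cum_weights, k=n)
        shares.append(sum(1 for m in draw if m > 0) / n)
    shares.sort()
    lo = shares[int(0.025 * (n_sims - 1))]
    hi = shares[int(0.975 * (n_sims - 1))]
    return statistics.median(shares), (lo, hi)


# Made-up margins and weights for illustration, NOT the Iowa polls.
margins = [5, -2, 8, 1, -4, 3]
weights = [1.0, 1.0, 1.5, 1.5, 2.0, 2.0]
median, (lo, hi) = simulate_win_probability(margins, weights, n_sims=10_000)
print(f"median {median:.2%}, 95% interval {lo:.2%} to {hi:.2%}")
```

The interval is read straight off the sorted simulations (a percentile interval) rather than from a normal approximation, which behaves better when the shares pile up near 0 or 1.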

The red line in both figures plots the median win probability for each candidate, which for Hillary Clinton is 53.48%. So is this right, and if it is, what does it even mean? These simulations show what should happen in Iowa if the polls are both accurate and predictive and if my weighting procedure makes sense (which could easily be improved if FiveThirtyEight shared its code). This is a cool way of showing what the polling data is telling us, but polling is by no means the only data to consider when predicting a winner, especially in a caucus state.

How does this compare to the prediction markets? Well, PredictIt, a curiously legal prediction site that took over after InTrade was shut down, is giving Hillary Clinton a 67% chance of winning versus Bernie’s 38% at the time of writing. I don’t know how their percentages add up to more than 100%, other than that someone bought a lot of stock and moved the market before it could correct itself. PredictWise, which aggregates prediction markets, gives Hillary Clinton a 72% chance of winning Iowa.
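Each candidate’s contract trades on its own, so quoted prices are not forced to sum to 100. One rough way to turn them into probabilities that do is to rescale them proportionally. The sketch below applies that convention to the PredictIt numbers quoted above; `normalize_prices` is a hypothetical helper, and this back-of-the-envelope rescaling is an assumption, not how PredictIt itself reports odds.

```python
def normalize_prices(prices):
    """Rescale quoted contract prices (in cents) so implied probabilities sum to 1.

    Hypothetical helper: proportional normalization is one rough convention
    for markets whose prices add up to more (or less) than 100.
    """
    total = sum(prices.values())
    return {name: price / total for name, price in prices.items()}


# PredictIt prices quoted in the post, at the time of writing.
implied = normalize_prices({"Clinton": 67, "Sanders": 38})
for name, p in implied.items():
    print(f"{name}: {p:.1%}")
```

With 67 and 38 summing to 105, this works out to roughly 63.8% for Clinton and 36.2% for Sanders.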

An important caveat here is that these prediction markets take into account other variables, like Hillary Clinton’s massive endorsement lead, some unknown measure of her ground operation, and the effervescent “conventional wisdom” supporting her victory.

We’ll find out tonight just how accurate these polls are, but if you like polling data more than some of these other measures of strength, I’d buy some Bernie stock, which is selling cheap. The betting markets are usually right, but they also gave Joe Biden a 75% chance of running for the Democratic nomination the day before he ruled it out.

- Logan Vidal