Polling and Turnout and Crosstabs… Oh Sigh.

I like polls. During the first presidential election that the Internet and I were both old enough to follow (I don’t recall whether it was 2000 or 2004), I was impressed that every single “solid” state turned out exactly as predicted. Time after time, polls have proven generally accurate, and once you learn a bit about statistics and polling methodology, that makes sense.

Of course, there will always be specific situations where the polls are wrong, and a lot of right-wingers think that is happening today. Nate Silver gives Obama a 75%+ chance of winning, and apparently a lot of conservatives are attacking him and/or the polls for having a liberal bias. Sonic Charmer finds himself an unlikely defender of Silver’s analysis by quickly reproducing a crude version of it in a spreadsheet.

I generally agree with Sonic here, but I haven’t seen liberals in the media and blogosphere respond (though I haven’t really been looking) to one of the main conservative complaints about the polls. Folks like @NumbersMuncher argue that the pro-Obama polls are hopelessly overestimating Democratic turnout, assuming it will equal or even exceed 2008 levels. Here is one tweet that made the rounds yesterday:

I have to admit, that does sound rather fishy, so I decided to dig deeper into some of these polls than I had up to now. Here is the Quinnipiac poll in question. The Virginia breakdown by party on page 5 confirms Barron’s tweet:

Party   Obama   Romney   Margin
Rep       6       93     Romney +87
Dem      96        3     Obama +93
Ind      36       57     Romney +21

Yep: Republicans and Democrats mostly vote along party lines, and Romney is up 21 among independents, yet he is still down 49-47 overall. That seemed like a paradox to me, so I scrolled further and found the party breakdown of the sample on page 18. In Virginia, the polled voters identified as 35% Democrat, 35% Independent, and 27% Republican. I plugged those percentages into a spreadsheet, weighting each party’s support for each candidate by that party’s share of the sample, and here’s what I got:


Party   Sample   Obama %   Obama share   Romney %   Romney share
Rep      27%        6          1.62          93          25.11
Dem      35%       96         33.60           3           1.05
Ind      35%       36         12.60          57          19.95
Total                         47.82                      46.11

Whaddaya know, a 2% lead for Obama (the exact numbers don’t quite match because of rounding and the 4% who identified as Other/NA). So the numbers really do work. Think of it this way: each candidate essentially holds his entire base, so the 8-point gap in party ID (35% Democrat vs. 27% Republican) puts Romney at -8 right off the bat. Another third of the electorate is +21 Romney, which works out to roughly a +7 advantage for Romney (0.35 × 21 ≈ 7.4), still not enough to wipe out Obama’s base advantage.
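For anyone who would rather see the spreadsheet as code, here is a minimal Python sketch of the same weighting, using only the Quinnipiac Virginia numbers quoted above. The names (`party_mix`, `crosstabs`, `topline`) are my own, and Other/NA respondents are simply left out, as in the table.

```python
# Party-ID weighting: the spreadsheet arithmetic from the table above,
# using the Quinnipiac Virginia numbers quoted in this post.

# Share of the sample identifying with each party (percent).
# These don't sum to 100 because Other/NA respondents are left out.
party_mix = {"Rep": 27, "Dem": 35, "Ind": 35}

# Candidate support within each party (percent): (Obama, Romney).
crosstabs = {
    "Rep": (6, 93),
    "Dem": (96, 3),
    "Ind": (36, 57),
}


def topline(mix, tabs):
    """Return (obama, romney) toplines in points of the whole sample.

    Each party's contribution is its sample share times the candidate's
    support within that party, e.g. 27% of voters x 93% = 25.11 points.
    """
    obama = sum(mix[p] * tabs[p][0] / 100 for p in mix)
    romney = sum(mix[p] * tabs[p][1] / 100 for p in mix)
    return obama, romney


obama, romney = topline(party_mix, crosstabs)
print(f"Obama {obama:.2f}, Romney {romney:.2f}, margin {obama - romney:+.2f}")
# prints: Obama 47.82, Romney 46.11, margin +1.71
```

Swapping in a different poll’s party mix and crosstabs is just a matter of editing the two dictionaries.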

However, another recent poll has Romney up 5 in Virginia. How could two polls come up with a 7-point difference? The crosstabs are similar: Romney is +91 among Republicans, Obama is +92 among Democrats, and Romney is +26 among independents. The key difference is the party mix of the sample. With my rough analysis I can reproduce a similar 4-point lead for Romney:

Party   Sample   Obama %   Obama share   Romney %   Romney share
Rep      31%        4          1.24          95          29.45
Dem      35%       94         32.90           2           0.70
Ind      30%       33          9.90          59          17.70
Total                         44.04                      47.85

This poll puts Republican turnout 4 points higher than the other poll (31% vs. 27%), shrinking the Democratic advantage from 8 points to 4, and that plays the primary role in reversing Obama’s lead in the state. As some people have been saying all along, it all comes down to turnout.

Now, I agree with a lot of conservatives that in 2012 the Democrats probably won’t see the turnout advantage they saw in 2008, as many Republicans are more excited and many Democrats are less excited. Maybe we’ll see something between 2008, when the Democrats scored big, and 2010, when the Republicans scored big. If Democrats have a +8 turnout advantage in Virginia, Obama may win by a couple of points. If it’s only +4, Romney will probably win thanks to his crushing lead among independents in these two polls.
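To see how sensitive the result is to that turnout assumption, here is a rough Python sweep. It holds the Quinnipiac crosstabs fixed, keeps independents at 35% of the sample, and splits the remaining 62% between Democrats and Republicans so Democrats lead by a given number of points. Holding independents fixed is my own simplifying assumption, not something either pollster does.

```python
# Hypothetical turnout sweep: hold the Quinnipiac crosstabs fixed, keep
# independents at 35% of the sample, and split the remaining 62% between
# Democrats and Republicans so that Democrats lead by `dem_edge` points.

CROSSTABS = {"Rep": (6, 93), "Dem": (96, 3), "Ind": (36, 57)}  # (Obama, Romney) %
IND_SHARE = 35
PARTISAN_SHARE = 62  # Dem + Rep, as in the Quinnipiac sample (35 + 27)


def obama_margin(dem_edge):
    """Obama minus Romney, in points, for a D+dem_edge party-ID mix."""
    dem = (PARTISAN_SHARE + dem_edge) / 2
    rep = (PARTISAN_SHARE - dem_edge) / 2
    mix = {"Rep": rep, "Dem": dem, "Ind": IND_SHARE}
    obama = sum(mix[p] * CROSSTABS[p][0] / 100 for p in mix)
    romney = sum(mix[p] * CROSSTABS[p][1] / 100 for p in mix)
    return obama - romney


for edge in (0, 2, 4, 6, 8):
    print(f"D+{edge}: Obama {obama_margin(edge):+.2f}")
```

Under these assumptions D+8 gives Obama +1.71 (the Quinnipiac result), D+4 gives Romney about +1.9, and the break-even point sits near D+6. Each point of Democratic edge moves the margin by about 0.9 points, because shifting half a point from Republicans to Democrats gains Obama (93 + 87)/2 = 90% of a point.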

So if we hypothesize that turnout will be less Democratic than 2008, we are still left with the question, what was the Democratic turnout advantage in 2008 anyway? You don’t check a mark in the booth for “Democrat, Republican, or Independent.” I’m guessing that NumbersMuncher is basing his comparisons on exit polls, where random folks are queried as they leave their polling station. CNN shows 2008 exit polls for Virginia as the following:

Party   Share of 2008 voters
Rep     33%
Dem     39%
Ind     27%

Here the Democrats had a 6-point advantage over the Republicans. So it’s true, the poll giving Obama a 2% lead is assuming an 8-point Democratic turnout advantage that’s higher than 2008. I admit this does seem unrealistic. Of course, the second poll’s 4-point advantage may be unrealistically low. I can’t find 2010 exit polls, and I don’t know how reliable the 2008 ones are anyway.

But where do the samples in these polls come from? I used to assume pollsters just called people and asked questions and tallied things up. Of course, this doesn’t explain why PPP always falls on the left side of the polling averages and Rasmussen always falls on the right side. Are these firms reporting similar results but somehow consistently coming up with demographics that “oversample” or “undersample” their favored party? I don’t know, because I have a life, and that life involves not getting paid to spend hours browsing poorly formatted PDFs of polling results.

However, I did look at another recent poll that gives Obama a lead in Virginia. It has Obama up even more (4%), so based on everything discussed above, you might expect it to assume an even larger Democratic turnout advantage. But its Democratic advantage is only +4, the same as the poll showing Romney up by 5. Notice how the conservatives aren’t heralding this poll as having a more realistic sample! I don’t see a breakdown by party, so I can’t tell whether it gives Romney a much smaller lead among independents.

This is the point where I throw up my hands and say there’s too much data to make any of it useful. Sure, there are a couple of suspicious data points that allow conservatives to suspect liberal bias, but there are other data points that don’t fit that theory. Maybe that’s why Nate Silver fundamentally just averages the polls. I find the conservative theories about Ohio as intriguing as the next guy, but when 7 polls in 48 hours all give Obama a 2-5 point lead, it’s gotta be hard to argue that every single one of them has biased crosstabs.
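As a toy illustration of “just averaging,” here is a plain unweighted mean of the three Virginia polls discussed in this post. This is a deliberate simplification: FiveThirtyEight’s actual model weights polls and adjusts them in ways this sketch ignores, and the labels for the two unnamed polls are my own.

```python
from statistics import mean

# Obama-minus-Romney margins (points) for the three Virginia polls
# discussed in this post. The last two labels are descriptive, not pollster names.
virginia_margins = {
    "Quinnipiac": 2,
    "poll with Romney up 5": -5,
    "poll with Obama up 4": 4,
}

# Unweighted average: every poll counts equally, whatever its party mix.
average = mean(virginia_margins.values())
print(f"Unweighted average: Obama {average:+.2f}")
# prints: Unweighted average: Obama +0.33
```

The point of averaging is that a sample that happens to skew one way gets diluted by the others, so no single set of crosstabs decides the outcome.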

Ultimately, none of this really matters except maybe for the egos of everyone advancing their pet theories. Are the conservative ideologues finding legitimate holes in the “scientific” analysis of “Smart People” who are blinded by their own biases to the glaring problems in their models? Or have these conservatives merely tortured the data to reach the conclusion they wanted all along, committing the very crime of which they accuse their ideological opponents? In five days, we’ll all find out.

4 thoughts on “Polling and Turnout and Crosstabs… Oh Sigh.”

  1. The bit about whether polls consistently oversample Ds, however suspicious, isn’t a gripe about Nate Silver’s (or anyone else’s) electoral model, it’s a gripe about the polls. Electoral models wouldn’t be trying to second-guess poll data anyway, they just take such data as inputs. It would be like blaming a CDO valuation model for the fact that the subprime bonds inside CDOs were overvalued; CDO models have their problems but if the inputs are inflated that’s not a CDO model problem (I do expect to see the strained electoral model-CDO model analogy pop up – “was Obama polling a bubble?” – if Romney wins, so I’m mentally preparing myself).

    I do crave (good/believable-fantasy) conspiracy theories or explanations that convince me that polls systematically oversample Ds. As it stands, what I know is that pollsters construct a *model of demographics* (X% of the country are white males, Y% are black females, etc.) and then try to call people till they get a sample with those proportions. If the resulting population turns out D+9, and reality is D+3, that’s just a statistical result. Maybe it could just as easily have gone the other way and been R+3, I don’t know. If this random effect is unbiased you’d expect it to wash out over several polls, so RCP averages and the like should be fine to look at.

    If the effect is biased toward Ds, that would certainly be good news for Rs, but I guess what I need (to indulge that fantasy successfully) is a convincing story of why their mechanism for sampling would necessarily contain that bias. I haven’t seen one; I haven’t even seen a good reason to believe that D+9 is necessarily false – how do we know? What’s our ‘golden data’ for party-ID breakdown, 2008 exit polls? Seems iffy. And overall, there is the fact that polls just haven’t been way way wrong in years past, when I presume they did everything the same way.
