Monday, February 11, 2008

Can Missouri tell us anything about Ohio?

Right now there is no polling data on the Democratic primary in Ohio, which is set for March 4th, 2008, so I'm hoping we can use other factors to help predict the outcome. My brother tells me that Ohio will go for Obama because he has a lot of natural constituencies, but I want to do at least a little analysis to see if we can get beyond a gut reaction. I was looking at demographics on Wikipedia and some past election data on uselectionatlas, and I have come to the conclusion that Missouri could be a bellwether for how Ohio is going to vote. I'll detail the reasons below.

According to Wikipedia, Missouri and Ohio have very similar demographics. The table below illustrates this:

Demographics in Missouri vs. Ohio
(includes ~2% Hispanic)

So the question is: do similar demographics in the two states mean that voters will vote the same way in the primary election? I don't think this can be answered right now, especially given Obama's broad base of support in various parts of the country. However, we can look at how the two states voted in the past and try to use that as a predictor.

Missouri and Ohio have voted the same way in every presidential election since 1964, with similar margins

Presidential elections
So in every election after Kennedy vs. Nixon, both Missouri and Ohio voted for the same candidate, who was also the winner. Interestingly, the winner's share of the vote was also very similar in the two states, as the table below illustrates.

Matchups in General elections 1964 - 2004

Election                               US           Missouri     Ohio
1964: Johnson (J) vs. Goldwater (G)    J: 61.05%    J: 64.05%    J: 62.94%
1968: Nixon (N) vs. Humphrey (H)       N: 43.42%    N: 44.87%    N: 45.23%
1972: Nixon (N) vs. McGovern (M)       N: 60.67%    N: 62.29%    N: 59.23%
1976: Carter (C) vs. Ford (F)          C: 50.08%    C: 51.10%    C: 48.92%
1980: Reagan (R) vs. Carter (C)        R: 50.75%    R: 51.16%    R: 51.51%
1984: Reagan (R) vs. Mondale (M)       R: 58.77%    R: 60.02%    R: 58.90%
1988: Bush (B) vs. Dukakis (D)         B: 53.37%    B: 51.83%    B: 55.00%
1992: Clinton (C) vs. Bush (B)         C: 43.01%    C: 44.07%    C: 42.91%
1996: Clinton (C) vs. Dole (D)         C: 49.23%    C: 47.54%    C: 47.38%
2000: Bush (B) vs. Gore (G)            *B: 47.87%   B: 50.42%    B: 49.97%
2004: Bush (B) vs. Kerry (K)           B: 50.73%    B: 53.30%    B: 50.81%
*Note: in 2000, Bush won the election with less of the popular vote than his opponent, Al Gore. However, Bush won the popular vote in both Ohio and Missouri.

Analysis: Really, I want to answer the following questions:
1) Does Missouri vote like Ohio in the general election?
2) Does Missouri vote like the US in the general election?
3) Does Ohio vote like the US in the general election?

To test these, I ran three separate paired two-tailed t-tests with an alpha of .05, with the following results:
1) Missouri vs. Ohio: p=.1952, no significant difference. Average absolute difference = 1.41%
2) Missouri vs. US: p=.0442, Missouri votes significantly differently from the US. Average absolute difference = 1.65%
3) Ohio vs. US: p=.4311, no significant difference. Average absolute difference = 1.17%
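
For anyone who wants to check these numbers, here is a minimal Python sketch of the three paired tests. It is not necessarily how the tests above were run: it assumes the three figures listed for each election are the US, Missouri, and Ohio shares in that order, and the helper names (mean_abs_diff, t_pdf, paired_t_test) are made up for illustration. The p-value comes from numerically integrating the t density with Simpson's rule, so only the standard library is needed.

```python
import math

# Winner's vote share (%) per election, 1964-2004.
# Assumption: the three figures per election are US, Missouri, Ohio, in that order.
US       = [61.05, 43.42, 60.67, 50.08, 50.75, 58.77, 53.37, 43.01, 49.23, 47.87, 50.73]
MISSOURI = [64.05, 44.87, 62.29, 51.10, 51.16, 60.02, 51.83, 44.07, 47.54, 50.42, 53.30]
OHIO     = [62.94, 45.23, 59.23, 48.92, 51.51, 58.90, 55.00, 42.91, 47.38, 49.97, 50.81]


def mean_abs_diff(a, b):
    """Average absolute gap between two paired series, in percentage points."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_test(a, b, steps=2000):
    """Return (t, two-tailed p) for a paired t-test on series a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    df = n - 1
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|].
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return t, 1 - 2 * area  # both tails


comparisons = {
    "Missouri vs. Ohio": (MISSOURI, OHIO),
    "Missouri vs. US": (MISSOURI, US),
    "Ohio vs. US": (OHIO, US),
}

for label, (a, b) in comparisons.items():
    t, p = paired_t_test(a, b)
    print(f"{label}: t={t:.3f}, p={p:.4f}, avg abs diff={mean_abs_diff(a, b):.2f}")
```

With 11 elections there are 10 degrees of freedom, so a difference is significant at an alpha of .05 when |t| exceeds about 2.228.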

So the conclusion is that Missouri doesn't vote significantly differently from Ohio in presidential elections, but Missouri does vote significantly differently from the US as a whole. Ohio doesn't vote significantly differently from the US (in fact, Ohio's result mirrors the national result more closely than Missouri's result mirrors Ohio's).

Primary elections
It's really difficult to use previous primaries to help predict this race, simply because earlier contests happened at points in the calendar when the race might already have been decided, so the voting would likely have looked different from the contested situation we have here in 2008. Until I can figure out a method to make those results meaningful, I'm just going to leave them out.

Using demographic data and absolute margin of difference to predict Ohio
Model #1: Using Missouri to model Ohio
If we simply assume that the two electorates are the same, and that Ohio will vote the way Missouri did, give or take the average absolute difference of ±1.41%, then it's very easy to predict the outcome:
  • Missouri 2008 primary: Barack Obama 49.40% vs. Hillary Clinton 48.18%
  • Model #1 Ohio 2008 primary projection: Barack Obama 50.81% to 47.99% vs. Hillary Clinton 49.59% to 46.77%
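
The arithmetic behind Model #1 is just the Missouri result plus or minus the average absolute difference. A small Python sketch, where project_range and bands_overlap are hypothetical helper names and the inputs are the figures quoted above:

```python
def project_range(share, margin):
    """Return the (low, high) band around a vote share, in percent."""
    return round(share - margin, 2), round(share + margin, 2)


def bands_overlap(a, b):
    """True when two (low, high) bands share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


AVG_ABS_DIFF_MO_OH = 1.41  # Missouri vs. Ohio, percentage points
missouri_2008 = {"Obama": 49.40, "Clinton": 48.18}

# Shift each Missouri share up and down by the historical gap.
ohio_model_1 = {
    name: project_range(share, AVG_ABS_DIFF_MO_OH)
    for name, share in missouri_2008.items()
}

for name, (low, high) in ohio_model_1.items():
    print(f"{name}: {low:.2f}% to {high:.2f}%")

print("Bands overlap:", bands_overlap(ohio_model_1["Obama"], ohio_model_1["Clinton"]))
```

The two bands overlap, which is another way of saying that this model on its own cannot separate the candidates.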
Model #2: Using US election data so far to model Ohio
We've already determined that the national vote percentages have historically been a better predictor of the Ohio vote than Missouri's percentages. Given this, the second model just uses the current vote percentages in the states that have voted so far and the historical average absolute difference between the US presidential vote and the Ohio presidential vote of 1.17%.
  • US vote percentages so far: I don't have a good source for this, but I think I saw on TV that Obama has a lead of about 250,000 votes out of 16 million cast, or about 1.56% of the total. So just to plug in some numbers, let's say Obama is at 48.56% and Clinton is at 47.00%
  • Model #2 Ohio 2008 primary projection: Obama: 49.73% to 47.39% vs. Clinton: 48.17% to 45.83%
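
The Model #2 numbers can be traced the same way. This is only a sketch: the vote counts are the rough TV figures mentioned above, Clinton's 47.00% is the plug-in value rather than a measured share, and the variable names are invented for illustration.

```python
import math

OBAMA_LEAD_VOTES = 250_000      # rough figure, not from a verified source
TOTAL_VOTES = 16_000_000        # rough figure, not from a verified source
AVG_ABS_DIFF_US_OH = 1.17       # US vs. Ohio, percentage points
CLINTON_ASSUMED_SHARE = 47.00   # plug-in value, not a measured share

# Lead as a percentage of all votes cast, truncated to two decimals as in the text.
lead_pct = OBAMA_LEAD_VOTES / TOTAL_VOTES * 100
lead_pct_truncated = math.floor(lead_pct * 100) / 100

# Obama's plug-in share is Clinton's share plus the lead.
national_so_far = {
    "Obama": CLINTON_ASSUMED_SHARE + lead_pct_truncated,
    "Clinton": CLINTON_ASSUMED_SHARE,
}

# Shift each national share up and down by the historical US vs. Ohio gap.
ohio_model_2 = {
    name: (round(share - AVG_ABS_DIFF_US_OH, 2), round(share + AVG_ABS_DIFF_US_OH, 2))
    for name, share in national_so_far.items()
}

print(f"Lead: {lead_pct:.4f}% of votes cast")
for name, (low, high) in ohio_model_2.items():
    print(f"{name}: {low:.2f}% to {high:.2f}%")
```

Because both shares hang off a single assumed value, only the gap between the two bands carries information here, not their absolute levels.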
Model #3: Using demographic data we've seen so far to predict the outcome in Ohio
I question the accuracy of this kind of approach without fully understanding the demographics. We've seen Obama appeal to the whitest of white voters (Maine), so I can't yet see how demographics alone could be used as a predictor. I'll revisit this.

This race looks to be a very tight one, with a slight edge to Obama based on this analysis. Even without doing this analysis, I doubt very much that either candidate will blow the other out here, which is bad news for both candidates, but perhaps worse news for Clinton. Missouri turned out to be a weaker predictor than I had hoped, since the national vote seems to track Ohio more closely, but it is better than nothing. One limitation is that we don't know how well past voting behavior in presidential elections predicts voting in primary elections, but since Ohio has an open primary, I suspect they will be similar.

In a follow-up post, I'll try model #1 (difference from Missouri) and model #2 (difference from US) against contests that have already been decided and see how well they fare.

