But guess what—there's actually a free-range, locally produced, antibiotic-free, (relatively) simple model that, to my everlasting surprise, does better than both of them. It's based on the idea that most of the time, the polls are off by a fairly predictable amount related to the partisan lean of the state. I'll call it DRM (Dreaminonempty's Regression Model), 'cause I gotta call it something.
Returning to Markos' post, here's the table he posted with DRM predictions and polling averages for the final 10 days added in (the closest predictions for each state are highlighted):
And the best news? It's easy to make a DRM prediction. See below the fold for simple instructions and analysis of additional races.
A simple explanation: (updated from the comments)
Polling averages often don't predict the election margin (%D-%R) correctly, even if they do predict the winner. Usually, they underestimate the Democrat's performance in Blue states, and underestimate the Republican's performance in Red states. This is likely because, as LNK put it, people tend to conform to their surroundings.
I attempted to correct for this in a very simple manner, in hopes that, on average, the new predictions would be more accurate than the polls alone. The correction involved adding a number to the polling margin in each state based on how 'red' or 'blue' it was.
In the end, the corrected polling numbers were more accurate than the polling numbers alone. The method worked.
Instructions for DRM
1. Look at polls for about the previous month (Oct. 1 onward for a final prediction). Are there more than three polls in the past 10 days? If yes, skip to Step 2. If no, average the Democratic margins (%D-%R) for all polls in the month, and go to Step 3.
2. Do you see a trend in the polls over the past month? If you see a trend, average the Democratic margins (%D-%R) for the polls for the previous 10 days only. Otherwise, average the Democratic margins (%D-%R) for all polls in the month.
3. Find your state on this table. Add the DRM factor you see to your polling average for the margin estimate.
4. Check for red flags. If there are only one or two polls total, or a third party candidate is drawing more than 5 percent of the vote, the estimate could be off by a substantial amount. Also, states with a federal/local party mismatch, like West Virginia, could run into trouble.
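Steps 1-3 above can be sketched in a few lines of Python. The poll margins and the DRM factor below are hypothetical placeholders, not real data — in practice you would pull the factor for your state from the table linked in Step 3.

```python
def drm_estimate(margins_month, margins_10day, drm_factor, trend=False):
    """Apply the DRM steps: pick a polling average, then add the state's
    DRM factor. Margins are Democratic margins (%D - %R) in points."""
    # Steps 1-2: use only the last 10 days if there are enough recent
    # polls AND a visible trend; otherwise average the whole month.
    if len(margins_10day) > 3 and trend:
        polling_avg = sum(margins_10day) / len(margins_10day)
    else:
        polling_avg = sum(margins_month) / len(margins_month)
    # Step 3: add the DRM factor from the state table.
    return polling_avg + drm_factor

# Hypothetical race: steady polls, so the month-long average is used.
estimate = drm_estimate([3.0, 4.0, 5.0], [4.0, 5.0],
                        drm_factor=4.1, trend=False)
```

Step 4's red-flag checks stay manual: the sketch can't tell you whether a third-party candidate or a thin polling record makes the number untrustworthy.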
Example: In my first post on this subject, I used Elizabeth Warren's Senate race in Massachusetts as an example, so let's return to that race.
The polls for this race from Oct. 1 onward are pretty steady. There are 17 polls, with an average margin of +3.8 points in Warren's favor.
The state table shows Massachusetts with a DRM factor of +4.1 points, yielding an estimated margin of 7.9 points in Warren's favor. The actual result was Warren +7.4 points.
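As a quick check of the arithmetic, using the numbers from the Warren example:

```python
warren_poll_avg = 3.8   # average margin of 17 polls from Oct. 1 onward
ma_drm_factor = 4.1     # DRM factor for Massachusetts from the state table
actual_margin = 7.4     # final result: Warren +7.4

drm_prediction = round(warren_poll_avg + ma_drm_factor, 1)  # Warren +7.9
drm_error = round(drm_prediction - actual_margin, 1)        # off by 0.5
```

Note the `round` calls: margin arithmetic in floating point can otherwise produce values like 7.8999999999999995.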
How does the DRM work?
The basic idea is that in red states, Republicans do a little better than the polls say they will, while in blue states, Democrats tend to do a little better than the polls say they will. You can see the relationship here:
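One way to turn that relationship into DRM factors is an ordinary least-squares fit of poll error (actual margin minus polling average) against a state-lean measure such as Obama's 2008 vote share. The data points below are invented purely to illustrate the fitting procedure; they are not the actual regression behind the table.

```python
# Hypothetical (state_lean, poll_error) pairs: lean is Obama's 2008 vote
# share minus 50, poll_error is actual margin minus the polling average.
data = [(-15, -3.0), (-5, -1.2), (0, 0.3), (8, 1.9), (12, 2.8)]

# Ordinary least squares for: error = slope * lean + intercept.
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x

def drm_factor(lean):
    """The fitted line converts a state's lean into its DRM factor."""
    return slope * lean + intercept
```

A positive slope encodes exactly the pattern described above: bluer states get a pro-Democratic correction, redder states a pro-Republican one.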
How well does this method work?
In my final pre-election post, I said I would consider the model a success if the average error is lower for the predictions than for polling averages alone. As shown above, the DRM model was indeed successful by this measure in close presidential and Senate races. What about the rest?
Below I link to all the predictions and errors, including 538 predictions for comparison to the best of the well-known models. Please note that counting is not complete in many states, so some of these numbers could change. Also, I had a second set of predictions, based not on the regression but the prior performance of polls in each individual state in either presidential races or Senate and Governor races. This I will call DSE (Dreaminonempty's State Errors).
Here are links to predictions and errors for the three sets of races:
Note that some changes were made after originally posting these predictions as data entry errors were found and corrected.
Generally speaking, the worst DRM predictions were in states with few polls and in some of the races where third-party candidates drew more than 5 percent of the vote.
The first way to test the predictions is to ask which prediction was best most often. By this measure, DRM was by far the best, with the closest prediction more than half the time:
Out of curiosity, I redid the above table, but only included states with 10 or more polls, and excluded states with third parties >5 percent. The accuracy of these predictions was clearly better, to no great surprise. The relative performance of the different prediction methods remained about the same.
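Tallying which method comes closest in each race is straightforward to do yourself. The absolute errors below are invented to show the bookkeeping, not real results from the linked tables.

```python
from collections import Counter

# Hypothetical absolute errors (in points) for each method, per race.
errors = [
    {"DRM": 0.5, "DSE": 1.1, "538": 0.9},
    {"DRM": 2.0, "DSE": 1.5, "538": 2.3},
    {"DRM": 0.3, "DSE": 0.8, "538": 0.4},
]

# Count how often each method has the smallest error.
wins = Counter(min(race, key=race.get) for race in errors)
```

The same loop, with a filter on poll count and third-party share, reproduces the restricted comparison described above.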
Questions to address
With all the new 2012 data, I hope to look into the following issues:
Can we do anything to sort out races with third parties?
Is there a reason why sometimes DSE is better than DRM?
Is there a better measure for state partisanship than Obama's 2008 vote share?
Is there an issue with using this method for gubernatorial races?
Hopefully these questions will lead to improvement without sacrificing simplicity.
Final thoughts: (updated from the comments)
This method of prediction is nowhere close to the same level of sophistication as 538. The intent was to find something as simple as possible that beats the polls alone. This is about as simple as it gets - there are only two inputs. By keeping it simple, it stays easy for anyone to replicate on their own and put to their own uses.