MicroBob
10-08-2004, 07:53 AM
This might have already been covered....apologies if this is the case. Just curious about the various presidential polls and the differences from one poll to another and one date to the next.

In one national poll I believe on CNN I thought I saw that there were 1100 or 1200 participants in the poll and their margin of error was +/- 4%.

Is this correct? With a national election of 100-150 million people or so (I'm guessing) you only need to poll 1200 individuals to be within 4%??? Is this with 95% confidence within 4% or 99%??

I know very very little about SD and probabilities etc. I just trust the experts on the math (if I'm told I need a minimum of 300 BBs then I go with that).

Without really knowing too much about the math involved, 1200 strikes me as a very small sample....but I seem to remember from one of my college courses that you really don't need as big a sample as you might think to accurately predict election results.

Also, remember that I only caught this graphic at a glance so I might have seen it incorrectly.

But I do suspect that smaller-than-necessary sample sizes might be to blame for the various polls' inconsistencies, where Kerry leads in one poll by 2 points and Bush leads in another poll by 4 points, etc etc.

Additionally, I just find it hard to believe that there are really THAT many people out there who are changing their opinions of who they are going to vote for THAT much. I think that some of the day-after-debate fluctuations might have more to do with normal variance from a less-than-adequate sample size than with a true shift in where the country is leaning.

But, again, since I don't know much about the math I admit that this is little more than a hunch (and the evidence I see in the very different results from one poll to another) so I will defer to the super-duper knowledgeable folks of this forum.

pzhon
10-08-2004, 09:50 AM
In one national poll I believe on CNN I thought I saw that there were 1100 or 1200 participants in the poll and their margin of error was +/- 4%.

Is this correct? With a national election of 100-150 million people or so (I'm guessing) you only need to poll 1200 individuals to be within 4%??? Is this with 95% confidence within 4% or 99%??

That is correct. The number of people you need to poll to achieve some level of accuracy hardly changes as the population size increases.

The standard deviation after one coin-toss is .5. The standard deviation after 100 coin-tosses is 5 = 10*.5. The standard deviation after 1225=35^2 coin-tosses is 17.5 = 35*.5. You expect the result to be within 2 standard deviations of the mean just over 95% of the time, and that would be within 35 after sampling 1225 people, an error of 35/1225 = 1/35, which is less than 3%.

When the population is not divided 50-50, this slightly decreases the standard deviation, so the above estimate still applies as an upper bound.
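For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes simple random sampling and a two-way split. The function names are made up for illustration, and the optional finite-population correction (the standard sqrt((N-n)/(N-1)) factor) is an addition that is not in the post; it is there only to show how little the size of the electorate matters.

```python
import math

def sd_of_count(n, p=0.5):
    """Standard deviation of the number of 'heads' in n tosses with probability p."""
    return math.sqrt(n * p * (1 - p))

def margin_of_error(n, p=0.5, z=2.0, population=None):
    """z standard errors of the sample proportion.

    If population is given, apply the finite-population correction
    (an assumption added here, not part of the original calculation).
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# The coin-toss numbers from the post: 100 and 1225 tosses.
print(sd_of_count(100), sd_of_count(1225))

# Two standard deviations as a fraction of a 1225-person sample (1/35).
print(margin_of_error(1225))

# Same sample drawn from 150 million people: almost no change.
print(margin_of_error(1225, population=150_000_000))

# A 60-40 split gives a slightly smaller margin than 50-50.
print(margin_of_error(1225, p=0.6))
```

The correction factor is very close to 1 whenever the sample is a tiny fraction of the population, which is why 1225 people work about as well for a country as for a large city.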

Systematic errors are far more serious. The survey was probably conducted by calling people. People who don't have working phones would not be included. People who refused to take part in the survey would not be included. Even if the sample were representative, the election is not decided by the popular vote.

MicroBob
10-08-2004, 11:12 AM

Systematic errors are far more serious. The survey was probably conducted by calling people. People who don't have working phones would not be included. People who refused to take part in the survey would not be included. Even if the sample were representative, the election is not decided by the popular vote.

I agree that the systematic errors can cause significant distortion. And I KNOW that the math is the math and confidence within +/-4% is just that.
Still....even if it were possible to have ZERO systematic errors....if I could get a reasonably representative sample of different segments/demographics of the country for my 1,200 person poll....and you could get a different 1,200 people that is equally representative....then don't you think it reasonable that I could get a sample that has Kerry up by 5 points (+/-3) and you could get a sample that has Bush up by 5 points (+/-3)?

Without trying to sound like someone who thinks online-poker is rigged....and as one who truly does believe in the math...I have to say that this is a situation that just doesn't FEEL right to me. That is, even though you are showing me the math and I vaguely remember the same stuff from a college course (I think it was a political-sci course but it might have been mathematics) I still just find it hard to believe that polling 1200 people can actually successfully represent 150-million votes with that degree of confidence.
And I think that some of the differences in the different polls aren't JUST systematic but are just natural differences that you are going to get when you poll just 1200 out of 150-million likely voters.
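One rough way to put numbers on this question is a normal-approximation sketch in Python. It assumes a truly tied two-candidate race, simple random samples of 1,200, and no systematic error at all; the function names are hypothetical. The lead in one poll is 2*p_hat - 1, so its standard deviation is twice that of the sample proportion.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lead_sd(n, p=0.5):
    """SD of the reported lead (candidate A minus candidate B) in one poll."""
    return 2.0 * math.sqrt(p * (1 - p) / n)

def prob_lead_at_least(points, n):
    """P(|lead| >= points) in a single poll of a truly tied race."""
    return 2.0 * (1.0 - normal_cdf(points / lead_sd(n)))

def prob_polls_differ_by(points, n):
    """P(two independent polls of a tied race report leads that differ by >= points)."""
    sd_diff = math.sqrt(2.0) * lead_sd(n)
    return 2.0 * (1.0 - normal_cdf(points / sd_diff))

n = 1200
one_poll = prob_lead_at_least(0.05, n)      # someone shown ahead by 5+ points
two_polls = prob_polls_differ_by(0.10, n)   # e.g. Kerry +5 in one, Bush +5 in the other
print(f"one poll shows a 5+ point lead: {one_poll:.3f}")
print(f"two polls differ by 10+ points: {two_polls:.3f}")
```

Under these assumptions a single poll showing a 5-point lead in a tied race happens roughly 8% of the time, while two polls landing 10 points apart by chance alone happens only about 1 to 2% of the time. So pure sampling noise explains gaps of a few points between polls fairly easily, but not a Kerry +5 versus Bush +5 split on a regular basis.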

I'll look more closely at the math that you went through....this is really just interesting stuff to me and nothing more.
Thanks again.

TomCollins
10-08-2004, 11:14 AM
This is the problem with polls. They WEIGHT the data. So they try to get the data as a representative sample of who will vote. So they may ask how often you've voted in the past, or whether or not you say you'll vote. But this is an art. That is why so many polls out there are so very different. Turnout is the key to the election.

If you did have a true population that you could keep track of, choose randomly from this population, and be 100% accurate they would vote, this would be scientific fact. But it is not, and remains an artform.

As for predictions, I would rely more on markets. Right now, Bush is selling around 60% on some I've seen. True information comes out when there is a $ motive.

MicroBob
10-08-2004, 12:06 PM
Some of the polls I've seen will provide numbers for 'likely voters'.

In other words, they will say something like:
'Bush is leading in the so-and-so poll 47 to 45 percent. But when you only consider the respondents who describe themselves as "likely" voters then Kerry is ahead 48 to 44 percent.'

TomCollins
10-08-2004, 01:25 PM
The question is how do they decide who is a likely voter. They also try to get a mix of ethnic and geographic groups as well. They try to mix income, and only count some of the voters. Deciding who is "likely" to vote is the artistic part of it.

mmbt0ne
10-08-2004, 03:28 PM
Lucky for you we just took a quiz on polling data in statistics about 5 hours ago. Basically, they take the half-interval margin of error (kind of a copout from what I can see) defined as:

z(alpha/2)*sqrt(pq/n)

where z(alpha/2) is the critical value of the standard normal distribution (the same as the t-distribution with infinite degrees of freedom), p is the proportion of people giving a particular answer, q is 1-p, and n is the sample size. A 95% confidence interval has z=1.96; for simplicity, they round z to 2. Also, before the data you can assume that p=q=.5 since that will give you the highest possible margin of error. This simplifies a lot, and you end up getting:

MoE = 1/sqrt(n)

So, for a margin of error of 4%, you need >= 625 people. Of course, this all really depends on how they pick the participants. I hope that's right, because I put 625 down on the quiz.
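The formula is easy to check with a short Python sketch. The function names here are illustrative, not from any textbook or library. The sample-size helper takes its inputs as strings and uses exact fractions, purely so that the ceiling is not thrown off by floating-point rounding.

```python
import math
from fractions import Fraction

def margin_of_error(n, p=0.5, z=2.0):
    """Half-width of the interval: z * sqrt(p*q/n)."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size(moe, p="0.5", z="2"):
    """Smallest n with z*sqrt(p*q/n) <= moe.

    Arguments are decimal strings (e.g. "0.04") so the arithmetic is exact.
    """
    moe, p, z = Fraction(moe), Fraction(p), Fraction(z)
    return math.ceil(z * z * p * (1 - p) / (moe * moe))

print(sample_size("0.04"))             # z rounded to 2, p = q = .5
print(sample_size("0.04", z="1.96"))   # unrounded 95% critical value
print(margin_of_error(625))            # back-check of the 625 answer
print(margin_of_error(1200))           # a typical national poll size
```

With z rounded to 2 and p=q=.5 this gives 625 for a 4% margin, matching MoE = 1/sqrt(n); using z=1.96 instead lowers the requirement slightly.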

zerosum
10-08-2004, 03:41 PM
I offer the reprint below to illustrate an example of how the issue of weighting influences reported polling results. Please do not read this post to be an indication of support for any particular candidate.

Why You Should Ignore The Gallup Poll This Morning - And Maybe Other Gallup Polls As Well
by Steve Soto
Friday :: Sep 17, 2004
http://www.theleftcoaster.com/archives/002806.html

This morning we awoke to the startling news that despite a flurry of different polls this week all showing a tied race, the venerable Gallup Poll, as reported widely in the media (USA Today and CNN) today, showed George W. Bush with a huge 55%-42% lead over John Kerry amongst likely voters. The same Gallup Poll showed an 8-point lead for Bush amongst registered voters (52%-44%). Before you get discouraged by these results, you should be more upset that Gallup gets major media outlets to tout these polls and present a false, disappointing account of the actual state of the race. Why?

Because the Gallup Poll, despite its reputation, assumes that this November 40% of those turning out to vote will be Republicans, and only 33% will be Democrat. You read that correctly. I asked Gallup, who have been very courteous to my requests, to send me this morning their sample breakdowns by party identification for both their likely and registered voter samples they use in these national and I suspect their state polls. This is what I got back this morning:

Likely Voter Sample Party IDs – Poll of September 13-15
Reflected Bush Winning by 55%-42%

Total Sample: 767
GOP: 305 (40%)
Dem: 253 (33%)
Ind: 208 (28%)

Registered Voter Sample Party IDs – Same Poll
Reflected Bush Winning by 52%-44%

Total Sample: 1022
GOP: 381 (38%)
Dem: 336 (33%)
Ind: 298 (30%)

In both polls, Gallup oversamples greatly for the GOP, and undersamples for the Democrats. Worse yet, Gallup just confirmed for me that this is the same sampling methodology they have been using this whole election season, for all their national and state polls. Gallup says that "This (the breakdown between Reeps and Dems) was not a constant. It can differ slightly between surveys" in response to my latest email. Slightly? Does that mean that in all of these national and state polls we have seen from Gallup that they have "slightly" varied between 36%-40% GOP and 32%-36% Democrat? I already know from an email I got from Gallup earlier in the week that in their suspicious Wisconsin and Minnesota polls they seemingly oversampled for the GOP and undersampled for the Dems. For example in Wisconsin, in which they show Bush now with a healthy lead, Gallup used a sample comprised of 38% GOP and 32% Democratic likely voters. In Minnesota where Gallup shows Bush gaining a small lead, their sample reflects a composition of 36% GOP and 34% Democrat likely voters. How realistic is either breakdown in those states on Election Day?

According to John Zogby himself:

If we look at the three last Presidential elections, the spread was 34% Democrats, 34% Republicans and 33% Independents (in 1992 with Ross Perot in the race); 39% Democrats, 34% Republicans, and 27% Independents in 1996; and 39% Democrats, 35% Republicans and 26% Independents in 2000.

So the Democrats have been 39% of the voting populace in both 1996 and 2000, and the GOP has not been higher than 35% in either of those elections. Yet Gallup trumpets a poll that used a sample that shows a GOP bias of 40% amongst likely voters and 38% amongst registered voters, with a Democratic portion of the sample down to levels they haven’t been at since a strong three-way race in 1992?

Folks, unless Karl Rove can discourage the Democratic base into staying home in droves and gets the GOP to come out of the woodwork, there is no way in hell that these or any other Gallup Poll are to be taken seriously.

How likely is it that the Democrats will suffer a seven-point difference against the GOP this November or that the GOP will ever hit 40%?

Not very likely.

The real problem here is that Gallup is spreading a false impression of this race. Through its 1992 partnership with two international media outlets (CNN and USA Today), Gallup is telling voters and other media by using badly-sampled polls that the GOP and its candidates are more popular than they really are. Given that Gallup’s CEO is a GOP donor, this should not be a surprise. But it does require us to remind the media, like Susan Page of USA Today, who wrote the lead story on the poll in the morning paper, and other members of the media who cite this poll today, that it is based on a faulty sample composition of 40% GOP and 33% Democrats.
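To make the weighting argument in the reprint concrete, here is a small Python sketch of how the assumed party mix moves the topline. The party counts are the Gallup likely-voter figures quoted above and the alternative mix is the 2000 breakdown attributed to Zogby. The candidate support within each party is NOT from the article or from any poll; those numbers are purely hypothetical placeholders, and the function name is made up for illustration.

```python
def topline(party_mix, support_by_party):
    """Blend per-party candidate support using the given party mix.

    party_mix: counts or percentages per party (normalized internally).
    support_by_party: candidate shares within each party.
    """
    total = sum(party_mix.values())
    result = {}
    for party, size in party_mix.items():
        for candidate, share in support_by_party[party].items():
            result[candidate] = result.get(candidate, 0.0) + (size / total) * share
    return result

# Party mixes taken from the reprint.
gallup_likely_voters = {"GOP": 305, "Dem": 253, "Ind": 208}
turnout_2000_zogby = {"GOP": 35, "Dem": 39, "Ind": 26}

# HYPOTHETICAL within-party preferences, invented for illustration only.
hypothetical_support = {
    "GOP": {"Bush": 0.93, "Kerry": 0.06},
    "Dem": {"Bush": 0.09, "Kerry": 0.90},
    "Ind": {"Bush": 0.47, "Kerry": 0.47},
}

for name, mix in [("Gallup mix", gallup_likely_voters),
                  ("2000 turnout mix", turnout_2000_zogby)]:
    t = topline(mix, hypothetical_support)
    print(f"{name}: Bush {t['Bush']:.1%}, Kerry {t['Kerry']:.1%}")
```

With these placeholder preferences, the same respondents' answers produce a clear Bush lead under the Gallup mix and a slight Kerry lead under the 2000 turnout mix. The size of the swing depends entirely on the made-up support figures, so treat it as a demonstration of the mechanism rather than an estimate of anything.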

BettyBoopAA
10-15-2004, 12:50 PM
Another obvious problem with election polls is that the election is decided by the electoral college, not the popular vote, so polls by state would be more accurate than a national poll.

Bulldog
10-15-2004, 01:11 PM
Here kids, have fun:

