Political science departments can contribute a lot by introducing graduate-level courses on research methods. This will go a long way in helping readers understand the findings of a poll.

Dear Friends (if I may call you so),

I am one of you. I share your nervousness about forecasting an election, almost go sleepless the night before counting day, jump with joy when we manage to correctly predict an election outcome, and go through heartbreak when the gap between prediction and the final outcome is unbelievably large. After our continued failure in predicting electoral outcomes (except in elections in which the direction of the result was clear before voting day), the forecasts of the Karnataka results have deeply troubled me.

While many of us went gaga over how close our predictions were to the actual results, the truth is that there was an utter disregard for some basic standards that we should uphold. Let me say this upfront: this letter is not directed against any individual pollster or polling agency.

There are two questions that I wish to address in this letter. First, why were the forecasts on 12 May all over the place? Second, why should election forecasting be done with rigour and taken seriously? All pollsters, whether they had a sample size of 1,000, 5,000 or 50,000, proudly declared in the TV studios that their margin of error was plus or minus three. In simple terms, what this means is that if the poll estimates that a party would get 50 per cent of the vote, it may get anything between 47 and 53 per cent. Getting into the detail of the margin of error is a bit complicated, so for general readers: a hundred-plus years of research in statistics tells us that all surveys have an error (noise), and only with some level of confidence can we provide a ballpark range for the estimate.

And this is the beginning of why I, who fancies himself a pollster, am disappointed at the state of affairs. How can the margin of error remain the same, plus or minus two or three, irrespective of the sample size? The error depends on the sample size: the larger the sample, the smaller the margin of error. That's the golden rule. (See Figure 1)
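For readers who want to see the arithmetic, here is a minimal sketch of the standard formula, assuming a simple random sample and the worst case of a 50 per cent vote share, at 95 per cent confidence:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1000, 5000, 50000):
    print(f"n = {n:>6}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

A sample of 1,000 gives roughly plus or minus 3.1 points, a sample of 5,000 gives about 1.4, and 50,000 gives about 0.4. Three very different numbers, which is why identical claims of 'plus or minus three' across those sample sizes should raise eyebrows.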

This opens the floodgates to what is wrong with the polling industry in India.

Survey data help us estimate vote shares, and statistical models are then applied to those estimates to forecast seats. When most polls got their vote shares wrong (no one had suggested that the BJP, with a smaller vote share, would end up winning a larger number of seats), how can we claim that our forecasts were correct? (See Table 1 and Table 2)
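To make that vote-to-seat step concrete, here is a minimal sketch of one common approach, a uniform swing model. The party labels, baseline shares and swing figures below are invented for illustration, not actual Karnataka numbers:

```python
# Baseline: each seat's vote shares at the last election (hypothetical toy numbers).
baseline = {
    "seat_1": {"A": 0.45, "B": 0.40, "C": 0.15},
    "seat_2": {"A": 0.38, "B": 0.42, "C": 0.20},
    "seat_3": {"A": 0.50, "B": 0.35, "C": 0.15},
}

# Polled state-wide change in each party's vote share since the last election (hypothetical).
swing = {"A": -0.04, "B": 0.05, "C": -0.01}

def forecast_seats(baseline, swing):
    """Apply the same state-wide swing to every seat and count the winners."""
    tally = {}
    for seat, shares in baseline.items():
        adjusted = {party: share + swing[party] for party, share in shares.items()}
        winner = max(adjusted, key=adjusted.get)
        tally[winner] = tally.get(winner, 0) + 1
    return tally

print(forecast_seats(baseline, swing))  # {'B': 2, 'A': 1}
```

Notice how sensitive the seat count is: in a first-past-the-post system with many close contests, a swing estimate that is off by even a point or two can flip dozens of seats, so an error in the vote share estimate gets amplified at the seat-forecast stage.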

Two negative numbers, when multiplied, always lead to a positive number. Similarly, in the forecasting world, when two errors are committed in opposite directions and they produce a number that mirrors the actual outcome, it is not science but fluke. It is a mistake that we are aware of but continue to overlook because no one is paying close attention. Most of us don't reveal anything about our data collection methods. Most pollsters have recently started putting out vote shares by various demographics (caste, class, gender, age, etc), but rarely do they put out a table declaring the representativeness of their sample. One may have a large sample size, as many of us often do, but if the respondents weren't randomly selected, it is unlikely that one would get a representative sample. All the estimates are going to be biased, and one can claim 'almost nothing' with a reasonable level of confidence. And even with a representative sample, if the survey instruments were not well designed, the results would again be biased.
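A quick simulation shows why a large but non-random sample does not rescue us. The population split and the 'reachability' weights below are invented purely for illustration, assuming supporters of one party are simply easier for our canvassers to find:

```python
import random

random.seed(42)

# Hypothetical population: 40% support party A, 60% support party B,
# but A's supporters are twice as easy for a non-random canvasser to reach.
population = [("A", 2.0)] * 40_000 + [("B", 1.0)] * 60_000

def estimate_support_for_A(sample_size, biased):
    """Estimate A's vote share from a sample; 'biased' mimics non-random selection."""
    weights = [w if biased else 1.0 for _, w in population]
    sample = random.choices(population, weights=weights, k=sample_size)
    return sum(1 for party, _ in sample if party == "A") / sample_size

for n in (1_000, 50_000):
    print(f"n={n:>6}  random: {estimate_support_for_A(n, False):.3f}  "
          f"non-random: {estimate_support_for_A(n, True):.3f}")
```

The random design converges on the true 40 per cent, while the non-random design converges on roughly 57 per cent no matter how large the sample gets: a bigger sample only makes the wrong answer more precise.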

The bigger question is, even if one follows rigorous techniques, what is the likelihood of getting a forecast right? To be honest, the probability is very low. Apart from the usual issues (population heterogeneity, multi-party competition, high electoral volatility), pollsters in India are also limited by the weak theoretical foundations of why people vote the way they do. If you consider elections as just a few days and forecasting as merely a few hours of tamasha, then this letter was not meant for you, and I apologise for putting you through this agony. For many of us, election day is a celebration of the democratic promises we have made to each other. It is a window of opportunity to study the democratic health of our society. In that sense, polling and prediction are acts of knowledge generation. Voters who share information with us do so with the belief that we are gathering data to make sense of the world and intervene in meaningful ways to make it a better place.

Predictions are based on what we know about how the social and political world works. Unfortunately, we have failed to create a body of knowledge on how people vote in India on which pollsters could rely. This is where political science departments can make a great contribution by introducing graduate-level courses on research methods, particularly survey research. This would help a great deal in overcoming the general lacuna in how to understand the findings of a poll and what to reasonably expect, and not expect, from it.

Knowledge generation is a cumulative exercise. It takes place when we start introspecting about why we have been failing, and it gains momentum when we start understanding the causes of those failures and thinking about possible solutions. I hope we will agree on some minimum standards that become a yardstick for judging why we fail. Some of these issues were discussed in the August 2016 issue of Seminar magazine; you may find it helpful.

Yours truly,

Rahul Verma

Full disclosure: I am a PhD candidate in the Department of Political Science, University of California, Berkeley, US. I received my initial training at Lokniti-CSDS. The views expressed here are personal.