
Campaign rhetoric brings with it a clarity and swagger that is otherwise rare, and it often exposes the vulnerabilities of our democratic practices. By owning up to development apartheid, a member of parliament has now forced an examination of the origins and consequences of political data mining.
Over the last week, Maneka Gandhi, Union minister and the BJP candidate for the Lok Sabha seat of Sultanpur in Uttar Pradesh, has been threatening voters in the constituency. First, she pilloried Muslim voters, saying they could not expect work from her if they did not vote for her. She then revealed something darker. At another election meeting, she proclaimed: “The parameter is that we segregate all villages as A, B, C, and D. The village where we get 80 per cent votes is A, the village in which we get 60 per cent is B, the village in which we get 50 per cent is C and the village where we get less than 50 per cent is D.” She elaborated: “The development work first happens in all A-category villages. Then comes B and only after work in B is done, we start with C. So this is up to you whether you make it to A, B or C and no one should come in D because we all have come here to do good.”
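How mechanical such a rule is becomes clear when it is written down. The following is a minimal Python sketch of the scheme as quoted, not a description of any party’s actual tooling. It assumes each figure in the speech is a lower bound (“at least 80 per cent”), since the quote leaves the bands between 50, 60 and 80 per cent implicit; the village names and vote shares are hypothetical.

```python
def village_category(vote_share_pct: float) -> str:
    """Map a village's vote share (in per cent) for the candidate to A/B/C/D.

    Assumption: each threshold in the speech is read as a lower bound.
    """
    if vote_share_pct >= 80:
        return "A"
    if vote_share_pct >= 60:
        return "B"
    if vote_share_pct >= 50:
        return "C"
    return "D"


# Hypothetical villages and vote shares, invented for illustration only.
villages = {"Village 4": 38.0, "Village 2": 63.5, "Village 1": 84.0, "Village 3": 51.2}

# Sorting by category gives the order in which development work would reach them.
order = sorted(villages, key=lambda name: village_category(villages[name]))
for name in order:
    print(name, village_category(villages[name]))
```

Sorting by category reproduces the queue described in the speech: every A-category village is served before any B, and D comes last.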
In theory, Indians are guaranteed a secret ballot by the Representation of the People Act, 1951. In practice, however, voting patterns can be ascertained with considerable accuracy, especially since the advent of Electronic Voting Machines (EVMs) and the resulting Form 20 released by the Election Commission of India (ECI). Form 20 is essentially polling booth-level voting data: it details the number of votes received by every candidate (and party), down to each EVM. Ordinarily, an EVM is allotted per 1,000-1,500 voters but records an average of only 200-600 votes. Because each machine’s count therefore reflects the choices of a few hundred people, even voters living in densely populated areas are likely to be scrutinised along political lines by contesting parties.
Form 20 can be, and is, used to determine past voting trends, and it has long been an integral part of the campaign calculus. However, it forms only one part of what political operatives call the “golden triple”: the formulation for determining what level of effort is required in each part of a constituency. The golden triple comprises three sets of vital data: political leaning, demographics and contact details. The first is measured by analysing Form 20 data to yield past voting trends per booth, and is fine-tuned with surveys that are run continuously to gauge forthcoming voting trends. The second, demographics, can be approximated by aggregating the voter rolls per booth to determine such identifiers as gender, age, religion, caste and voting booth (that is, residential vicinity). Since rolls contain addresses, family size can be estimated; religion and caste are inferred using algorithms that process last names. Demographics are often supplemented with other socio-economic data, such as income, via proxies like property tax records. Finally, contact information is aggregated in multiple ways, generally not from government documents or public records but via panna pramukhs (party workers each responsible for a single page of the voter roll) or other such physical or digital networks. Based on this data, one can work out how much effort candidates must put into each area, what is likely to appeal to the people of those localities, and how to reach every voter most effectively. Such analytical approaches are useful for political entrepreneurs who do not enjoy the fame or funding of their larger counterparts: they allow them to take more considered decisions about campaign strategy.
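The first step of that pipeline, turning booth-level counts into a per-booth leaning, takes only simple arithmetic. Below is a minimal Python sketch under stated assumptions: the rows are a simplified, hypothetical stand-in for Form 20 figures, the elector counts stand in for totals tallied from voter rolls, and the party names and numbers are invented. It computes each booth’s vote share for one party and its turnout.

```python
from collections import defaultdict

# Hypothetical Form 20-style rows: (polling station number, party, votes).
# All figures are invented for illustration.
form20_rows = [
    (1, "Party X", 310), (1, "Party Y", 140), (1, "Others", 50),
    (2, "Party X", 120), (2, "Party Y", 260), (2, "Others", 20),
]

# Hypothetical electors per polling station, as might be tallied from voter rolls.
electors = {1: 900, 2: 750}


def booth_profile(rows, electors, party):
    """Return {booth: (party vote share %, turnout %)} from booth-level counts."""
    totals = defaultdict(int)       # all votes polled at each booth
    party_votes = defaultdict(int)  # votes for the chosen party at each booth
    for booth, candidate, votes in rows:
        totals[booth] += votes
        if candidate == party:
            party_votes[booth] += votes
    return {
        booth: (
            round(100 * party_votes[booth] / totals[booth], 1),
            round(100 * totals[booth] / electors[booth], 1),
        )
        for booth in totals
    }


print(booth_profile(form20_rows, electors, "Party X"))
# → {1: (62.0, 55.6), 2: (30.0, 53.3)}
```

Even with two booths, the output separates a stronghold from a weak area at the level of a few hundred voters, which is exactly the granularity the secret ballot is meant to protect.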
The problem arises when this data is used to unleash discriminatory development. Maneka Gandhi thus disclosed an open secret: politicians, instinctively or analytically, know who is voting for them. Given constantly improving analytical capabilities, there is a real risk that certain communities will be ghettoised as punishment. This directly violates the spirit upon which the Constitution was built, and the breach is graver still for Union ministers, who swear in their oath of office to “do right to all manner of people in accordance with the Constitution and the law, without fear or favour, affection or ill-will”.
The growing recognition of this malpractice has prompted several suggestions, including abandoning EVMs or withholding Form 20 data. Such steps could undermine the goals of security and transparency. Another remedy, proposed by the ECI, is to aggregate data through cluster counting: a “totaliser” would add together the votes of 14 EVMs, so that booth-level data is not recorded and only aggregate data is made available. However, the kind of political start-ups that India needs, and indeed craves, can only emerge with more visibility and greater access, not less. The time has come for institutions to consider upstream solutions built on capacity building. Enhancing the analytical capabilities of District Development Committees (DDCs), legislatures and the ECI can empower these bodies to continuously monitor developmental grants, spot anomalies and take corrective action. The tools and techniques used for understanding and engaging with electorates should not be the preserve of those most prone to abusing the privileges they are granted.
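The arithmetic behind the totaliser proposal is straightforward, and a sketch shows what it hides. The following minimal Python illustration is not the design of any actual device; the cluster size of 14 comes from the proposal as described, while the function name and the per-machine tallies are hypothetical. Only the combined count leaves the function.

```python
from collections import Counter

CLUSTER_SIZE = 14  # number of EVMs the proposed totaliser would combine


def totalise(evm_tallies):
    """Sum party-wise votes across one cluster of EVMs.

    Only the combined tally is returned, so per-machine (booth-level)
    figures are not exposed.
    """
    if len(evm_tallies) != CLUSTER_SIZE:
        raise ValueError(f"expected {CLUSTER_SIZE} machines, got {len(evm_tallies)}")
    total = Counter()
    for tally in evm_tallies:
        total.update(tally)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Hypothetical cluster: 14 machines with invented, deliberately uneven tallies.
cluster = [{"Party X": 200 + 10 * i, "Party Y": 300 - 10 * i} for i in range(CLUSTER_SIZE)]
print(totalise(cluster))
# → {'Party X': 3710, 'Party Y': 3290}
```

The individual machines range from a comfortable lead for one party to a comfortable lead for the other, yet the published figure shows only a narrow aggregate margin. That is the protection the proposal offers, and also the visibility it takes away.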
The persecution of certain communities through the nefarious use of increasingly targeted analytical capabilities risks perpetuating an already vicious cycle of exclusion. It is imperative that those interested in deepening the roots of our democratic traditions and constitutional principles co-create a progressive solution. The starting point has to be our institutions, which, even with their flaws, have a rich history of acting as bulwarks against misadventure.
The writer is founder, WalkIn. He previously co-founded FourthLion Technologies, a political campaign planner.