
By Sharmila Nair
Eliezer Yudkowsky once said “Artificial Intelligence is not settled science, it belongs to the frontier, not the textbook”. One reason Artificial Intelligence (AI) is not a settled science is that its growth depends on access to ‘big data’: large data sets containing structured or unstructured information, including the personal data of individuals, gathered from sources such as the media, the Internet of Things (IoT) and cloud platforms, among several others.
Access to big data can help build productive and advanced AI systems, but it can also be used in ways that harm individuals, companies or even governments. For instance, in February 2018, Google and its health-tech subsidiary, Verily, created an advanced artificial intelligence algorithm to predict the heart health of a patient by only studying their eyes. For this initiative, they collected and analysed medical records of 10,000 individuals over the course of four years. By contrast, the Chinese government is blatantly violating privacy rights by building a ‘predictive policing programme’ based on big data analysis in Xinjiang, where information is gathered on individuals, often without their knowledge. A few individuals detained under this programme have been held without trial and subjected to abuse.
The fluidity of big data raises several issues with respect to data security and the ethical norms that entities should follow, but these are often neglected. For instance, Google did not provide any specifics on data privacy, usage or retention with respect to Google Duplex, its AI system that places phone calls on a user's behalf. Is there any scope to use the caller's personal data for a purpose other than the one the caller requested? In such an instance, could the personal data be ‘masked’ or de-identified? Google has not provided any such clarifications, all of which form part of the principles developed by the UN Global Pulse under the ‘Data Privacy, Ethics and Protection: A Guidance Note on Big Data for Achievement of the 2030 Agenda’, which was adopted by the members of the UN Development Group in 2017.
Some of the data security requirements mentioned in the Guidance Note are also part of the ‘EU General Data Protection Regulation (GDPR)’, which came into effect on May 25, 2018, and the ‘Guidelines on Automated individual decision-making and Profiling’ by the Article 29 Working Party. Certain data protection provisions in the GDPR, such as transparency on the algorithm created from big data for AI innovations, are controversial: while they may advance privacy, they may also stall AI innovations.
Companies across borders are able to drive up sales based on the patterns in data collected from each jurisdiction. This is not necessarily alarming or negative for the user, whose life is often simplified when user behaviour can be predicted. For instance, Pinterest Lens, an AI feature, allows users to take a picture of anything around them to ‘pin’, create boards and even buy similar products online. Can such images, taken by Indian users, which may contain personally identifiable data, be retained on the Pinterest database? As per the Justice Srikrishna Committee Report on data protection in India, restricting big data retention to a fixed time period may be difficult, as new purposes for such data may be discovered after collection, resulting in the company retaining the data indefinitely. This standard may prove to be a significant problem for big data security in India. Perfecting the balance between privacy and innovation is key to developing the AI ecosystem in India.
While the Indian government has allocated $480 million in 2018-19 to promote and develop AI and IoT, and has analysed data security under the Justice Srikrishna Committee Report, it would be advisable for the government to favourably consider the Guidance Note and frame privacy laws specifically for big data, in order to avoid a repetition of the privacy issues raised by the Google Duplex launch. On a review of international guidelines, the Report and the characteristics of big data flow, the first fundamental factor to account for in Indian big data privacy laws would be regulating cross-border data flows as per the Canadian method of requiring a ‘comparable level of protection’, which would stimulate innovation while securing data in a cohesive and collaborative manner, including on jurisdictional issues. Application of the ‘adequacy test’, where data flows are dictated by the benchmark set by the data transferee, as mentioned in the Report, could put a dent in AI innovations coming into India. Secondly, it is imperative to lay out exceptions for data retention, not limited to directly identifiable big data beyond the specific scope of use, which has not been adequately addressed in the Report. The issue of data retention is especially pressing with respect to government surveillance. Lastly, individuals must have the option to sell their data on data exchanges such as the Ocean Protocol, a decentralised data exchange.
As the world moves towards a digital overhaul, it is imperative for the Indian government to address the significant big data flow policy issues outlined above in order to support AI innovations. User consent must be retained even for big data where the end use of such data is identified. A balanced data protection law would protect the privacy of individuals while simultaneously stimulating innovation in India.