Scholars have mined Facebook data for years

Academics reveal that special programs captured the behaviour of up to hundreds of millions of Facebook users

Prof Michal Kosinski of Stanford University, who gathers data from Facebook. Academics have studied Facebook pages in the name of science, but these troves are unsecured, posing a privacy risk.

San Francisco

IN JULY 2014, a team of four Swedish and Polish researchers began using an automated program to better understand what people posted on Facebook.

The program, known as a "scraper," let the researchers log every comment and interaction from 160 public Facebook pages for nearly two years. By May 2016, they had amassed enough information to track how 368 million Facebook members behaved on the social network. It is one of the largest known sets of user data ever assembled from Facebook.

"We're concerned about how easy it was to collect this," said Fredrik Erlandsson, one of the researchers and a lecturer at the Blekinge Institute of Technology in Sweden. In December, he and his colleagues published a research paper in the journal Entropy detailing how their methods of trawling social media sites could be replicated.

For more than a decade, professors, doctoral candidates and researchers from academic institutions around the world have harvested information from Facebook using techniques similar to those of Mr Erlandsson and his team. They have compiled hundreds of Facebook data sets that captured the behaviour of a few thousand to hundreds of millions of individuals, according to interviews with more than a dozen scholars.

Their practices came to light in March when The New York Times and The Observer of London reported that Aleksandr Kogan, a University of Cambridge psychology professor, had obtained the data of up to 87 million Facebook users through a quiz app. Mr Kogan sold the information to Cambridge Analytica, a political consulting firm with ties to the Trump campaign, so it could build psychographic profiles of American voters. Last week, Cambridge Analytica said it would cease operations after the uproar over its use of personal information.

But while what happened with Mr Kogan's Facebook data set is now known, the fate of other information hoards is murkier. In many cases, the data was used for research or scholarly articles. The information was then sometimes left unsecured and stored on open servers that offered access to anyone. Some academics said the data could have been easily copied and sold to marketers or political consulting firms.

The potential result is more leakage of Facebook users' information through academic circles, said Rasmus Kleis Nielsen, a professor of political communication at the University of Oxford who has studied data collection from Facebook.

"The academic world is highly decentralised, and each individual, each institution, has a different way of securing their data," Mr Nielsen said. "Even if almost everyone in the academic community is careful and protects the data, you still can end up in a situation where someone is careless or acts in bad faith and sells access. It's hard to imagine how Facebook stops that from happening."

The Times reviewed half a dozen Facebook data sets compiled by academics from 2006 to 2017. One, gathered from 2015 to 2017 by researchers in Denmark and New Zealand, examined 1.3 million people in Denmark - about a quarter of the country's population - to determine how liking one political page on Facebook could predict how someone would vote in the future. Another set, compiled in 2013 by a group of Norwegian academics, focused on the civic engagement of 21 million Facebook members on four continents.

The Danish research team did not respond to a request for comment. Petter Bae Brandtzaeg, one of the Norwegian researchers, said he understood concerns about data gathering.

"As a researcher you get immediate access to people's behaviour, attitudes, feelings and relationships, which are, of course, tempting for all," he wrote in an email. He said many researchers lacked the technical expertise to properly secure data.

The Facebook data was typically amassed through scraper programs that trawled the social network to document what was posted, or through quiz apps that requested access to people's profiles. The results included users' locations, interests, political affiliations, Facebook interactions and even music preferences.

In most cases, researchers assigned numbers to people whose Facebook information they had obtained to maintain anonymity. But the more data there is, the easier it is to overlay one information set with another to identify someone.

One 2015 paper published in the journal Science looked at credit card spending data and found that data scientists could pinpoint 90 per cent of the shoppers by name with just four random pieces of information from sites such as Facebook, Instagram and Twitter.

Once people are identified and their interests and interactions known, they can be targeted with advertising and mobilised for political campaigns or other causes.

For years, Facebook had no specific policies about academics' access to user data, though it had guidelines on working with third parties. While the company has a rule that forbids the use of scrapers, it has not enforced that policy against scholars. And at times, it has assisted researchers with studies.

In 2014, though, Facebook began limiting third-party apps, such as quizzes, from obtaining users' information. Since Mr Kogan's actions were revealed, fuelling an outcry over data privacy, Facebook has made further changes. The company has given people more control over their privacy settings.

Last month, Facebook also narrowed the number of academics it would work with, saying it would collaborate with those who wanted to research the effect of social media on elections through an "independent election research commission". Only scholars with election-related projects can apply.

"We are taking a hard look at the information apps can use when you connect them to Facebook, as well as other data practices," Susan Glick, a Facebook spokesman, said in a statement. "These other data practices include academic research."

In Britain, researchers were doing similar work through different means. In 2007, Michal Kosinski, then deputy director at the Psychometrics Center at the University of Cambridge, worked with a colleague, David Stillwell, to create My Personality, a quiz app that offered to assess people's personalities in exchange for data about them. It was one of the first times a quiz app had been used for obtaining Facebook members' information. My Personality has now collected details on more than 6 million Facebook users.

In interviews with The Times, Mr Kosinski and Mr Stillwell said they took great care to keep the data they procured anonymous.

Mr Kosinski said collection of information from Facebook had become widespread over the years, not only by academics but also by developers, marketers, data analytics companies and others.

"What Kogan did was wrong. But what Kogan did, many others do on a much larger scale," Mr Kosinski said. "They just don't get caught."

Some scholars said Facebook's recent privacy changes may have gone too far by also cutting off academics who behaved responsibly.

"Academics would argue that we need access to primary data," said Mr Nielsen of Oxford. He said the changes might lead to an asymmetry, with internal Facebook researchers accumulating mounds of data while outside academics would not.

NYTIMES