
Image credit: Francois Lenoir / reuters

Facebook hopes its new AI moderation tools can further counter hate speech

The company's human moderators remain unconvinced.
Cardboard cutouts depicting Facebook CEO Mark Zuckerberg are pictured during a demonstration ahead of a meeting between Zuckerberg and leaders of the European Parliament in Brussels, Belgium, May 22, 2018.
Facebook has waged a long-fought and sometimes seemingly losing battle against hate speech and misinformation spreading across its platform. On Thursday, the company rolled out the latest implements of its automated anti-trolling arsenal in an effort to further curb bigots and bad actors on the site. 

The company’s CTO, Mike Schroepfer, noted that Facebook has taken a number of proactive steps in the last year to combat hate speech, and those efforts have already begun to show results. In the first quarter of 2020, the company took action against 9.6 million pieces of content, up from 5.7 million the quarter prior. “Q3 of last year to Q3 of this year, on Facebook, we've actually done over three times as much content takedowns via our automated systems, detecting hate speech,” Schroepfer told an assembly of reporters via Zoom on Wednesday. “There's not a lot in life that improves three x over a year. So I think that's, that's pretty good.”

Instagram also saw a large influx of automated takedowns in the last quarter, effectively doubling the rate of the same period before it. “[We] are now at a similar proactive rate on Instagram as we are on Facebook,” Schroepfer continued. “So we're seeing about a 95 percent proactive rate on both of those platforms.”

Of course, the baselines for those figures are continually in flux. “COVID misinformation didn't exist in Q4 of 2019, for example,” he said. “And there can be quite a change in a conversation during an election. So what I'd say is you always have to look at all these metrics together, in order to get the biggest picture.”

In addition to Facebook’s existing array of tools, including semi-supervised self-learning models and XLM-R, the company unveiled and implemented a pair of new technologies. The first, Schroepfer said, is Linformer, “which is basically an optimization of how these large language models work that allow us to deploy them sort of at the massive scale we need to address all the content we have on Facebook.”

Linformer is a first-of-its-kind Transformer architecture. Transformers are the model of choice for a number of natural language processing (NLP) applications and, unlike the recurrent neural networks that came before them, Transformers can process data in parallel, which makes training models faster. But that parallelism is resource hungry: the self-attention at the heart of a Transformer needs memory and compute that grow quadratically as the input length increases. Linformer is different. Its resource needs grow linearly with input length, allowing it to process longer inputs using fewer resources than conventional Transformers.
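Facebook didn't walk reporters through the math, but the core trick is projecting the length-n keys and values down to a fixed length k, so the attention map is n-by-k rather than n-by-n. A minimal NumPy sketch of that idea (all dimensions and matrices below are illustrative, not Facebook's):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def standard_attention(Q, K, V):
    # The n x n score matrix is why memory grows quadratically with n.
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def linformer_attention(Q, K, V, E, F):
    # E and F project keys and values from length n down to a fixed k,
    # so the score matrix is n x k: memory now grows linearly with n.
    return softmax(Q @ (E @ K).T / np.sqrt(Q.shape[-1])) @ (F @ V)

n, d, k = 512, 64, 32  # sequence length, head dimension, projected length
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
E = rng.normal(size=(k, n)) / np.sqrt(n)
F = rng.normal(size=(k, n)) / np.sqrt(n)

out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (512, 64) -- same shape as standard attention's output
```

The output shape matches standard attention, so the projected version can drop into the same model; only the intermediate score matrix shrinks.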

The other new tech is called RIO. “Instead of the traditional model for all of the things I talked about over the last five years,” Schroepfer said, “take a classifier, build it, train it, test it offline, maybe test it with some online data and then deploy it into production, we have a system that can end-to-end learn.”

Specifically, RIO is an end-to-end optimized reinforcement learning (RL) framework that generates classifiers -- the tests that trigger an enforcement action against a specific piece of content based on the class associated with its datapoint (think: the process that determines whether an email is spam) -- using online data.

“What we typically try to do is set up our classifiers to work at a very high threshold, which means sort of when in doubt, it doesn't take an action,” Schroepfer said. “So we only take an action when the classifier is highly confident, or we're highly confident based on empirical testing, that that classifier is going to be right.” 

Those thresholds regularly change depending on the sort of content that is being examined. For example, the threshold for hate speech on a post is quite high because the company prefers not to mistakenly take down non-offending posts. The threshold for spammy ads, on the other hand, is quite low.  
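Neither the classifiers nor the exact cutoffs are public, but the mechanism Schroepfer describes amounts to a per-category confidence gate. A toy sketch with made-up threshold values:

```python
# Illustrative thresholds only -- Facebook's real values aren't public.
# The bar is high where a false positive is costly (hate speech) and
# low where it's cheap (spammy ads).
THRESHOLDS = {
    "hate_speech": 0.95,
    "spam_ad": 0.60,
}

def should_take_action(category: str, confidence: float) -> bool:
    """Enforce only when the classifier clears its category's threshold."""
    return confidence >= THRESHOLDS[category]

# The same 90-percent-confident prediction triggers action in one
# category but not the other.
print(should_take_action("hate_speech", 0.90))  # False
print(should_take_action("spam_ad", 0.90))      # True
```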

In Schroepfer’s hate speech example, the metrics RIO optimizes against are prevalence rates. “It's actually using some of the prevalence metrics and others that we released as its sort of score and it's trying to take those numbers down,” Schroepfer explained. “It is really optimizing from the end objective all the way backwards, which is a pretty exciting thing.”

“If I take down 1000 pieces of content that no one was going to see anyway, it doesn't really matter,” Schroepfer stated. “If I catch the one piece of content that was about to go viral before it does, that can have a massive, massive impact. So I think that prevalence is our end goal in terms of the impact that has on users, in terms of how we're making progress on these things.”
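Prevalence here means the fraction of all content views that land on violating content, and Schroepfer's point falls straight out of the arithmetic (the view counts below are hypothetical):

```python
def prevalence(violating_views: int, total_views: int) -> float:
    """Fraction of all content views that were of violating content."""
    return violating_views / total_views

TOTAL_VIEWS = 1_000_000  # hypothetical platform-wide views for the period

# 1,000 violating posts that nobody would have seen: removing them
# leaves 5,000 violating views on the books either way.
print(prevalence(5_000, TOTAL_VIEWS))  # 0.005

# One post caught just before it racked up 4,000 views cuts the same
# metric by a factor of five.
print(prevalence(5_000 - 4_000, TOTAL_VIEWS))  # 0.001
```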

One immediate application will be automatically identifying the subtly-changed clones -- whether that’s the addition of text or a border, or a slight overall blurring or crop -- of already-known violating images. “The challenge here is we have very, very, very high thresholds, because we don't want to accidentally take anything down. You know, adding a single ‘not’ or ‘no’ or ‘this is wrong’ on this post completely changes the meaning of it,” he continued.
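Facebook hasn't detailed how its matcher works, but a standard technique for flagging slightly altered copies of a known image is perceptual hashing: reduce each image to a tiny fingerprint that survives blurs, borders and crops, then compare fingerprints by Hamming distance. A minimal difference-hash ("dHash") sketch in NumPy, using synthetic stand-in images:

```python
import numpy as np

def dhash(gray, hash_size=8):
    # Downsample to hash_size rows x (hash_size + 1) columns by block
    # averaging, then record whether each cell is brighter than its
    # right-hand neighbour. Small edits barely move the bits.
    rows = np.array_split(gray, hash_size, axis=0)
    small = np.array([[blk.mean()
                       for blk in np.array_split(r, hash_size + 1, axis=1)]
                      for r in rows])
    return (small[:, 1:] > small[:, :-1]).flatten()

def hamming(a, b):
    return int(np.count_nonzero(a != b))

# Synthetic stand-ins for a known violating image and its variants.
y = np.linspace(0, 3 * np.pi, 64)
x = np.linspace(0, 2 * np.pi, 64)
original = np.outer(np.sin(y), np.cos(x))
blurred = (original + np.roll(original, 1, axis=1)) / 2  # slight blur
unrelated = np.outer(np.cos(y), np.sin(x))

print(hamming(dhash(original), dhash(blurred)))    # small: flag as a clone
print(hamming(dhash(original), dhash(unrelated)))  # large: leave alone
```

A moderation pipeline would hash each known violating image once, then compare uploads against the stored fingerprints; a Hamming distance below some cutoff marks a probable clone.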

Memes continue to be one of the company’s most vexing hate speech and misinformation vectors, due in part to their multi-modal nature. Parsing them requires a great deal of subtle understanding, according to Schroepfer. “You have to understand the text, the image, you may be referring to current events and so you have to encode some of that knowledge. I think from a technology standpoint, it's one of the most challenging areas of hate speech.”

But as RIO continues to generate increasingly accurate classifiers, it will grant Facebook’s moderation teams far more leeway and opportunity to enforce the community guidelines. The advances should also help moderators more easily root out hate groups lurking on the platform. “One of the ways you'd want to identify these groups is if a bunch of the content in it is tripping our violence or hate speech classifiers,” Schroepfer said. “The content classifiers are immensely useful, because they can be input signals into these things.”

Facebook has spent the past half decade developing its automated detection and moderation systems, yet its struggles with moderation continue. Earlier this year, the company settled a case brought by 11,000 traumatized moderators for $52 million. And earlier this week, moderators issued an open letter to Facebook management arguing that the company’s policies were putting their “lives in danger” and that the AI systems designed to alleviate the psychological damage of their jobs are still years away.

“My goal is to continue to push this technology forward,” Schroepfer concluded, “so that hopefully, at some point, zero people in the world have to encounter any of this content that violates our community standards.”
