The Measure of Hate on 4chan

The number of racist and other white supremacist terms has exploded on the message board since the start of the Trump campaign

It's difficult to find a single location – physical or otherwise – so inclusive to the disparate factions of the far-right as 4chan. Its "politically incorrect" message board – /pol/ – has served as a general assembly for all manner of right-wing contrarianism – and extremism – a political forum with a bone-deep elusiveness. Messages are completely anonymous (anyone can post under any name at any time, most opting for the straightforward "Anonymous"). Messages often defy easy political categorization. Even following a 4chan thread poses an unfamiliar challenge akin to having to read from right to left, as the placement of posts and replies follows an abstract geometry that obscures who is speaking to whom. The board is openly reachable to anyone online, but the user experience seems more like that of the dark web, where visitors must perform arcane technological rituals to receive real access.

Janet Reitman's story, "All-American Nazis," which traces the path of four disaffected young men from ironic anti-Semitism to neo-Nazi terrorism, notes how 4chan's veil of obscurity was used to incubate white nationalism. By early 2012, she writes, the site's "tone had shifted drastically to the right." Discussion threads on white supremacist sites "considered how /pol/ might be used to help young people become 'racially aware.'" But while the idea of xenophobic half-jokes mutating into something more virulent seems intuitive, it can be difficult to capture the full picture of an online hate campaign. Were these attempts to weaponize the message board effective? Would an analysis of discussions on 4chan – and especially /pol/ – reveal an increase in extreme right rhetoric on the site?

Ceros teamed up with Rolling Stone to visualize the rise of white supremacy on 4chan. There are a number of challenges to analyzing the site, not least of which is the fact that posts disappear shortly after publication. A sister site called 4plebs archives the board's typically ephemeral content, but when a programmer and I attempted to scrape 4plebs, the site repeatedly rebuffed us, forcing us to make 11 requests to the domain for each page of posts we wanted – out of a total of about 30,000 pages. After two days of halting progress, 4plebs dumped its data on archive.org, the internet's effective Library of Congress. I later learned the site's monsoon response to our rain dance of scraping requests was merely a favor that the 4plebs team regularly performs.

The datasets immediately revealed the impact of recent politics on total user activity. Beginning around Trump's presidential campaign announcement, in July 2015, and continuing for the next two-and-a-half years, /pol/'s typical activity crept upward to such a volume that by the time 4plebs dumped the data, the size of roughly four years of 4chan posts was 20 gigabytes. For a sense of scale, a typical PC today has 8 gigabytes of memory. My own system has exactly 20 gigabytes, meaning that loading the dataset would consume every last bit of memory available. Trump's election had generated so much discussion on the alt-right mothership that the record of that discussion had ascended to the ranks of Big Data.

Data of this size requires an iterative process: instead of opening every post at once into a single dataframe (which is like a spreadsheet), the file is opened piecemeal in "chunks"; in this case, 100,000 to one million lines at a time. At the same time, we created an algorithm to search for specific terms. For instance, to find how often the phrase "Read Siege" – a neo-Nazi rallying cry to read the white supremacist manifesto Siege – appears, the program has to first search a million lines of the dataset, pull out any lines that include the phrase, count them, save just the relevant lines to a new dataframe, and then repeat, until all 159 million lines have been searched. How long this takes depends largely on the number of times a term appears in the data. Culling the data for all instances of "atomwaffen," the online neo-Nazi group at the center of Reitman's story, took a little under two hours. For the epithet "kike," the search lasted roughly three-and-a-half hours.
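
For readers who want to see the shape of such a chunked search, here is a minimal sketch in Python. It is not the code used for this analysis. It relies only on the standard library rather than a dataframe, and the column names `timestamp` and `comment`, the function names, and the tiny sample data are all hypothetical stand-ins for the real archive.

```python
import csv
import io
import re
from collections import Counter
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from an iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def count_term_by_month(handle, term, chunk_size=100_000):
    """Count posts containing `term` per month, one chunk at a time.

    Assumes (hypothetically) a CSV with a `timestamp` column in
    YYYY-MM-DD form and a `comment` column holding the post text.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    monthly = Counter()
    matches = []  # only the matching rows are kept in memory
    reader = csv.DictReader(handle)
    for chunk in iter_chunks(reader, chunk_size):
        # Search this chunk, keep the hits, then move on to the next chunk.
        hits = [row for row in chunk if pattern.search(row["comment"])]
        for row in hits:
            monthly[row["timestamp"][:7]] += 1  # bucket by YYYY-MM
        matches.extend(hits)
    return monthly, matches


# Tiny in-memory stand-in for the real file, with a neutral search phrase.
sample = io.StringIO(
    "timestamp,comment\n"
    "2015-07-01,nothing to see here\n"
    "2017-03-02,this has the example phrase in it\n"
    "2017-03-09,EXAMPLE PHRASE again\n"
    "2018-01-15,one more example phrase\n"
)
monthly, matches = count_term_by_month(sample, "example phrase", chunk_size=2)
```

Because only one chunk and the running list of matches are held in memory at a time, the approach works on a file larger than the machine's memory, at the cost of a full pass over the file for each term.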

Boring shapes often make for scary statistics – the occurrence of terms follows a simple up-and-to-the-right trajectory, often staggeringly so. "Read Siege," for instance, went from a negligible rarity to occurring about 200 times in a month. The use of "kike" quadrupled to over 40,000 uses in a month. In January of this year, the site recorded about 115,000 instances of the N-word, a nearly five-fold increase since the start of Trump's bid for president. The complete findings of our analysis are below.