Detecting incel misogyny on Reddit

Credit: Pixabay/CC0 Public Domain

On Nov. 7, 2017, the American social-media platform Reddit shut down r/Incels, an online forum with more than 40,000 members. The move was in line with a new company policy banning content that incites violence.

Deprived of their outlet, where did all those misogynistic incels go? Where online did they wind up conversing, and, importantly, could their conversations still be detected on Reddit itself?

To answer those questions, Canadian researchers at Université de Montréal sifted through reams of posts, applying advanced automated text analysis techniques to distinguish incel comments from other online discussions.

Their results are published in the French-language scientific journal Traitement Automatique des Langues.

They blame women

Incels (a portmanteau of “involuntary” and “celibate”) are men who share a worldview that blames women and an unjust society for their inability to form romantic relationships.

Their misogyny has led to murderous acts of violence in North America, including a mass killing in Isla Vista, California, in 2014 and a van attack on pedestrians in Toronto in 2018.

The problem for researchers who study their online presence, however, is that it’s hard for computer algorithms to detect incel rhetoric because it uses coded language and is constantly reinventing itself.

That’s where the new study—authored by Dominic Forest, a professor in UdeM’s School of Library and Information Sciences (known by the acronym EBSI), and his doctoral student Camille Demers—comes in.

Their research began in an EBSI classroom in the fall of 2021. Having participated in international competitions on detecting cyberbullying, Forest suggested that his data-mining students tackle the topic, too.

Demers, then a master’s student, and her group took up the challenge. “I saw that their work had potential,” recalled Forest. He suggested they continue the project outside the classroom. “This led to a collaboration and the project evolved considerably before reaching its final form.”

A protracted effort

Transforming a class project into a rigorous scientific study that addresses the challenges of processing massive amounts of constantly evolving data was in fact a protracted effort.

How do you train a machine to recognize incel discourse? The first step is to feed it a slew of examples. But instead of manually labeling tens of thousands of comments, the researchers opted for a “community bags” approach. That is, rather than evaluating each message posted in a Reddit forum (known as a “subreddit”), they classified entire subreddits as representative of a specific type of discourse, so each comment inherited the label of the community it came from.

Drawing on previous work, they identified 23 subreddits as incel strongholds. They then extracted 40,000 comments from these forums, labeled them “incel,” and compiled an equivalent sample of comments from more than 13,000 other subreddits to form a second corpus tagged “non-incel.”
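The “community bags” idea can be sketched in a few lines of Python. This is a minimal illustration, not the researchers’ code: it assumes comments arrive as hypothetical `(subreddit, text)` pairs and that the list of incel subreddits is known in advance. The forum names and comments below are invented placeholders. Each comment simply inherits the label of the forum it was posted in.

```python
def community_bag_labels(comments, incel_subreddits):
    """Label each (subreddit, text) pair by its community, not by its content.

    Returns a list of (text, label) pairs: 1 = "incel", 0 = "non-incel".
    """
    # Normalise names so "Example" and "example" are treated as the same forum.
    incel = {name.lower() for name in incel_subreddits}
    return [(text, 1 if sub.lower() in incel else 0) for sub, text in comments]


# Placeholder data: forum names and comments are invented for illustration.
comments = [
    ("example_incel_forum", "first comment"),
    ("example_cooking_forum", "second comment"),
]
print(community_bag_labels(comments, ["Example_Incel_Forum"]))
# [('first comment', 1), ('second comment', 0)]
```

The trade-off is that every comment in a labeled forum gets the same tag, including off-topic ones, in exchange for skipping the manual labeling of tens of thousands of comments.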

Overcoming an imbalance

The next technical challenge was to overcome the imbalance between incel and non-incel discourse on the Internet. Incel statements make up only a tiny proportion of online conversations, so an algorithm trained on a realistic sample of online comments could become “lazy” and classify no comments as incel. It would almost always be right, yet it would never catch the incel comments it was built to find.
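A small worked example shows why accuracy alone rewards the “lazy” model. The 1% incel share below is a made-up figure chosen for round numbers, not a statistic from the study. Recall here means the fraction of truly incel comments that the model actually flags.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for the positive ("incel") class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    caught = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    recall = caught / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


# Hypothetical test set: 10 incel comments among 1,000 (a made-up 1% share).
y_true = [1] * 10 + [0] * 990
lazy = [0] * len(y_true)  # a "lazy" model that never predicts "incel"
accuracy, recall = accuracy_and_recall(y_true, lazy)
print(f"accuracy={accuracy:.1%}, recall={recall:.1%}")
# accuracy=99.0%, recall=0.0%
```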

To prevent the computer from taking the easy way out, the researchers trained it on data in which incel comments were overrepresented. They ran a series of tests, varying the proportion of incel comments from 10% to 90%. That way, they could find the best balance for the machine to learn to distinguish between the two sets of data.
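One way to build training sets with a controlled incel share is to draw from the two labeled pools in fixed proportions. This sketch assumes simple random sampling without replacement and 10-point steps between 10% and 90%. The article does not say how the researchers sampled or which intermediate proportions they tried, and the pool contents here are placeholders.

```python
import random


def build_training_set(incel, non_incel, incel_share, size, seed=0):
    """Draw `size` labeled comments, a fraction `incel_share` of them incel (label 1).

    Sampling is without replacement, so each pool must be large enough.
    """
    if not 0.0 < incel_share < 1.0:
        raise ValueError("incel_share must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    n_incel = round(size * incel_share)
    sample = [(text, 1) for text in rng.sample(incel, n_incel)]
    sample += [(text, 0) for text in rng.sample(non_incel, size - n_incel)]
    rng.shuffle(sample)  # avoid all incel examples sitting at the front
    return sample


# Placeholder pools standing in for the two labeled corpora.
incel_pool = [f"incel comment {i}" for i in range(1000)]
other_pool = [f"other comment {i}" for i in range(1000)]

# Sweep the incel share from 10% to 90% (10-point steps are an assumption).
for percent in range(10, 100, 10):
    train = build_training_set(incel_pool, other_pool, percent / 100, size=500)
    print(f"{percent}% incel -> {sum(label for _, label in train)} of {len(train)}")
```

In a real experiment, each of these training sets would be used to fit a classifier that is then scored on the same held-out test set, so that only the training proportion changes between runs.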

Data collection was another headache. As access to Reddit data changed during the project, the researchers turned to compressed archives made available by an online community of enthusiasts.

They then sorted conversations by month to avoid seasonal biases, such as men potentially feeling more isolated over the December–January holidays.
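Sorting comments by month can be sketched as grouping them on their timestamp and then drawing the same number from each month, so that no single season dominates the corpus. The `created_utc` field (seconds since the Unix epoch) and the per-month quota are assumptions for illustration. The article does not describe the archive format or the exact sampling rule.

```python
import random
from collections import defaultdict
from datetime import datetime, timezone


def sample_per_month(comments, per_month, seed=0):
    """Group comments by calendar month (UTC) and draw up to `per_month` from each.

    `comments` is assumed to be a list of dicts with a 'created_utc' key holding
    a Unix timestamp in seconds (an assumption about the archive format).
    """
    by_month = defaultdict(list)
    for comment in comments:
        stamp = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc)
        by_month[(stamp.year, stamp.month)].append(comment)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        month: rng.sample(items, min(per_month, len(items)))
        for month, items in sorted(by_month.items())
    }


# Placeholder data: 10 comments from January 2024 and 3 from February 2024.
jan, feb = 1704067200, 1706745600  # 2024-01-01 and 2024-02-01, 00:00 UTC
comments = [{"created_utc": jan + i * 3600, "body": f"jan {i}"} for i in range(10)]
comments += [{"created_utc": feb + i * 3600, "body": f"feb {i}"} for i in range(3)]
for month, sample in sample_per_month(comments, per_month=5).items():
    print(month, len(sample))
# (2024, 1) 5
# (2024, 2) 3
```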

Best overall model

After testing three methods for converting text from human-readable to numerical form and four classification algorithms, the researchers found that the best-performing model combined a text conversion method called SBERT with a logistic regression algorithm. This model achieved an overall F-score (a measure of a classifier’s performance that combines its precision and recall) of 79.7%.
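The F-score is the harmonic mean of precision (the share of comments flagged as incel that really are incel) and recall (the share of incel comments that get flagged). The article does not say how the “overall” figure was averaged. The sketch below assumes a macro average, the unweighted mean of the per-class F-scores for “incel” and “non-incel,” and the confusion-matrix counts are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 for one class from true positives, false positives and false negatives.

    Equivalent to 2*tp / (2*tp + fp + fn) whenever tp > 0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(tp, fp, fn, tn):
    """Unweighted mean of the F1 for the incel class and for the non-incel class.

    Counts are from the incel class's point of view; for the non-incel class,
    true negatives become its true positives and the roles of fp and fn swap.
    """
    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2


# Hypothetical counts for a test set of 200 comments.
tp, fp, fn, tn = 30, 10, 20, 140
print(f"incel F1:     {f1(tp, fp, fn):.3f}")
print(f"non-incel F1: {f1(tn, fn, fp):.3f}")
print(f"macro F1:     {macro_f1(tp, fp, fn, tn):.3f}")
```

Because the harmonic mean is pulled down by whichever of precision or recall is lower, a model that never flags anything gets an incel F-score of zero, which is why this metric suits an imbalanced task better than plain accuracy.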

However, powerful new analytic models such as SBERT are frustratingly opaque. “They’re more effective, but in practice it’s impossible to know why they made a given decision,” Demers noted. “We couldn’t determine which characteristics it was taking into consideration.”

On the other hand, TF-IDF statistical weighting, an approach Forest considers more traditional, is slightly less effective but more transparent, so the researchers tried it as well. With TF-IDF, they were able to extract the vocabulary that the machine deemed most relevant for identifying incel comments. Several terms stood out.

There was “incel,” of course, but also “chad” (a man considered attractive by the incel community), “woman,” “ugly,” “lonely,” “virgin” and “normies” (people considered normal by incels).
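The idea of reading vocabulary off TF-IDF weights can be illustrated with a pure-Python sketch. It is a simplified stand-in rather than the researchers’ method: instead of inspecting a trained classifier, it ranks terms by how much higher their average TF-IDF weight is in one class than in the other, using plain whitespace tokenization, tf = count / document length and idf = log(N / document frequency). The six toy sentences are invented for illustration and are not taken from the study’s data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, with tf = count / length
    and idf = log(N / df). Terms that appear in every document get weight 0."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_terms(docs, labels, target=1, k=5):
    """Rank terms by mean TF-IDF weight in the target class minus the other class."""
    vectors = tfidf_vectors(docs)
    totals = {0: Counter(), 1: Counter()}
    for vector, label in zip(vectors, labels):
        totals[label].update(vector)  # Counter.update adds the float weights
    sizes = Counter(labels)
    other = 1 - target
    vocabulary = set(totals[0]) | set(totals[1])
    score = {
        term: totals[target][term] / sizes[target] - totals[other][term] / sizes[other]
        for term in vocabulary
    }
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(vocabulary, key=lambda term: (-score[term], term))[:k]


# Invented toy sentences (label 1 = "incel" bag, 0 = "non-incel" bag).
docs = [
    "chad got the date again",
    "feeling lonely and ugly tonight",
    "chad never feels lonely",
    "the game went to overtime again",
    "great recipe for the weekend",
    "the weekend weather looks great",
]
labels = [1, 1, 1, 0, 0, 0]
print(top_terms(docs, labels, k=4))
# ['chad', 'lonely', 'feels', 'never']
```

A term that appears in every document, or about equally often in both classes, scores near zero, which is what lets a community’s distinctive vocabulary rise to the top.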

“In addition to detecting incel rhetoric, this project enabled us to describe it,” said Forest. “It showed what involuntary celibates talk about and what vocabulary their communities use.”

More information:
Camille Demers et al., Comparison of methods for detecting incel speech on Reddit, ATALA (2025). DOI: 10.57896/2024-tal-65_3_2

Provided by
University of Montreal

Citation:
Detecting incel misogyny on Reddit (2025, November 12)
retrieved 12 November 2025
from https://phys.org/news/2025-11-incel-misogyny-reddit.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
