Does moderating hate speech moderate users?

Not necessarily, but more research is needed.

The latest evidence

In a recent field experiment, Twitter users who posted hate speech were randomly reported for violating the platform's terms of service. Reported posts were more likely to be deleted by Twitter, but there were no other (direct) effects: the authors of these posts did not change their activity or their hatefulness.
Earlier studies have found mixed evidence, but there are questions about the causal interpretation of their findings.

One of the first papers to study this question found that Reddit's ban of hateful subreddits reduced engagement and hate speech. The paper used matching and a difference-in-differences strategy to compare members of the banned subreddits with similar users. Two concerns with the causal interpretation of these results are a possible violation of the parallel trends assumption and the difficulty of isolating the effect of moderating hate speech from the effect of closing the forums altogether (including their non-hateful interactions).

A recent study of three high-profile influencers who were deplatformed on Twitter found a reduction in conversations about those influencers, as well as in the activity and toxicity of their supporters. However, this study lacks a control group, and there might be a mechanical effect due to increased monitoring of tweets that mention these accounts.

Another article found a backfire effect: users banned from Twitter or Reddit increased their activity and toxicity on Gab. Again, one concern is the lack of an appropriate control group.

Have a question or comment? Let us know.