BiasBlocker: We asked a language model to identify racism and it tried to erase baby Hitler

By Kevin Nguyen, with Ipek Baris-Schlicht, Defne Altiok, Saja Mortada, Khalid Waleed and Maryanne Taouk


How can AI enhance journalism? 32 journalists and technologists from across the globe have joined the 2023 JournalismAI fellowship to find out. They are working in six self-selected teams on six different projects that use AI to enhance journalism and its processes. 

In this series of articles, our Fellows describe their journey so far, the progress they’ve made, and what they learned along the way. In this blog post, you’ll hear from team BiasBlocker, a collaboration between editorial and technical Fellows from the Australian Broadcasting Corporation, Arab Reporters for Investigative Journalism (ARIJ), and Deutsche Welle.


In 2016, Microsoft released an artificial intelligence named Tay onto Twitter, and in less than a day it became a Hitler-loving, sexually charged chatbot.

It was a watershed moment for AI development, a bellwether mapping the vast distance between natural language models and natural human speech. Arguably Tay was very human, given it was replicating Twitter users — but many sighed in relief because their imagined dystopian future where humanity and machines were indistinguishable was a ways off.

Seven years later, for the 2023 JournalismAI Fellowship, we wondered if we would see a repeat of Tay when we deliberately introduced racially charged text into a language model.

The BiasBlocker team — comprising reporters and developers from across the Australian Broadcasting Corporation (ABC), Deutsche Welle (DW) and Arab Reporters for Investigative Journalism (ARIJ) — has been developing an AI plugin to identify and annotate bias in English and Arabic text.

Biased copy can exist in many forms: problematic or offensive words; narrow framing of a topic; leading language that draws the reader to a conclusion; omission of detail; or vocabulary chosen to appeal to the reader’s emotions.

There are many aspects to our project — including half of it being developed in Arabic. But at the three-month milestone of our six-month development cycle, and for the purposes of this blog, we’re only going to focus on what happened when we ran our training data through ChatGPT, and explore how popular language models currently interpret bias and racism in the English language.

The foundation of any effective language model is a robust dataset. The BiasBlocker team has been building a database of excerpts from media publishers, which have been annotated by media experts (see the BABE project, here).

As part of our data training, we ran these samples through common language models, including ChatGPT, to gauge how advanced the model already was and to determine how involved our amendments to the sample data would need to be.

We prompted the AI with the following and executed it using LangChain:

Transform the following biased sentence into an unbiased sentence from a news article by removing any subjective language or discriminatory undertones without changing its semantic meaning:

Biased Sentence: {{sentence}}

Unbiased Sentence:
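For readers who want to see the plumbing, the sketch below shows roughly how a prompt like this can be wired up with LangChain and an OpenAI chat model. It is a minimal, illustrative example rather than a copy of our pipeline: the model name, temperature and single-brace template format are assumptions.

# Minimal sketch of calling the debiasing prompt through LangChain (2023-era, 0.0.x API).
# Model name, temperature and chain setup are illustrative assumptions, not the
# BiasBlocker production code. Requires an OPENAI_API_KEY environment variable.
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# LangChain's default f-string templates use single braces for variables.
template = (
    "Transform the following biased sentence into an unbiased sentence from a news article "
    "by removing any subjective language or discriminatory undertones without changing its "
    "semantic meaning:\n\n"
    "Biased Sentence: {sentence}\n\n"
    "Unbiased Sentence:"
)

prompt = PromptTemplate(input_variables=["sentence"], template=template)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
chain = LLMChain(llm=llm, prompt=prompt)

# One of the BABE excerpts discussed below, passed through the chain.
biased = (
    "An illegal alien has been charged with enticing a 13-year-old girl "
    "through social media and raping her in Madison County, Alabama."
)
print(chain.run(sentence=biased))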

In many instances, the model removed or changed terms labelled as “biased_words” by the BABE annotators. An example would be an excerpt from the far-right website Breitbart, which read:

An illegal alien has been charged with enticing a 13-year-old girl through social media and raping her in Madison County, Alabama.

The model, recognising the connotations of “illegal alien”, changed the sentence to:

A person has been charged with enticing a 13-year-old girl through social media and raping her in Madison County, Alabama.

From the left-wing outlet AlterNet, the model was able to distil the idea and remove all the flowery language, adverbs and adjectives, to create a neutral statement. An excerpt from the site read:

Even before Trump started to make a big show out of not wearing a mask, it was common, at least in my South Philadelphia neighborhood, to see MAGA-hat-clad white men walking around without masks, delivering contemptuous sneers to the rest of us suckers who are covering our noses and mouths.

And then changed it to:

Even before Trump chose not to wear a mask publicly, it was common in my South Philadelphia neighborhood to see white men wearing MAGA hats without masks, often displaying disdainful expressions towards those of us who wear masks to protect ourselves.

Inoffensive, ineffective: What language models interpret as neutral words

But our initial sampling revealed hurdles, the first of which will be implementing a sense of restraint in the AI. The language model’s first “instinct” is to do something, anything. When we ask the AI to remove bias from a sentence, it interprets the prompt to mean that bias must exist.

Of the sample data we are checking manually, some excerpts didn’t require any alterations, but the AI would still make changes anyway. Sometimes the word “fierce” would be changed to “passionate”, “decided” would turn to “concluded”, or “goes viral” to “video gaining widespread attention”.

A consideration for us may be to first ask the language model whether it detects any bias before asking for any corrections. We recognise this step will come later, as our objective for now is simply to train the AI model, and these excerpts are being analysed in isolation.
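A rough sketch of what that two-step flow could look like, using the same LangChain setup as above. The detection prompt and the YES/NO convention are hypothetical, not something BiasBlocker currently does:

# Hypothetical two-step flow: only ask for a rewrite if the model first says bias exists.
# The detection prompt and YES/NO convention are illustrative assumptions.
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

detect_prompt = PromptTemplate(
    input_variables=["sentence"],
    template=(
        "Does the following sentence from a news article contain biased, subjective "
        "or discriminatory language? Answer only YES or NO.\n\nSentence: {sentence}"
    ),
)
rewrite_prompt = PromptTemplate(
    input_variables=["sentence"],
    template=(
        "Transform the following biased sentence into an unbiased sentence without "
        "changing its semantic meaning:\n\nBiased Sentence: {sentence}\n\nUnbiased Sentence:"
    ),
)

detect_chain = LLMChain(llm=llm, prompt=detect_prompt)
rewrite_chain = LLMChain(llm=llm, prompt=rewrite_prompt)

def debias(sentence: str) -> str:
    # Step 1: ask whether bias exists at all, so the model is not forced to change something.
    verdict = detect_chain.run(sentence=sentence).strip().upper()
    if not verdict.startswith("YES"):
        return sentence  # leave clean copy untouched
    # Step 2: only now ask for the rewrite.
    return rewrite_chain.run(sentence=sentence).strip()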

But, as expected, some of the model’s attempts to be neutral were less than stellar.

The model has the nervous energy of a human resources manager tripping over themselves to keep every person in the room happy. In a bid to be entirely inoffensive, it resorts to weasel words and strips away nuance and impact, core elements of journalistic writing.

In a Fox News piece with no listed biased_words, the model still sanded away the meaning of the sentence. The excerpt read:

The hot new idea to tax America’s rich is a wealth tax, which – although designed to target the wealthiest taxpayers – could end up eventually hurting middle-class families.

The “corrected” output was:

A proposed wealth tax, intended to target the wealthiest taxpayers, may have potential consequences for middle-class families.

“May have potential consequences” is a nothing statement and, from the perspective of a sub-editor, is infuriating in its tautology. The author was deliberate in their choice of words but the model’s output says barely anything.

In other instances, it would erroneously remove categorical viewpoints. In one AlterNet article, the original excerpt read:

In every case, legislators are being swarmed by right-wing activists who don’t hesitate to use deceit and hysteria to stop Equal Rights Amendment (ERA) ratification from happening.

The language model then changed it to:

Legislators are being approached by activists from various perspectives regarding the ratification of the Equal Rights Amendment (ERA).

In another excerpt from Reuters, the term “anti-vaxxers” was changed to “individuals who have concerns about vaccines and their safety”. While the latter could be construed as less prescriptive, the original excerpt read “other anti-vaxxers are gearing up for a fight against any potential new vaccine”, which made the term appropriate: it described general opposition to vaccines, as opposed to something more nuanced like opposition to worker mandates.

Many biased_words labels were also not applied correctly. On the internet, usage of the word Hitler would mostly fall under Godwin’s Law. But in the following excerpt from Huffington Post:

Conservative commentator Ben Shapiro wants you to know that he’s so pro-life, he wouldn’t even consider aborting baby Hitler.

“Hitler” was listed as a biased_word to be removed or changed, even though it was a direct (if cheeky) reference to a historical figure. In other cases, typically neutral words are used in a derogatory manner. Some corrected text retained the use of the phrase “Chinese coronavirus”, even when it was clear the word “Chinese” was intended to be pejorative.

Although these sound like insurmountable frustrations, there are comforts to be had from the results of our initial testing. The first is how quickly we were able to get a sense of predictability. While the precise output couldn’t be pre-determined, we understood why the results looked the way they did.

The complexity of any human language is vastly underappreciated — not just to write, but to read and understand. To not feel blindsided by a machine trying to produce written language is oddly reassuring.

The second comfort is that this experimentation reinforces the value of journalistic writing. Words mean things. Using words effectively requires deliberation and understanding subtext.

We have no aspirations for BiasBlocker to “fix” copy or replace writers or sub-editors. It’s always been intended as a guide, or a guardrail, to help cover our blind spots, which can arise from tight deadlines, newsroom pressures, or simply a lack of experience.

At least for the time being, language models still can’t replace authors who write with purpose. There has been no convincing argument so far that readers will accept anything less.

To support our point, we ran this entire post through ChatGPT with the same debiasing prompt and uploaded it here, so you can judge it for yourself.


Do you have skills and expertise that could help team BiasBlocker? Get in touch by sending an email to Programme Manager, Lakshmi Sivadas, at lakshmi@journalismai.info.


Team BiasBlocker

Team BiasBlocker is a collaboration between editorial and technical Fellows from the Australian Broadcasting Corporation, Arab Reporters for Investigative Journalism (ARIJ), and Deutsche Welle.

Team Members:

Kevin Nguyen | Digital Journalist/Producer, ABC, Australia

Maryanne Taouk | Digital Producer & Local Communities Reporter, ABC, Australia

Saja Mortada | Arab Fact-Checker Network Manager, ARIJ, Jordan

Khalid Waleed | Digital Manager, ARIJ, Jordan

Defne Altiok | Digital Journalist/Editor at +90, Deutsche Welle, Germany

Ipek Baris-Schlicht | Data Scientist, Deutsche Welle, Germany
