Danielle Saunders

Artificially Correct Hackathon October 2021

The event

Since 2021, the Goethe-Institut has brought together AI researchers and translators in the Artificially Correct project. The project focuses on AI-based translation and writing tools and the biases they can generate. In October 2021 the project hosted a hackathon that brought together 12 teams from around the world.

I was asked to provide a challenge for the hackathon. The guidance text and starter resources I provided are reproduced below. Excitingly, one of the two winning teams worked on this challenge.

Challenge: Identifying sentences susceptible to machine translation bias

Some translation mistakes matter more to us than others, especially if we're worried about bias.

For example, if a sentence is not about people at all, translating it is less likely to reinforce harmful stereotypes about people. But if a sentence uses many words relating to an individual, we might have to be especially careful with its machine translation.

Current ways to identify bias-susceptible sentences typically rely on fixed vocabularies, like lists of jobs, and mostly focus on English. This challenge is to identify such sentences automatically instead, ideally in a way that generalises to languages other than English.
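To make the limitation of fixed vocabularies concrete, here is a minimal sketch of the kind of word-list baseline described above. It scores a sentence by the fraction of its words that appear in a list of person-related English terms and flags it if that fraction passes a threshold. The word list `PERSON_WORDS`, the function names and the 0.1 threshold are all hypothetical illustrations. They are not part of the challenge resources or any existing tool.

```python
import re

# Hypothetical, deliberately tiny English word list. Real lists (e.g. of
# occupations) are much longer, and each language would need its own.
PERSON_WORDS = {
    "doctor", "nurse", "engineer", "teacher", "lawyer", "secretary",
    "he", "she", "him", "her", "his", "hers", "man", "woman",
    "person", "people", "someone",
}

def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into runs of word characters."""
    return re.findall(r"\w+", sentence.lower())

def person_word_ratio(sentence: str) -> float:
    """Fraction of tokens that appear in the fixed person-related list."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in PERSON_WORDS)
    return hits / len(tokens)

def is_bias_susceptible(sentence: str, threshold: float = 0.1) -> bool:
    """Flag a sentence if enough of its words refer to people (hypothetical threshold)."""
    return person_word_ratio(sentence) >= threshold

if __name__ == "__main__":
    print(is_bias_susceptible("The doctor asked the nurse to help her."))         # True
    print(is_bias_susceptible("The river flooded the valley after heavy rain."))  # False
    print(is_bias_susceptible("Die Ärztin fragte den Pfleger."))                  # False
```

The German sentence is about two people but is not flagged, because the vocabulary only covers English. This is the kind of gap the challenge asks participants to close.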

Below is a link to a toy test set for this challenge. It mixes sentences from existing bias datasets with sentences from other sources that are not about people, and provides English, German and Spanish versions of each. Ideally, however, participants would also look at other datasets, potentially in other languages, and explore more fine-grained ways to determine whether a sentence could cause bias problems in translation.

Resource

The link below points to a compressed archive containing a toy dataset and two simple Python scripts for getting started with it. It can be extracted on the command line with: tar -xzvf ac_hackathon_challenge4.tgz

ac_hackathon_challenge4.tgz