The Limits of Translation: Navigating Linguistic and Cultural Nuances
Abstract
As language models and machine translation become increasingly essential for cross-cultural communication, their ability to account for cultural nuances remains limited. This limitation poses a significant challenge for hate speech classification, where cultural awareness is crucial for accuracy. This study investigates the cultural sensitivity of hate speech classifiers by evaluating their performance across four languages: English, Hausa, Yoruba, and Igbo. By translating a dataset originally in English into Hausa, Yoruba, and Igbo, we assess how effectively the classifiers detect hate speech in different cultural contexts. Our results show a significant drop in performance: Naive Bayes and Logistic Regression models show as much as a 35% decrease in F1 score when tested on the translated datasets, and BERT’s F1 score falls by up to 25.9%, with the largest reduction for Hausa. False negative rates also increase notably, underscoring the cultural gap in these models. These findings point to the need for culturally aware language models to achieve more accurate hate speech classification across languages.
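The abstract reports results as percentage decreases in F1 score and as higher false negative rates. The short sketch below illustrates how these quantities are conventionally computed from confusion-matrix counts for the "hate" class. It is not the authors' evaluation code. It assumes the reported decreases are relative to the English baseline (the abstract does not say whether they are relative or absolute percentage points), and all counts in the usage example are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_negative_rate(tp: int, fn: int) -> float:
    """FNR = FN / (FN + TP): the share of hateful posts the classifier misses."""
    return fn / (fn + tp) if fn + tp else 0.0


def relative_drop(baseline: float, translated: float) -> float:
    """Percentage decrease of a score relative to its baseline (assumed definition)."""
    return 100.0 * (baseline - translated) / baseline


if __name__ == "__main__":
    # Hypothetical counts: 100 hateful posts in each test set.
    english = f1_score(tp=80, fp=20, fn=20)
    translated = f1_score(tp=50, fp=20, fn=50)
    print(f"F1 English:    {english:.3f}")
    print(f"F1 translated: {translated:.3f}")
    print(f"Relative F1 drop: {relative_drop(english, translated):.1f}%")
    print(f"FNR English:    {false_negative_rate(tp=80, fn=20):.2f}")
    print(f"FNR translated: {false_negative_rate(tp=50, fn=50):.2f}")
```

With these invented counts, F1 falls from 0.800 to about 0.588 (a relative drop of roughly 26.5%), while the false negative rate rises from 0.20 to 0.50. This shows how a classifier that misses more hateful posts in translation loses F1 mainly through reduced recall.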