Your AI Chatbot Might Be Discriminating Against You, Based on Your Accent

Research shows that AI systems, including large language models like ChatGPT, may respond differently depending on how users speak or write. Studies indicate that chatbots can systematically disadvantage speakers of non-standard or minoritized English dialects.

Researchers tested GPT-3.5 Turbo and GPT-4 using text from ten English varieties, including Standard American English, Standard British English, and eight widely spoken non-standard varieties such as African American English, Indian English, Irish English, Jamaican English, Kenyan English, Nigerian English, Scottish English, and Singaporean English. The models often defaulted to standard English in their responses. Evaluations by native speakers revealed that responses to minoritized dialects were more stereotyping, demeaning, and condescending, and models demonstrated lower comprehension of these inputs.

When researchers asked models to mimic non-standard dialects, comprehension decreased and stereotyping increased. GPT-4 performed better than GPT-3.5 in comprehension and friendliness but still showed an 18 percent rise in stereotyping. Feature retention in the models’ outputs correlated with the size of the dialect-speaking population, with standard varieties retaining the most distinctive linguistic features.

A real-world audit of Amazon’s customer service chatbot, Rufus, confirmed these disparities. Rufus struggled with zero copula constructions, common in African American and Singaporean English. For example, when asked,

“Is this jacket machine washable?”

Rufus responded incorrectly 6 percent of the time. The same prompt written as

“This jacket machine washable?”

failed 69 percent of the time.

The disparities stem from how language models are trained. Models are built on massive datasets scraped from the internet, where text from minority communities is often underrepresented or filtered. Standard American English dominates training data and is prioritized in research, making it the default for model outputs.

Studies using the AAVE/SAE Paired Dataset highlighted how dialect differences are encoded. Tweets written in African American English were translated into Standard American English, but crowdworkers frequently altered more than dialect, removing contractions, punctuation, emoticons, and profanity. This changes the style and tone of prompts, affecting how models respond to different dialects.

Bias extends beyond English. Research on GPT-2, RoBERTa, T5, GPT-3.5, and GPT-4 shows African American English content was associated with negative descriptions, lower-prestige jobs, and higher perceived criminality compared with equivalent Standard American English content. A study of German dialects found similar patterns: speakers of regional German varieties faced bias, and explicitly labeling dialects increased negative associations.

These biases have serious consequences. Dialect discrimination in AI mirrors real-world inequities, affecting access to housing, employment, education, and legal processes. As AI chatbots become more widespread, speakers of minoritized dialects may encounter less helpful, accurate, or respectful responses, limiting access to services that increasingly rely on these systems.

Of course this is nothing compared to one incident in India. When Dhiraj Singha used ChatGPT to help polish his postdoctoral application, the model automatically replaced his surname with “Sharma,” a name associated with higher-caste individuals in India. Although his original name was never mentioned, the AI apparently inferred a more “typical” high-caste identity based on patterns in its training data. This adds further evidence that large language models can exhibit biases that affect social identity and opportunities.

The research suggests that unless AI developers address these disparities, the technology risks reinforcing existing social and linguistic inequalities.

References

Fleisig, E., Smith, G., Bossi, M., Rustagi, I., Yin, X., & Klein, D. (2024). Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination. arXiv preprint arXiv:2406.08818.

Harvey, E., Kizilcec, R. F., & Koenecke, A. (2025). A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms. arXiv preprint arXiv:2506.04419.

Bui, M. D., Holtermann, C., Hofmann, V., Lauscher, A., & von der Wense, K. (2025). Large Language Models Discriminate Against Speakers of German Dialects. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China, pp. 8223–8251.

Hofmann, V., Kalluri, P. R., Jurafsky, D., & King, S. (2024). Dialect prejudice predicts AI decisions about people’s character, employability, and criminality. Nature, 633(8028), 147–154.

Martin, A., & Jenkins, K. (2024). Speaking your language: The psychological impact of dialect integration in artificial intelligence systems. Current Opinion in Psychology, 58, 101840.