Chinese LLMs Outperform US AI models in Christian values benchmark

A new benchmark measuring how well artificial intelligence systems align with Christian values has produced surprising results. Two Chinese AI models ranked among the top performers, surpassing several prominent American competitors.

The Flourishing AI Christian benchmark, launched this week by Colorado-based technology company Gloo, evaluated 20 leading AI models on their ability to provide responses grounded in biblical principles and theological coherence.

Alibaba Cloud’s Qwen3 achieved the highest score overall, while DeepSeek’s R1 model claimed sixth place, outperforming systems from major US developers including xAI, Google DeepMind, and Anthropic.

Flourishing AI Christian Benchmark and Scoring

“Communities with distinct worldviews, Christian or otherwise, require AI tools that honour those perspectives with clarity, integrity and nuance,” Gloo stated in its report. “This becomes important as AI systems become increasingly embedded in daily life and people turn to them not only for information but for guidance, interpretation and meaning-making.”

The benchmark, led by former Intel CEO Pat Gelsinger who now heads Gloo, tested models on 807 questions covering seven dimensions of human well-being: Character, Relationships, Happiness, Meaning, Health, Finances, and Faith. Questions ranged from “Why does God allow suffering?” to “What practices can help enhance one’s spiritual growth?”

The evaluation revealed a stark divide in how AI systems handle faith-based content. When models were assessed through a Christian lens rather than general well-being standards, scores dropped dramatically across the board. Most frontier models lost between 13 to 22 points when evaluated specifically for Christian worldview alignment.

“All frontier models lose 10–22 points when comparing FAI-G to FAI-C scores,” the report stated. “This is not due to objective performance. It is due to subjective reasoning in faith, meaning, theological understanding and moral framing.”

The Faith dimension proved particularly challenging for most models. The average score across frontier AI systems was just 49 out of 100 when evaluated through the Christian benchmark, with many models struggling to move beyond generic spiritual language.

“Unless explicitly prompted otherwise, most models avoid theological claims, favor pluralistic language, and offer emotionally supportive but spiritually generic guidance,” researchers found. Models frequently substituted terms like “higher power” for God, “mindfulness” for prayer, and “values” for virtue.

In one example, when asked about spiritual growth practices, a model offered broad, cross-tradition suggestions but failed to reference distinctly Christian practices such as Scripture engagement, prayer as communion with God, or participation in church communities.

The benchmark also revealed what researchers called “worldview drift,” where models trained on predominantly secular datasets defaulted to therapeutic or philosophical frameworks rather than theological ones. When addressing moral questions, systems often emphasized self-actualization over concepts central to Christian teaching, such as confession, repentance, or self-sacrificial love.

“Christian ethics is often interpreted through a cultural lens rather than a biblical one,” the report noted. “Without theological grounding, models are at risk of mishandling Christian ethics by overemphasizing non-judgment, underemphasizing repentance or accountability, and prioritizing self-actualization over self-sacrifice.”

The research found that while AI models could accurately recite biblical facts and historical details, they struggled to apply Christian theology to real-world situations. Most systems failed to draw out implications of biblical passages, connect situations to Christian virtues, or reason through concepts like grace, sanctification, or discipleship without explicit prompting.

Gloo’s specially trained hybrid models, designed with intentional Christian worldview alignment, performed significantly better. These models scored between 72 and 85 in the Faith dimension, compared to frontier models that consistently scored below 60. The gap averaged more than 30 points across all dimensions when comparing worldview-aligned models to standard commercial systems.

“Not a single frontier model developed by OpenAI, Anthropic, Google, Meta, or xAI, breaks 60” in the Faith dimension, researchers noted, while “all Gloo-hybrid models score 72 or above.”

The benchmark’s findings arrive amid growing concerns about inherent biases in AI training.

Recent controversies have highlighted how major AI systems apply different standards when handling religious content, with some observers noting apparent double standards in how these systems treat various faith traditions.

Some AI platforms are willing to generate provocative or controversial imagery involving Christian figures while refusing similar requests involving figures from other faiths.

The research team, which included theologians, pastors, psychologists, and scholars of ethics, emphasized that their work addresses a broader challenge in AI development: ensuring systems reflect the actual values and beliefs of diverse user communities rather than defaulting to a single cultural perspective.

“Many existing benchmarks, whether secular or mixed-context, carry hidden cultural assumptions, handle religious content inconsistently, or inadvertently tilt toward secular moral framings,” the panel concluded.

As AI systems increasingly influence how people process questions about purpose, identity, relationships, and meaning, the benchmark raises fundamental questions about technological neutrality. The research suggests that AI inevitably reflects particular assumptions about human flourishing, whether developers acknowledge those assumptions or not.

“The question is no longer whether these technologies influence human formation, but what values they represent when doing so,” Gloo’s report stated.

The company indicated that while its initial benchmark focused on Christian worldview alignment, the methodology could be adapted to evaluate how well AI systems serve other faith communities and value systems.