As AI’s influence grows, so does the duty to ensure it operates fairly, transparently, and accountably. At the forefront of this effort is the Luxembourg Institute of Science and Technology (LIST), whose recent work on AI bias is helping to define what trustworthy AI means. Alongside this, PwC has made responsible AI a strategic priority.
This blog moves from research to practice by examining why responsible AI is essential and how bias manifests in LLMs. It draws on insights from LIST's research and its sandbox assessment tool, which enables structured evaluation of AI systems. The article then translates these findings into concrete organisational actions, using PwC's approach to build bias-aware teams, improve governance, and integrate responsible AI behaviours into routine workflows.
Why AI biases matter
AI systems are only as fair as the data and design choices behind them. While LLMs are trained on vast datasets scraped from the internet, these datasets often reflect the biases, stereotypes, and imbalances present in society.
Research consistently shows that even the most capable models exhibit measurable bias. A LIST study found that 10 widely used LLMs scored between 16% and 96% across seven ethical bias categories, highlighting persistent vulnerabilities. Similarly, a study by researchers at the University of Cambridge and New York University (Hu et al., 2025) revealed strong social identity distortions, with LLMs producing 93% more positive statements for “We” and 115% more negative statements for “They.”
But the challenge doesn’t stop at the model’s training. When organisations integrate these models into their own environments, through techniques like Retrieval-Augmented Generation (RAG), custom prompting, or fine-tuning, they can unintentionally introduce new layers of bias. For example, a RAG pipeline that draws from internal documents may surface content that reflects outdated policies or organisational blind spots. Similarly, poorly designed prompts can subtly steer a model’s responses in ways that reinforce existing assumptions or exclude certain perspectives.
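To make this concrete, here is a minimal sketch of how a team could probe a deployed pipeline for differential treatment: pairs of prompts that differ in a single attribute are sent through the same call and the responses are compared. The `generate` function is a placeholder for your own RAG pipeline or LLM endpoint, the prompt pairs are invented for illustration, and the lexical-similarity check is a deliberately crude proxy that a structured assessment would replace with stronger measures.

```python
# Minimal paired-prompt probe (illustrative only).
from difflib import SequenceMatcher

def generate(prompt: str) -> str:
    # Placeholder: replace with a call to your deployed RAG pipeline or LLM endpoint.
    return f"[model response to: {prompt}]"

PROMPT_PAIRS = [
    # Each pair expresses the same request with a single attribute changed.
    ("Summarise our parental-leave policy for a full-time employee.",
     "Summarise our parental-leave policy for a part-time employee."),
    ("Which financing options would you suggest for a young entrepreneur?",
     "Which financing options would you suggest for a retired entrepreneur?"),
]

for prompt_a, prompt_b in PROMPT_PAIRS:
    response_a, response_b = generate(prompt_a), generate(prompt_b)
    # Crude lexical proxy; a real assessment would use more robust comparison measures.
    similarity = SequenceMatcher(None, response_a, response_b).ratio()
    if similarity < 0.5:  # illustrative threshold, to be calibrated per use case
        print(f"Large divergence ({similarity:.2f}) between:\n  {prompt_a}\n  {prompt_b}")
```

Even a check this simple, run regularly against realistic prompts, can surface divergences early enough to ask whether they are justified by the request or introduced by the pipeline.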
Left unmanaged, such distortions can lead to discriminatory outcomes, degraded service quality, regulatory violations, and reputational harm. They can also feed those biases back to users: University of Washington experiments showed that human recruiters aligned with biased AI recommendations in almost 90% of cases, illustrating how biased outputs can reinforce and amplify human bias over time.
Responsible AI is about more than just technical performance. It’s about ensuring that AI systems behave in ways that are aligned with ethical principles, legal standards, and societal expectations. That’s why the EU AI Act introduced strict requirements for high-risk systems, including bias mitigation, transparency, documentation, and human oversight. In this context, it is increasingly important for organisations to understand existing biases, anticipate upcoming regulatory obligations, and take proactive measures to reduce risk.
What the research tells us: from findings to real-world impact
The Luxembourg Institute of Science and Technology’s (LIST) approach to responsible AI starts from a clear conviction: AI must be ethical, frugal, and human-centred. Not as an aspiration, but as a design constraint. That philosophy shapes how LIST studies bias and why its findings matter for organisations deploying AI.
Bias is not a single flaw
AI bias is not a single, identifiable flaw: it compounds across dimensions. A system may exhibit demographic bias along gender or socioeconomic lines while simultaneously performing inconsistently across languages, with the interaction between these factors producing disparities that neither would generate alone. What makes this especially tricky is that overt discrimination is increasingly caught by safety guardrails, creating the impression that things are improving. In reality, bias is becoming better hidden, surfacing in what a system consistently omits, in the assumptions embedded in its framing of a question, or in how confidently it responds depending on the language of the query.
That false sense of security may be as dangerous as the problem it replaces.
In Luxembourg, language bias is a business risk
This is particularly relevant in the local context. Across the four systems LIST evaluated, Luxembourgish queries received differential treatment in 14% more cases, on average, than equivalent English queries, and in 8% more cases than equivalent French queries. Performance degradation follows a gradient, not a simple high-resource versus low-resource divide. And differential treatment does not mean a system is overtly unhelpful in one language: it means the same underlying request leads the system to different conclusions about its nature, routing users to different services, surfacing different information, or making different assumptions about intent. For any organisation operating a customer-facing system that officially supports multiple languages, two clients with identical needs may receive meaningfully different responses based solely on the language they use, a compliance and reputational risk hiding in plain sight.
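As a first-pass illustration of how such differential treatment can be checked, the sketch below sends the same request in each supported language and verifies that the system reaches the same conclusion. The `classify_request` function is a placeholder for whatever step of your system interprets a user request (intent detection, routing, drafting a reply), and the example queries are illustrative translations that should be written or validated by native speakers.

```python
# First-pass cross-language consistency check (illustrative only).
from collections import Counter

def classify_request(text: str) -> str:
    # Placeholder: replace with your own intent-classification or routing call.
    return "loan_enquiry"

EQUIVALENT_QUERIES = {
    "en": "Which documents do I need to apply for a loan?",
    "fr": "Quels documents dois-je fournir pour demander un prêt ?",
    "lb": "Wéi eng Dokumenter brauchen ech fir e Prêt?",
}

labels = {lang: classify_request(query) for lang, query in EQUIVALENT_QUERIES.items()}
majority_label = Counter(labels.values()).most_common(1)[0][0]
divergent = {lang: label for lang, label in labels.items() if label != majority_label}

if divergent:
    print(f"Differential treatment detected: {divergent} vs. majority '{majority_label}'")
else:
    print("This request was handled consistently across all three languages.")
```

A production assessment would run many such equivalent-query sets and look for patterns rather than single cases, but even a simple check of this kind makes language consistency a testable property rather than an assumption.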
Assessment requires more than technical testing
Rigorous bias assessment cannot be reduced to benchmarking a system in isolation. Understanding the deployment context and mapping who is affected, directly and indirectly, is an integral part of the methodology. This requires engaging not only technical teams but business domain experts, customer care representatives, risk managers, and compliance officers, because each sees bias differently and surfaces risks others would miss. These conversations shape test scenarios that reflect real production conditions rather than idealised assumptions. Deployment constraints are factored in from the outset: many organisations, particularly in finance and public administration, require all testing to be conducted within their own infrastructure.
Testing is iterative. Initial findings are not an endpoint but a signal: when a pattern emerges, evaluation is redirected toward more granular scenarios to understand its scope and origin. Reporting is equally important. Results must be readable at multiple levels, opening with a plain-language recap of what was tested, why, what assumptions were made, and what the findings mean in practice, followed by detailed results for deeper scrutiny.
The horizon: multi-agent systems and growing complexity
These challenges will intensify with multi-agent architectures, where multiple AI systems interact autonomously and can mutually influence each other’s outputs. Bias that is negligible in isolation can propagate and amplify across these interactions in ways that are extremely difficult to anticipate or trace. This is an area where current assessment frameworks show clear limits, and where LIST’s research is actively focused. Work in this direction is conducted through the LIST AI Technical Sandbox, which allows companies to test models for technical robustness, regulatory compliance, and ethical behaviour, including bias detection, hallucination analysis, and linguistic inclusiveness.
What organisations should do: practical recommendations
Tackling AI bias is not a single intervention, but something that organisations build incrementally. Here are five specific actions you can take to embed bias awareness more effectively in your organisation, gradually building towards a state in which bias awareness is self-reinforcing.
Step 1 – Representation matters
The starting point is remarkably simple: before rolling out an AI system, look at the team involved in testing it. Do they all look the same? Before any tool goes into production, convene a validation group that includes HR, legal, compliance, frontline staff, and representatives of the communities that the system will affect. In Luxembourg’s context, this means explicitly accounting for the country’s multilingual and highly international workforce. Practically, it means scheduling design reviews early, not after deployment is complete. The questions to ask are concrete: who could this system disadvantage, and is that group represented in this room?
Step 2 – Document what you know
Once diverse input has shaped the system, capture it. A use case card for each AI tool in use should record its intended purpose, data sources, known limitations, bias-testing results, and any edge cases identified during review. This is not bureaucratic overhead but institutional memory. Without it, staff turnover erases hard-won knowledge, and regulators have nothing to audit. Under the EU AI Act, documentation of this kind will be mandatory for high-risk systems; starting now means building the habit before the deadline enforces it.
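There is no single prescribed format for such a card. The sketch below shows one possible structure; the field names and example content are our own illustration, not a regulatory template.

```python
# One possible structure for a use case card (illustrative field names and content).
from dataclasses import dataclass, field

@dataclass
class UseCaseCard:
    system_name: str
    intended_purpose: str
    data_sources: list[str]
    known_limitations: list[str]
    bias_testing_results: str                      # summary or link to the full report
    edge_cases: list[str] = field(default_factory=list)
    reviewed_by: list[str] = field(default_factory=list)
    last_review_date: str = ""

card = UseCaseCard(
    system_name="CV pre-screening assistant",
    intended_purpose="Rank incoming applications for recruiter review; no automated rejection.",
    data_sources=["Job descriptions", "Anonymised historical hiring data"],
    known_limitations=["Lower accuracy on CVs written in Luxembourgish"],
    bias_testing_results="Link to the latest bias assessment report",
    edge_cases=["Career gaps flagged inconsistently during review"],
    reviewed_by=["HR", "Legal", "Compliance", "Frontline recruiter"],
    last_review_date="2025-06-30",
)
```

Whether the card lives in code, a wiki page, or a governance tool matters less than keeping it versioned and up to date.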
Step 3 – Building the guardrails
Documentation tells you what a system was designed to do, but human review catches what it actually does in practice. For any AI-assisted decision with real-life consequences, be it a hiring shortlist, a credit recommendation, or a fraud flag, establish a structured control process. This means defining who reviews, what criteria trigger escalation, and how overrides are logged. The idea is not to second-guess every output, but to ensure that when an outcome feels wrong, there is a clear path to challenge it.
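As a simple illustration of what such a control process can look like, the sketch below defines escalation criteria per decision type and logs every override. The decision types, thresholds, and fields are hypothetical and would need to be set by each organisation’s own risk and compliance functions.

```python
# Sketch of a structured control process (hypothetical decision types and thresholds).
import json
from datetime import datetime, timezone

ESCALATION_RULES = {
    # A strong candidate left off the shortlist warrants a second look.
    "hiring_shortlist": lambda d: d["score"] > 0.8 and not d["shortlisted"],
    # Low-confidence credit recommendations always go to a human reviewer.
    "credit_recommendation": lambda d: d["confidence"] < 0.6,
}

def needs_escalation(decision_type: str, decision: dict) -> bool:
    rule = ESCALATION_RULES.get(decision_type)
    return bool(rule and rule(decision))

def log_override(decision_type: str, decision: dict, reviewer: str, rationale: str) -> None:
    # Append-only record so that every challenge leaves an audit trail.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision_type": decision_type,
        "original_decision": decision,
        "reviewer": reviewer,
        "rationale": rationale,
    }
    with open("override_log.jsonl", "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")

decision = {"score": 0.85, "shortlisted": False}
if needs_escalation("hiring_shortlist", decision):
    log_override("hiring_shortlist", decision, reviewer="reviewer.id",
                 rationale="Strong profile excluded; shortlist corrected after manual review.")
```

The specifics matter less than the principle: escalation criteria are explicit, and overrides are recorded where they can be audited.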
Step 4 – Leverage Luxembourg’s ecosystem
Companies do not need to build bias awareness frameworks alone. Luxembourg’s Fit4AI programme, administered through Luxinnovation, offers structured support and an opportunity to include qualified consultants when designing your AI approach. The CNPD’s guidance on automated decision-making and the RE.M.I. (Regulation Meets Innovation) initiative provide important insights into the topic. Engaging these resources accelerates internal capability building and generates documented evidence of due diligence that will matter when regulators come asking.
Step 5 – Bias prevention is everybody’s job
The most sophisticated governance framework fails if the people using your AI tools treat outputs as infallible. Training should be role-specific and practical: a recruiter needs to understand how a CV-screening tool can encode historical hiring patterns; a loan officer needs to know what disparate impact looks like in a scoring model. More importantly, staff need to be empowered, with a clear mechanism to flag concerns without fear of being seen as obstructing efficiency. At this stage of the maturity ladder, bias awareness is no longer a project – it is part of how your organisation operates.
Conclusion
AI is transitioning from experimentation to becoming a core component of processes, decision-making, and client-facing services, making responsible deployment essential. Research shows that bias in AI systems is rarely obvious, often emerging through language, context and design choices that reflect organisational and societal blind spots. For organisations in Luxembourg operating in a multilingual and regulated environment, these risks are both practical and reputational. At PwC, alongside LIST, we believe that turning research insights into actions requires more than technical fixes: it calls for governance, documentation, human oversight and a shared responsibility across roles. By combining rigorous assessment methodologies and practical steps, organisations can transcend mere compliance and deploy AI systems that are transparent, fair and trustworthy by design. Responsible AI is not a one-off exercise, but a capability that strengthens resilience, trust and long-term value.
Article written by:
- Andreas Braun, Advisory Managing Director, Data Science & AI Team Lead, PwC Luxembourg
- Alessio Buscemi, AI Engineer, Human-centered AI, Data and Software (HANDS) Research Unit, LIST
- Francesco Ferrero, LIST’s AI Flagship Initiative Leader and Head of the Human Centered AI, Data and Software (HANDS) Unit
- Grégory Weber, Managing Director, Innovation and GenAI Business Center Lead, PwC Luxembourg
What we think

At LIST, our goal is not just to create more AI, but to create better AI.
Our focus is on developing AI solutions that are not only innovative but responsible and sustainable. We collaborate closely with you to make sure AI projects are grounded in ethical practices, building trust and making a real difference.

References:
Hu, T., Kyrychenko, Y., Rathje, S., Collier, N., van der Linden, S., & Roozenbeek, J. (2025). Generative language models exhibit social identity biases. Nature Computational Science, 5, 65–75. https://doi.org/10.1038/s43588-024-00741-1 (a collaborative study led by researchers from the University of Cambridge and New York University).
LIST. (2025). LLM Leaderboard.
University of Washington. (2025, November 10). People mirror AI systems’ hiring biases, study finds.