Introducing counterfactual analysis and why it matters for your AI systems

Mon Mar 9 2026
Technology


Topic
Data & AI Solutions
Data & AI Strategies
Artificial Intelligence (AI) increasingly powers high-stakes decision-making. Job candidate screening, credit scoring, school admissions, healthcare diagnostics and maintenance of critical infrastructure rely heavily on Machine Learning (ML) models. The outputs of these models can profoundly affect people’s lives, yet the reasoning behind individual decisions often remains opaque, even to the model developers themselves. This opacity is one of the reasons why, under the EU AI Act, explainability has evolved from a nice-to-have into a legal requirement for high-risk AI systems. Systems are deemed high-risk when they can significantly influence people’s lives or rights. For data scientists and machine learning engineers, that means explainability isn’t just a compliance checkbox; it’s central to responsible deployment.

Most widely used fairness and explainability approaches, such as Equalised Odds, Equal Opportunity, or Predictive Parity, analyse protected attributes like age, gender, or ethnicity in isolation. This can miss important interactions between attributes. A model may not discriminate based solely on age or gender, yet still systematically favour older men over other groups. Moreover, these metrics typically operate at the group level and offer little insight into individual decisions, even though contradictory outcomes for nearly identical individuals are far from rare. Addressing such cases requires a shift in perspective. Instead of asking only how a model behaves on average, we need to ask: what would have happened if things were just slightly different? This is the core idea behind counterfactual analysis.

In this blog, we explore counterfactual analysis in depth: what it is, how it works, and when this approach is most effective.

1. Why counterfactuals matter

Counterfactual analysis meets the growing need for explanations at the level of individual model predictions. But by combining individual counterfactual explanations, it also gives powerful insight into group-level behaviour.

Each record in your dataset can have counterfactual explanations, also referred to as counterfactual instances. They describe the minimal change needed to flip the instance’s predicted outcome. Most counterfactual analysis methods generate these counterfactual instances, rather than searching for them among the provided data. This independence from whatever other records happen to exist in the available data makes counterfactual analysis a very powerful tool for detecting bias that would otherwise be overlooked.

This minimal change can be measured as the Euclidean distance between a data point and the most similar generated data point that receives a different predicted outcome. This most similar data point is what we call the counterfactual example, or counterfactual instance.

Throughout this blog post we use a model that assesses loan applications as an example of an AI system. A counterfactual explanation of a rejected application is an application that is as similar as possible to the rejected one, but for which the model’s output changes to approved. This insight into the minimal changes needed for a different outcome can be used to explain to the applicant why their application was denied. This transparency on individual model outcomes is exactly the level of openness the AI Act expects from high-risk AI systems.
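As a minimal sketch of this idea (the approval model and the candidate pool below are hypothetical toy stand-ins, not a real counterfactual generator), the nearest counterfactual can be found by brute force over generated candidates:

```python
import math

def predict(age, income, credit_cards):
    # hypothetical toy approval model
    if age < 36 and income > 50000:
        return 1  # approved
    if age >= 36 and income >= 38500 and credit_cards <= 2:
        return 1  # approved
    return 0  # rejected

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_counterfactual(instance, candidates):
    """Return the candidate closest to `instance` with a different prediction."""
    original = predict(*instance)
    flipped = [c for c in candidates if predict(*c) != original]
    return min(flipped, key=lambda c: euclidean(instance, c)) if flipped else None

# A rejected 30-year-old earning 45,000: among these generated candidates,
# the smallest change that flips the outcome is raising age to 36.
cf = nearest_counterfactual((30, 45000, 1),
                            [(30, 50500, 1), (36, 45000, 1), (30, 40000, 1)])
```

In practice, features should be scaled before computing distances, and immutable attributes such as age are often excluded from the search; libraries such as dice-ml let you restrict which features may vary.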

Counterfactual analysis also lends itself to fairness analysis at the group level. For each group, we can average the Euclidean distance between its data points and their counterfactual examples. By comparing these averages across groups, we can judge whether the model is biased towards one group or the other. For comparable groups, we expect an unbiased model to show identical, or very similar, average distances between the original data points and their respective counterfactual explanations.


This combination of individual and group-level insights is highly relevant for AI governance and regulation. The AI Act in particular obliges organisations to prove that models in high-risk applications are not only robust and accurate, but also transparent and non-discriminatory.

Link: https://pypi.org/project/dice-ml/ 

An example: when fairness metrics are not enough

To illustrate the use of counterfactuals, we developed a model that helps us decide whether to accept a loan application. The model looks like this:

```python
def predict_loan_application(age, income, credit_cards):
    # Younger applicants: higher income threshold, but no credit card limit
    if age < 36 and income > 50000:
        return 1  # approved
    # Older applicants: lower income threshold, but at most 2 credit cards
    elif age >= 36 and income >= 38500 and credit_cards <= 2:
        return 1  # approved
    else:
        return 0  # rejected
```

When looking at this model, you can see that people younger than 36 are held to a strict 50K€ income threshold, while people of 36 and older can already get a loan approved from 38.5K€ income. On the other hand, anyone younger than 36 can have all the credit cards they want without any penalty, but the group of 36 and older is limited to 2 credit cards before the application gets denied.

Even though this model obviously treats loan applicants differently based on their age, we can show that fairness metrics such as Demographic Parity or Equalised Odds miss this age-related bias. We have generated a dataset with 100 applicants to demonstrate how that would work, split into 2 age groups: group A (< 36) and group B (36+). Both groups have a reasonably similar average income and a reasonably similar number of registered credit cards, so it is fair to compare them with each other in a counterfactual analysis.
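A sketch of how such a synthetic dataset could be generated (the distributions and seed here are hypothetical, chosen only so that the two groups are comparable):

```python
import random

random.seed(7)

def make_group(age_low, age_high, n=50):
    # Both groups draw income and credit cards from the same distributions,
    # so any difference in treatment must come from age alone.
    return [
        {
            "age": random.randint(age_low, age_high),
            "income": round(random.gauss(45000, 8000)),
            "credit_cards": random.randint(0, 4),
        }
        for _ in range(n)
    ]

group_a = make_group(18, 35)  # younger than 36
group_b = make_group(36, 70)  # 36 and older
```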

  • Demographic Parity requires that, out of our 50 members of each group, a roughly equal number receives a positive prediction. We can verify this in a confusion matrix by comparing the number of positive predictions (true positives + false positives) across groups.
  • Equalised Odds requires the True Positive Rate (TPR) and False Positive Rate (FPR) to be equal for both age groups. These two metrics are defined as TPR = TP / (TP + FN) and FPR = FP / (FP + TN).

The confusion matrices for the two groups show that Demographic Parity is satisfied, with 25 positive predictions for the group younger than 36 and 24 for the group of 36 and older.

Equalised Odds are also met for this model and training data:

  • True Positive Rates
    • Group A: 22 / (22+6) = 0.786
    • Group B: 20 / (20+5) = 0.800

  • False Positive Rates
    • Group A: 3  / (3+19) = 0.136
    • Group B: 4  / (4+21) = 0.160
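These rates follow directly from the confusion-matrix counts; as a quick check:

```python
def rates(tp, fn, fp, tn):
    """True positive rate and false positive rate from confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)

tpr_a, fpr_a = rates(tp=22, fn=6, fp=3, tn=19)  # group A (< 36)
tpr_b, fpr_b = rates(tp=20, fn=5, fp=4, tn=21)  # group B (36+)
```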

These common metrics conclude that both age groups are treated equally and fairly, even though the opposite is true based on the actual model. A more visual way to assess fairness is with feature importance plots. A bar plot of the average SHAP values for this model indicates that the main driver of the model’s outcomes is income, with age having only a minor impact. Because these values suggest that age has little to no impact on the prediction, it would be easy to assume that all age groups are treated equally.

But when counterfactual analysis is run on the model, using the Euclidean distance between each original instance and its counterfactual instance as a proxy for effort, the average distance turns out to be significantly bigger for the younger group than for the older group. This indicates that it takes more effort for a younger applicant to get a loan approved than for an older applicant. By generating counterfactual explanations and analysing their behaviour, a clear indication of bias in the model shows up that the other methods missed.
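A minimal sketch of that group-level comparison, holding age and credit cards fixed and searching only over income (the rejected applicants listed here are hypothetical):

```python
def predict(age, income, credit_cards):
    # toy loan model from the example
    if age < 36 and income > 50000:
        return 1
    if age >= 36 and income >= 38500 and credit_cards <= 2:
        return 1
    return 0

def income_counterfactual_distance(age, income, credit_cards, step=100, limit=200000):
    """Smallest income change (in steps of 100) that flips the prediction."""
    original = predict(age, income, credit_cards)
    for delta in range(step, limit, step):
        for candidate in (income + delta, income - delta):
            if candidate >= 0 and predict(age, candidate, credit_cards) != original:
                return delta
    return None

rejected_young = [(30, 45000, 1), (32, 48000, 0)]
rejected_old = [(40, 36000, 1), (45, 37500, 2)]
avg_young = sum(income_counterfactual_distance(*r) for r in rejected_young) / 2  # 3600.0
avg_old = sum(income_counterfactual_distance(*r) for r in rejected_old) / 2      # 1750.0
```

Even on this tiny sample, the younger applicants need a larger income change on average to get approved, which is exactly the asymmetry the fairness metrics above failed to surface.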

In this example we already knew that the key discrimination point lies at age 36, which is why the analysis was run on these two groups. In reality, for non-categorical attributes, the analysis has to be run multiple times, stepping through candidate split points, in order to find potential bias.

2. When to use counterfactuals

Counterfactual analysis adds value whenever you need to move past performance metrics and actually understand why your model behaves the way it does. It is especially useful in three types of situations, each providing a different kind of insight that directly supports the EU AI Act’s requirements for transparency and accountability.

1. High‑impact and individual‑level decisions

Whenever your model influences real outcomes for people, such as approving credit, ranking job applicants, assigning medical priorities, or pricing insurance, counterfactuals help unpack the decision logic. They answer questions like:
- Which features are the driving factor behind this decision?
- What can I do to change the decision?
Such insights are key to the AI Act’s explainability provisions, which require that results affecting individuals can be justified.

2. Transparency and accountability obligations

Increasingly, both regulators and internal risk teams want concrete explanations rather than aggregate metrics. Counterfactuals make that possible by providing case-based “what-if” stories. These stories can be integrated into model outputs to explain why the model came to a specific conclusion. Dashboards with metrics aggregated from such stories let teams show how sensitive an updated model is to specific inputs and where its decision boundaries lie. This builds trust with both end users and supervisors and supports the Act’s demand for traceability throughout the AI lifecycle.

3. Bias reviews for fairness

By inspecting individual predictions near the decision threshold, you can see which attributes actually drive outcomes and identify subgroups that are disadvantaged, even when global metrics look balanced. When fairness metrics such as Equal Opportunity give mixed results, counterfactuals reveal why. 

4. Sensitivity analysis

Counterfactual reasoning is not limited to classification problems like our example. In regression models that predict a continuous outcome, such as risk scores, counterfactuals show how much the prediction changes when specific features shift. This gives insight into model sensitivity and causal effects, both of which tie directly into the EU AI Act’s focus on robustness and human oversight.

Embedding counterfactual analysis into your MLOps workflow makes explainability a continuous, automated process. Combined with Responsible AI tooling or custom dashboards, it turns compliance into an ongoing source of insight. Besides just proving that your AI is fair and transparent, it also supplies insight into how and why it behaves the way it does.

Counterfactual analysis trade-offs

Despite its strengths, counterfactual analysis is not a silver bullet for fairness. The trade-offs that come with this extra level of insight should be considered.

Computational cost: generating counterfactuals for many instances in high‑dimensional feature spaces is expensive, often requiring thousands of model evaluations per instance.

Complexity: more advanced methods, especially those using reinforcement learning, add significant engineering overhead and can themselves become difficult to understand for users.

Because of these trade-offs, counterfactual analysis can be costly. So before implementing it, check whether this analysis is truly required and worth the added complexity.

3. How to implement counterfactual analysis in your workflow

There are many different ways to calculate counterfactual explanations, some simpler than others and some more appropriate for specific types of models. How counterfactual explanations are most efficiently generated therefore depends heavily on the specific model and circumstances.

Microsoft has implemented methods for counterfactual analysis in its InterpretML framework through an open-source library called dice-ml, and Seldon’s Alibi also offers a wide range of methods to generate counterfactual instances. Both have elaborate documentation explaining how the different methods work, along with code examples ready to be used.

Conclusion

Counterfactual analysis provides a clear view of how a model makes predictions, both on an individual case basis and across groups. Instead of only telling you which features were important for a prediction, it also shows the minimum changes needed to flip that prediction, and how that effort differs between groups. That is exactly the level where hidden biases in historical data and decision logic become visible and actionable. All in all, a well-suited solution for satisfying many of the requirements outlined in the AI Act.

Written by 

Martijn Di Bucchianico

Analytics Translator at Xomnia
