Ulrike Kuhl, Annika Bush

Contextualizing Counterfactuals: Gender Differences in Alignment with Biased (X)AI

Introduction. Counterfactual explanations (CEs) in explainable AI (XAI) illustrate how alternative model inputs lead to a change in outcomes, offering actionable insight by mirroring human reasoning. However, XAI's impact on user behavior is multifaceted and may inadvertently promote over-reliance, legitimize biased outputs, or foster undue trust in inherently untrustworthy black-box systems. For instance, prior work shows that CEs must be carefully calibrated in terms of feature types and directionality. There is preliminary evidence that gender and educational background modulate user responses to XAI systems. In a previous study on the influence of CEs on decision making, we showed that CEs could induce a reversal effect on alignment with XAI recommendations, although users rarely reported recognizing an induced gender bias in AI recommendations. However, no study has yet examined how users' individual factors, such as gender, interact with CEs in biased decision scenarios, particularly when individuals face biases they may already know from real-world experience.

Methods. We re-analyzed data from a simulated hiring study to examine whether a) participants' gender identity influences their alignment with biased AI recommendations and b) these differences extend to bias shifts after exposure to biased (X)AI recommendations. 293 participants (147 female) took on the role of hiring managers, repeatedly selecting between two candidate profiles. During an interaction phase, they were exposed to AI recommendations with CEs (XAI) or without (black-box AI). The recommendations were either male- or female-biased. We analyzed the proportion of (X)AI-aligned decisions as well as potential bias shifts in participants' behavior (the difference between pre- and post-(X)AI-interaction phases). Separate analyses were performed on data from participants identifying as female and male, respectively. All reported results are Bonferroni-corrected to account for multiple comparisons.
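The Bonferroni correction mentioned above can be sketched as follows. This is a minimal illustration, not the study's analysis code; the number of comparisons and the raw p-values shown are hypothetical assumptions for the example.

```python
def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the
    number of tests performed, capping the result at 1.0."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

# Hypothetical uncorrected p-values for two tests within one group.
raw = [0.010, 0.002]
print(bonferroni(raw))  # [0.02, 0.004]
```

The correction controls the family-wise error rate at the cost of statistical power, which is why it is a conservative choice when several tests are run per participant group.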


Fig. 1. a) Mean proportion of (X)AI-aligned decisions by gender and condition. b) Mean bias shift in participant behavior from pre- to post-(X)AI interaction, stratified by condition and gender. Whiskers represent the standard error of the mean.

Results. For female participants, CEs significantly increased AI alignment (F(1,98)=6.935, pcorr=.020, Fig. 1a). Further, a significant interaction effect on bias shift was observed for female participants (F(1,59)=10.303, pcorr=.004, Fig. 1b), indicating that exposure to CEs induced a reversal effect, i.e., a shift in decision patterns in the direction opposite to the AI bias. In contrast, male participants showed no significant differences in either AI alignment or bias shift.

Discussion and Conclusion. Our findings suggest that CE-XAI does not affect all users uniformly. Female participants aligned with AI recommendations more when CEs were present, possibly reflecting greater sensitivity to the potential for bias based on lived experience, while male participants followed the AI recommendations regardless of explanation. This highlights the need to contextualize explanations, as personal user characteristics, such as gender, can shape their effectiveness. Personalized or adaptive explanation systems may be needed to prevent over-reliance or the inadvertent perpetuation of biases. Further, the reversal effect observed only in female participants raises important questions about how explanations interact with prior experiences of bias. CEs may have heightened a subjective awareness of unfairness, prompting a corrective response; this interpretation remains speculative, however, and warrants careful re-evaluation in future research.

We show that XAI affects users differently. Future work should focus on explanation strategies that take user characteristics into account, enabling objective and fair decision-making in high-stakes contexts.

Presentation "Contextualizing Counterfactuals: Gender Differences in Alignment with Biased (X)AI" held at the 3rd TRR 318 Conference: Contextualizing Explanations on 17 June 2025 in Bielefeld, Germany.
