Ulrike Kuhl, Annika Bush

Contextualizing Counterfactuals: Gender Differences in Alignment with Biased (X)AI

Introduction. Counterfactual explanations (CEs) in explainable AI (XAI) illustrate how alternative model inputs lead to a change in outcomes, offering actionable insight by mirroring human reasoning. However, XAI's impact on user behavior is multifaceted and may inadvertently promote over-reliance, legitimize biased outputs, or foster undue trust in inherently untrustworthy black-box systems. For instance, prior work shows that CEs must be carefully calibrated in terms of feature types and directionality. There is preliminary evidence that gender and educational background modulate user responses to XAI systems. In a previous study on the influence of CEs on decision making, we showed that CEs could induce a reversal effect on alignment with XAI recommendations, although users rarely reported recognizing an induced gender bias in AI recommendations. However, no study has yet examined how users' individual factors, such as gender, interact with CEs in biased decision scenarios, particularly when individuals face biases they may already know from real-world experience.

Methods. We re-analyzed data from a simulated hiring study to examine whether a) participants' gender identity influences their alignment with biased AI recommendations and b) these differences extend to bias shifts after exposure to biased (X)AI recommendations. 293 participants (147 female) took on the role of hiring managers, repeatedly selecting between two candidate profiles. During an interaction phase, they were exposed to AI recommendations with CEs (XAI) or without (black-box AI). The recommendations were either male- or female-biased. We analyzed the proportion of (X)AI-aligned decisions as well as potential bias shifts in participants' behavior (the difference between pre- and post-(X)AI-interaction phases). Separate analyses were performed on data from participants identifying as female and male, respectively. All reported results are Bonferroni-corrected to account for multiple comparisons.
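The Bonferroni correction mentioned above can be sketched as follows. This is a minimal illustration, not the study's analysis code; the number of comparisons and the raw p-values shown are hypothetical assumptions for the example.

```python
def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the
    number of tests performed, capping the result at 1.0."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

# Hypothetical uncorrected p-values for two tests within one group.
raw = [0.010, 0.002]
print(bonferroni(raw))  # [0.02, 0.004]
```

The correction controls the family-wise error rate at the cost of statistical power, which is why it is a conservative choice when several tests are run per participant group.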


Fig. 1. a) Mean proportion of (X)AI-aligned decisions by gender and condition. b) Mean bias shift in participant behavior from pre- to post-(X)AI interaction, stratified by condition and gender. Whiskers represent the standard error of the mean.

Results. For female participants, CEs significantly increased AI alignment (F(1,98)=6.935, pcorr=.020, Fig. 1a). Further, a significant interaction effect on bias shift was observed for female participants (F(1,59)=10.303, pcorr=.004, Fig. 1b), indicating that exposure to CEs induced a reversal effect, i.e., a shift in decision patterns in the direction opposite to the AI bias. In contrast, male participants showed no significant differences in either AI alignment or bias shift.

Discussion and Conclusion. Our findings suggest that CE-XAI does not affect all users uniformly. Female participants aligned with AI recommendations more when CEs were present, possibly reflecting greater sensitivity to the potential for bias based on lived experience, while male participants followed the AI recommendations regardless of explanation. This highlights the need to contextualize explanations, as personal user characteristics, such as gender, can shape their effectiveness. Personalized or adaptive explanation systems may be needed to prevent over-reliance or the inadvertent perpetuation of biases. Further, the reversal effect observed only in female participants raises important questions about how explanations interact with prior experiences of bias. CEs may have heightened a subjective awareness of unfairness, prompting a corrective response; this interpretation remains speculative, however, and warrants careful re-evaluation in future research.

We show that XAI affects users differently. Future work should focus on explanation strategies that take user characteristics into account, enabling objective and fair decision-making in high-stakes contexts.

Presentation "Contextualizing Counterfactuals: Gender Differences in Alignment with Biased (X)AI" held at the 3rd TRR 318 Conference: Contextualizing Explanations on 17 June 2025 in Bielefeld, Germany.
