Context-dependent Effects of Explanations on Multiple Layers of Trust
The rise of highly performant deep learning approaches results in a growing number of possible applications in many domains. In many fields, such as medicine, human agency and oversight are crucial, that is, complex decision-making processes should be performed by human-AI teams. Since machine-learned models typically operate as opaque systems, they need to be complemented with methods that allow humans to evaluate whether such models can be relied upon. This requirement has inspired research on explainable AI (XAI) with a growing number of different methods. Increasing the transparency of systems has been considered an important means for users to calibrate their trust in AI systems, i.e., avoiding undertrust to ensure that the system is utilized effectively while also preventing overtrust to avoid blind reliance and to recognize potential errors.
However, recent research suggests that transparency measures do not guarantee trust calibration or better performance. The use of explanations can indeed improve users' confidence in the results of a system, reveal hidden biases, and help to improve the model, but it can also increase the users' cognitive workload or even undermine their trust. Transparency often falls short of interpretability: exposing a system's inner workings does not ensure that this information is meaningful and comprehensible to humans. Moreover, explanations can differ in their fidelity, i.e., they can be more or less consistent with the explained outcome of the system. The widely used post hoc explanation methods in particular lack a ground truth for the 'real' explanation and can be considered black boxes themselves, which also demand trust. However, most non-expert users might not become aware of this unless they are presented with multiple inconsistent explanations.
In addition, approaches to evaluating these explanations are themselves subject to uncertainty. Human judgments of explanation quality are inherently subjective, often biased, and tend to favor simplified explanations. Computational fidelity metrics have also been shown to exhibit inconsistencies and lack precision when applied to non-linear models. Thus, the evaluation of explanations yields results that themselves must be trusted.
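To make the notion of a computational fidelity metric concrete, the following is a minimal sketch of a deletion-style faithfulness score. The names (`model_predict`, `attribution`, `baseline`, `steps`) are hypothetical, and the choice of baseline value and deletion granularity are assumptions; in practice, exactly such design choices contribute to the inconsistencies noted above.

```python
import numpy as np

def deletion_fidelity(model_predict, x, attribution, baseline=0.0, steps=10):
    """Sketch of a deletion-based fidelity score (hypothetical interface).

    Features are removed (set to `baseline`) in order of decreasing
    attribution; a faithful explanation should make the model's score
    for the originally predicted class drop quickly. The mean score
    over the deletion curve is returned (lower = more faithful).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(-np.asarray(attribution))       # most important feature first
    original_class = int(np.argmax(model_predict(x[None])[0]))
    x_pert = x.copy()
    chunk = max(1, len(order) // steps)
    scores = []
    for i in range(0, len(order), chunk):
        x_pert[order[i:i + chunk]] = baseline           # "delete" the next block of features
        scores.append(model_predict(x_pert[None])[0][original_class])
    return float(np.mean(scores))
```

Even in this simple form, the score depends on the baseline used to "delete" features and on the granularity of the deletion steps, so two equally reasonable configurations can rank the same explanations differently.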
We argue that every intervention or technique that aims to improve transparency or human oversight may itself introduce uncertainty and, therefore, adds another layer of required trust on top of trust in the system's outcomes: trust in the explanations and trust in the fidelity metrics used to evaluate explanations. Depending on the features of a system, other layers can become relevant, such as trust in appropriate corrigibility, i.e., the question of whether corrections are integrated in the intended way or whether the system can be corrupted by false or manipulated corrections, or trust in the correct adaptation of a learning system to the user's preferences, and so forth. Every layer can shift the focus to aspects other than the system's outcome, and inconsistencies between these layers can lead to a weighing of credibility and influence the users' perception of the system. In our view, whether these layers become apparent to the human and relevant for the perceived trustworthiness of a system depends strongly on contextual factors such as the expertise of the trustor (AI experts, domain experts, and non-experts), the presentation of the information, and the type, severity, and importance of the joint task.
From an ethical perspective, the sustainability of trust in the system is important. Cognitive dissonance may occur if users feel that the explanations nudge them into decisions rather than supporting reflected decisions that they own in human-AI teams. Such impressions, however, will not be reflected in behavioral data. If a decision affects other people, cognitive dissonance may even turn into ethical dissonance. This is the case when users perceive a gap between their internalized moral standard of making self-determined decisions and the experienced reality of having merely nodded through the system's explanation. The ethical distinction between manipulative and persuasive AI is instructive here. The literature on manipulation, which often focuses on the intentions of the manipulator, is not easily transferable to the realm of human-AI interaction. The explanations of manipulative AI may well induce as much trust as those of persuasive AI in the short run, but the long-term effects of the two might differ. Therefore, it seems important to complement behavioral measures that investigate reliance on a system's output and explanations by eliciting users' self-reflections on their decision-making autonomy.
The relationship between explanations, trust, reliance and human-AI team performance remains complex and requires further research. Open questions persist around evaluating XAI methods with regard to their fidelity and their impact on trust and performance. This underscores the need for controlled empirical studies with different user groups considering their individual information needs and the layers of trust that are associated with them.
Explanations and other transparency measures should be presented in a way that makes them as beneficial as possible for the user. Since they can introduce uncertainties, increase cognitive workload, or induce cognitive dissonance, they should be used and implemented carefully so that the benefits outweigh these costs. What is beneficial for the user is, to a substantial degree, subjective and may depend on the context and the users' perceptions of their role in the human-AI team under specific XAI methods.
Presentation: "Context-dependent Effects of Explanations on Multiple Layers of Trust", held at the 3rd TRR 318 Conference: Contextualizing Explanations, 17 June 2025, Bielefeld, Germany.