Leandra Fichtel, Maximilian Spliethöver, Eyke Hüllermeier, Patricia Jimenenz, Nils Klowait, Stefan Kopp, Axel-Cyrille Ngonga Ngomo, Amelie Robrecht, Ingrid Scharlau, Lutz Terfloth, Anna-Lisa Vollmer, Henning Wachsmuth

Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues

The computational generation of natural language explanations has gained research interest due to its importance for explainable artificial intelligence (XAI), which aims to explain decisions made by AI systems. Recent XAI research focuses on personalized explanations tailored to the explainee to ensure more effective communication, arguing that the diverse backgrounds and individual abilities of explainees must be accounted for to achieve understanding. In addition, understanding evolves dynamically in social interactions, such as dialogues, between explainers and explainees. Effective explanations must therefore not only involve an initial personalization but also continuously adapt to the explainee’s current understanding and needs throughout the interaction.

This dynamic adaptation can be achieved in a co-constructive explanation dialogue, in which the explainer continuously monitors the explainee’s understanding and scaffolds the explanations accordingly. Monitoring involves, for example, the use of diagnostic queries and verification questions to continuously assess the explainee’s understanding. Scaffolding guides the explainee through the explanation by adjusting the level of assistance. The question arises of how to enable an XAI system to lead such co-constructive explanation dialogues.

Large language models (LLMs) have demonstrated remarkable abilities in generating coherent and contextually relevant texts. Fine-tuning LLMs to follow instructions has further enabled them to adjust their behavior to complex prompts and to support users in constructing knowledge. However, whether these capabilities also enable co-constructive explanation dialogues that involve active monitoring and scaffolding by the explainer remains unclear. In this work, we study co-constructive explanation dialogues with state-of-the-art LLMs, focusing on the following research questions: (1) How can a co-constructive explanation dialogue be modeled using “out-of-the-box” LLMs? (2) To what extent do LLMs show co-constructive behavior? And (3) how effectively do LLMs guide explainees toward a better understanding of a given topic?

We investigate to what extent state-of-the-art LLMs exhibit co-constructive behavior in unimodal explanation dialogues with human explainees. In particular, we conducted a user study in which participants interacted with a text-based LLM (Llama 3.1 70B) in English to receive explanations on a predefined topic. To increase the generalizability and reliability of our results, we examine three diverse topics: (1) the board game Quarto and its rules, (2) the formation of black holes, and (3) the human sleep cycle and its stages. We examine two “zero-shot” settings (without LLM fine-tuning or providing examples) based on different system prompts: a baseline setting and an enhanced setting. In the baseline setting, the LLM is simply instructed to act as an explainer. In the enhanced setting, the LLM is given detailed instructions on how to behave co-constructively and apply monitoring and scaffolding. Monitoring and scaffolding are thus expected to emerge implicitly from the provided system prompt. Before and after interacting with the LLM, we assess the participants’ understanding and their perception of the LLM’s co-constructive behavior using several questionnaires adapted from previous works. To assess understanding, we focus on the participants’ comprehension of the topic and their ability to perform actions in the domain of the topic (enabledness). We further distinguish between subjective and objective comprehension. Subjective comprehension involves the participants self-assessing their comprehension of the topic, while objective comprehension measures their comprehension verifiably through factual questions.
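The difference between the two zero-shot settings lies entirely in the system prompt. As a minimal sketch, the setup could look as follows; note that the prompt wording below is hypothetical and merely illustrates the baseline/enhanced contrast described above, not the study’s actual prompts.

```python
# Illustrative sketch of the two zero-shot prompt settings.
# The prompt texts are hypothetical, not the prompts used in the study.

BASELINE_PROMPT = (
    "You are an explainer. Explain the topic '{topic}' to the user."
)

ENHANCED_PROMPT = (
    "You are a co-constructive explainer for the topic '{topic}'. "
    "Continuously monitor the user's understanding by asking diagnostic "
    "and verification questions, and scaffold your explanations by "
    "adjusting the level of assistance to the user's current needs."
)

def build_messages(setting: str, topic: str, history: list[dict]) -> list[dict]:
    """Assemble the chat messages for one dialogue turn in the given setting."""
    template = ENHANCED_PROMPT if setting == "enhanced" else BASELINE_PROMPT
    system = {"role": "system", "content": template.format(topic=topic)}
    return [system] + history

# Example turn: the explainee asks about one of the three study topics.
messages = build_messages(
    "enhanced",
    "the board game Quarto and its rules",
    [{"role": "user", "content": "How do I win a round of Quarto?"}],
)
# `messages` would then be passed to the model (here, Llama 3.1 70B) via
# whatever chat-completion interface the deployment provides.
```

Since no fine-tuning or in-context examples are used, any co-constructive behavior in the enhanced setting must emerge from this single instruction alone.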

A total of 277 participants completed the study. We analyze the dialogues and questionnaires both quantitatively and qualitatively, focusing on the participants’ understanding as well as on the co-constructive abilities of the LLM. Our results indicate that, on average, both LLM settings significantly improve self-assessed subjective comprehension and lead to comparable performance in the post-interaction objective comprehension and enabledness questionnaires. A closer examination of the participants’ performance in the objective comprehension questionnaire shows an approximately normal distribution in the baseline setting. In contrast, the performance of the participants who interacted with the enhanced LLM is more dispersed: fewer participants performed around the average, while some performed much better and others worse. This suggests that the enhanced prompt can have a positive effect on explanation success by enabling co-constructive behaviors of the LLM, such as assessing prior knowledge, asking verification questions, and encouraging participants to also adopt the role of the explainer. However, the inconsistent use of effective scaffolding techniques may explain the higher proportion of participants performing below average in the enhanced setting. Lastly, while the co-construction of knowledge and understanding in a unimodal setting is generally possible, it requires more effort from the interaction partners to compensate for the parallel interactions that multimodal settings afford but that are missing here. In our qualitative analysis, we find that the LLM must actively demand this extra effort from the explainee, which adds to their workload and makes the dialogue more cumbersome and less effective.

In conclusion, our results suggest that the LLM shows limited capabilities to act as a co-constructive explainer. While the LLM can simulate monitoring to some degree, its scaffolding of explanations remained rather inconsistent. Furthermore, the unimodal setup introduces additional friction.

Presentation “Co-Constructive Behavior of Large Language Models in Explanation Dialogues”, held at the 3rd TRR 318 Conference: Contextualizing Explanations on 17 June 2025 in Bielefeld, Germany.
