Leandra Fichtel, Maximilian Spliethöver, Eyke Hüllermeier, Patricia Jimenenz, Nils Klowait, Stefan Kopp, Axel-Cyrille Ngonga Ngomo, Amelie Robrecht, Ingrid Scharlau, Lutz Terfloth, Anna-Lisa Vollmer, Henning Wachsmuth

Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues

The computational generation of natural language explanations has gained research interest due to its importance for explainable artificial intelligence (XAI), which aims to explain decisions made by AI systems. Recent XAI research focuses on personalized explanations tailored to the explainee to ensure more effective communication, arguing that the diverse backgrounds and individual abilities of explainees must be accounted for to achieve understanding. In addition, understanding evolves dynamically in social interactions, such as dialogues, between explainers and explainees. Effective explanations must therefore not only involve an initial personalization but also continuously adapt to the explainee’s current understanding and needs throughout the interaction.

This dynamic adaptation can be achieved in a co-constructive explanation dialogue, in which the explainer continuously monitors the explainee’s understanding and scaffolds the explanations accordingly. Monitoring involves, for example, the use of diagnostic queries and verification questions to continuously assess the explainee’s understanding. Scaffolding guides the explainee through the explanation by adjusting the level of assistance. The question arises of how to enable an XAI system to lead such co-constructive explanation dialogues.

Large language models (LLMs) have demonstrated remarkable abilities in generating coherent and contextually relevant texts. Fine-tuning LLMs to follow instructions has further enabled them to adjust their behavior to complex prompts and to support users in constructing knowledge. However, whether these capabilities also enable co-constructive explanation dialogues that involve active monitoring and scaffolding by the explainer remains unclear. In this work, we study co-constructive explanation dialogues with state-of-the-art LLMs, focusing on the following research questions: (1) How can a co-constructive explanation dialogue be modeled using “out-of-the-box” LLMs? (2) To what extent do LLMs show co-constructive behavior? And (3) how effectively do LLMs guide explainees toward a better understanding of a given topic?

We investigate to what extent state-of-the-art LLMs exhibit co-constructive behavior in unimodal explanation dialogues with human explainees. In particular, we conducted a user study in which participants interacted with a text-based LLM (Llama 3.1 70B) in English to receive explanations on a predefined topic. To increase the generalizability and reliability of our results, we examine three diverse topics: (1) the board game Quarto and its rules, (2) the formation of black holes, and (3) the human sleep cycle and its stages. We examine two “zero-shot” settings (without LLM fine-tuning or providing examples) based on different system prompts: a baseline setting and an enhanced setting. In the baseline setting, the LLM is simply instructed to act as an explainer. In the enhanced setting, the LLM is given detailed instructions on how to behave co-constructively and apply monitoring and scaffolding. Monitoring and scaffolding are thus expected to emerge implicitly from the provided system prompt. Before and after interacting with the LLM, we assess the participants’ understanding and their perception of the LLM’s co-constructive behavior using several questionnaires adapted from previous works. To assess understanding, we focus on the participants’ comprehension of the topic and their ability to perform actions in the domain of the topic (enabledness). We further distinguish between subjective and objective comprehension. Subjective comprehension involves the participants self-assessing their comprehension of the topic, while objective comprehension measures their comprehension verifiably through factual questions.
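The difference between the two zero-shot settings lies entirely in the system prompt. As a minimal sketch, the setup could look as follows; note that the prompt wording below is hypothetical and merely illustrates the baseline/enhanced contrast described above, not the study’s actual prompts.

```python
# Illustrative sketch of the two zero-shot prompt settings.
# The prompt texts are hypothetical, not the prompts used in the study.

BASELINE_PROMPT = (
    "You are an explainer. Explain the topic '{topic}' to the user."
)

ENHANCED_PROMPT = (
    "You are a co-constructive explainer for the topic '{topic}'. "
    "Continuously monitor the user's understanding by asking diagnostic "
    "and verification questions, and scaffold your explanations by "
    "adjusting the level of assistance to the user's current needs."
)

def build_messages(setting: str, topic: str, history: list[dict]) -> list[dict]:
    """Assemble the chat messages for one dialogue turn in the given setting."""
    template = ENHANCED_PROMPT if setting == "enhanced" else BASELINE_PROMPT
    system = {"role": "system", "content": template.format(topic=topic)}
    return [system] + history

# Example turn: the explainee asks about one of the three study topics.
messages = build_messages(
    "enhanced",
    "the board game Quarto and its rules",
    [{"role": "user", "content": "How do I win a round of Quarto?"}],
)
# `messages` would then be passed to the model (here, Llama 3.1 70B) via
# whatever chat-completion interface the deployment provides.
```

Since no fine-tuning or in-context examples are used, any co-constructive behavior in the enhanced setting must emerge from this single instruction alone.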

A total of 277 participants completed the study. We analyze the dialogues and questionnaires both quantitatively and qualitatively, focusing on the participants’ understanding as well as on the co-constructive abilities of the LLM. Our results indicate that, on average, both LLM settings significantly improve self-assessed subjective comprehension and lead to comparable performance in the post-interaction objective comprehension and enabledness questionnaires. A closer examination of the participants’ performance in the objective comprehension questionnaire shows an approximately normal distribution in the baseline setting. In contrast, the performance of the participants who interacted with the enhanced LLM is more dispersed: fewer participants performed around the average, while some performed much better and others worse. This suggests that the enhanced prompt can have a positive effect on explanation success by enabling co-constructive behaviors of the LLM, such as assessing prior knowledge, asking verification questions, and encouraging participants to also adopt the role of the explainer. However, the inconsistent use of effective scaffolding techniques may explain the higher proportion of participants performing below average in the enhanced setting. Lastly, while the co-construction of knowledge and understanding in a unimodal setting is generally possible, it requires more effort from the interaction partners to compensate for the parallel interactions that multimodal settings afford but that are missing here. In our qualitative analysis, we find that the LLM must actively demand this extra effort from the explainee, which adds to their workload and makes the dialogue more cumbersome and less effective.

In conclusion, our results suggest that the LLM shows limited capabilities to act as a co-constructive explainer. While the LLM can simulate monitoring to some degree, its scaffolding of explanations remained rather inconsistent. Furthermore, the unimodal setup introduces additional friction.

Presentation “Co-Constructive Behavior of Large Language Models in Explanation Dialogues”, held at the 3rd TRR 318 Conference: Contextualizing Explanations on 17 June 2025 in Bielefeld, Germany.
