Investigating the Impact of Conceptual Metaphors on LLM-based NLI through Shapley Interactions
In everyday life, metaphorical language is frequently used in the flow of conversation. While metaphors appear in explicit cases such as “she was the light of my life”, the meaning of conventionalized metaphors such as “tax the rich” is more fundamentally grounded in language.
Introduction
According to Lakoff and Johnson, a metaphorical meaning construction is the product of a mapping that connects one conceptual domain to another: a metaphor draws on a source domain to explain a target domain. In the sentence “Gun addicts increasingly realize that society is rejecting them”, the source domain is addiction and the target domain is guns.
Existing approaches to interpreting metaphorical language follow a cognitive decoding of conceptual metaphors, integrating source and target domain information into downstream tasks. Shutova et al. used hierarchical clustering and conceptual metaphors for metaphor identification and interpretation, whereas Stowe et al. paraphrased metaphors into literal counterparts by extracting source and target domain information from FrameNet. Recently, Sengupta et al. proposed a multitask approach to jointly predicting source domains and highlighted aspects.
Unlike previous work, we explore metaphor interpretation for implicit metaphor usage, that is, beyond explicit metaphor identification. Chakrabarty et al. introduced FLUTE, an NLI dataset in which many hypotheses are metaphorical. They benchmarked transformer-based models, highlighting the challenge of downstream tasks involving implicitly figurative language. We argue that interpreting such metaphors requires examining both their context and the interactions of their conceptual metaphors. We address this gap in NLP by analyzing LLM performance using a Shapley-based analysis.
Despite progress in computational metaphor interpretation, research shows that large language models (LLMs) underperform on downstream tasks that require correct interpretation of figurative language such as metaphors. Skrynnikova highlights that LLMs imitate data rather than reason analogically, a key requirement for successful metaphor comprehension. Similarly, Comşa et al. attribute this struggle to a dependency on contextual variables. We address this research gap: inspired by advances of Shapley-based analyses in explainable AI, we study LLM interpretability on metaphorical tasks through two research questions:
(1) How well do LLMs handle implicit metaphorical language in downstream tasks such as natural language inference (NLI)? (2) Does providing source and target domains enhance NLI performance, and how does this information interact with metaphorical texts?
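The Shapley-based quantities we refer to are the standard ones from cooperative game theory. For a set of input components $N = \{1, \dots, n\}$ and a value function $v$ (here, the model's NLI score given only the components in a coalition $S \subseteq N$), the Shapley value of component $i$ and the pairwise Shapley interaction index of components $i, j$ (Grabisch and Roubens) are:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl[ v(S \cup \{i\}) - v(S) \bigr]$$

$$I_{ij}(v) = \sum_{S \subseteq N \setminus \{i,j\}} \frac{|S|!\,(n - |S| - 2)!}{(n-1)!} \, \Delta_{ij} v(S), \quad \Delta_{ij} v(S) = v(S \cup \{i,j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S)$$

A positive $I_{ij}$ indicates that components $i$ and $j$ (e.g., a metaphorical hypothesis and its annotated source domain) contribute more jointly than the sum of their individual contributions.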
We evaluate five LLMs in an NLI setup on given pairs of premise and metaphorical hypothesis. The contribution of this work is two-fold: (1) we extend the metaphorical samples in the FLUTE dataset with annotations of source and target domains; (2) we investigate the impact of these domains on LLMs in the task of NLI with an analysis of Shapley values and Shapley interactions.
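To illustrate the kind of attribution analysis involved, the following is a minimal sketch of exact Shapley value computation over input components of an NLI instance. The component names and the toy value function are hypothetical placeholders; in the actual setup, v(S) would be the LLM's NLI score when only the input parts in S are provided.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating all coalitions.

    players: list of feature labels (e.g., parts of an NLI input).
    value_fn: maps a frozenset of players to a model score.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of p to this coalition
                phi[p] += weight * (value_fn(coalition | {p}) - value_fn(coalition))
    return phi

# Hypothetical scores standing in for an LLM's entailment confidence
# when only the listed input parts are present.
scores = {
    frozenset(): 0.10,
    frozenset({"premise"}): 0.30,
    frozenset({"hypothesis"}): 0.25,
    frozenset({"domains"}): 0.15,
    frozenset({"premise", "hypothesis"}): 0.60,
    frozenset({"premise", "domains"}): 0.40,
    frozenset({"hypothesis", "domains"}): 0.45,
    frozenset({"premise", "hypothesis", "domains"}): 0.90,
}
phi = shapley_values(["premise", "hypothesis", "domains"], scores.get)

# Efficiency property: attributions sum to v(N) - v(empty set)
assert abs(sum(phi.values()) - (0.90 - 0.10)) < 1e-9
```

Exact enumeration is exponential in the number of components, which is feasible here because the input is split into only a handful of parts (premise, hypothesis, domain annotations) rather than individual tokens.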
Takeaway
This paper has explored how conceptual metaphors (source and target domains) influence LLM performance in NLI on texts in which metaphorical language is implicitly embedded in the meaning. To that end, we first extended the FLUTE dataset with source and target domains. Our ablation study using zero-shot and few-shot prompts for two LLMs then showed the best results when explanations are combined with conceptual metaphors. A Shapley-based analysis confirms their positive impact, consistently improving model performance. Our findings suggest that incorporating domain information in a purely inferential setup with zero-shot and few-shot learning improves LLM performance in 70% of the experiments. These findings lay the groundwork for future work: advanced techniques that leverage metaphorical knowledge are likely to further improve LLM understanding of implicit metaphorical language.
Presentation “Conceptual Metaphors on LLM-based NLI through Shapley Interactions”, held at the 3rd TRR 318 Conference: Contextualizing Explanations, 17 June 2025, Bielefeld, Germany.