Kevin Baum, Richard Uth, Holger Hermanns, Sophie Kerstan, Markus Langer, Anne Lauber-Rönsberg, Philip Meinel, Laura Stenzel, Sarah Sterz, Hanwei Zhang

The Principal’s Principles: Actionable (Personalized) AI Alignment as Underexplored XAI Application Context

Explainable Artificial Intelligence (XAI) has been proposed as a key element—or even a prerequisite—for addressing various challenges and fulfilling numerous societal desiderata. Yet there is one topic that is frequently debated but, with a few exceptions, rarely recognized as a relevant application context for XAI methods: the alignment of artificial intelligence agents (AIAs).

Background and Motivation 

In the foreseeable future, AIAs—ranging from software agents (such as OpenAI’s Operator or Google’s Project Mariner) to cyber-physical systems (like Tesla’s Optimus or 1X’s Neo)—will co-inhabit both our digital and physical environments. These agents will execute tasks delegated to them by humans (human principals) either directly or indirectly, often involving considerable technical autonomy. This scenario immediately raises the critical challenge of ensuring these agents act as they ought to, i.e., constrained by human intents and preferences or guided by norms from diverse domains—a bundle of challenges commonly known as the AI alignment problem.

While the exact formulation of the AI alignment challenge and the criteria for solving it remain debated, we argue that methods from XAI should—and inevitably will—play a central role. Concretely, understanding task delegation to AIAs as intent-driven interaction that establishes extended human agency raises questions closely linked to indirect human oversight and responsibility gaps—questions that are inherently associated with XAI research. In the following, we briefly outline three key aspects supporting our argument:

Personal Normative Alignment and Delegation as Extension of Agency 

We propose understanding the delegation of tasks to AIAs by human principals as a form of extension of agency via personal normative alignment, focusing on three factors: warranted trust, appropriate responsibility, and anticipatory control.

To this end, we propose breaking the problem of personal normative alignment down into a series of sub-tasks. Rather than embedding general normative principles or values directly into AIAs, the focus should be on enabling human principals to do the following (a minimal illustrative sketch follows the list):

  • co-create the formulation and explication of their normative expectations relative to foreseeable contexts, including comprehending the implications of their judgements; 

  • communicate these expectations to AIAs in an unambiguous, interactive manner; 

  • verify that the AIAs have correctly ‘understood’ these normative expectations and that these AIAs act reliably and robustly in accordance with these expectations.
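To make this decomposition concrete, here is a minimal, purely illustrative Python sketch of how the three sub-tasks could be modelled on the principal’s side. The names (NormativeExpectation, AlignmentRecord) and the probe-based verification step are hypothetical constructions of ours, not an existing API or an implemented method:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class NormativeExpectation:
    """One explicated expectation of a human principal, scoped to a context."""
    context: str                   # a foreseeable application context, e.g. "shopping"
    rule: Callable[[dict], bool]   # permissibility predicate over a proposed action
    rationale: str                 # the principal's own justification for the rule

@dataclass
class AlignmentRecord:
    """Tracks the three sub-tasks for one principal-AIA pairing."""
    expectations: list = field(default_factory=list)

    def co_create(self, expectation: NormativeExpectation) -> None:
        # Sub-task 1: the principal formulates and explicates an expectation.
        self.expectations.append(expectation)

    def communicate(self) -> list:
        # Sub-task 2: serialize the expectations into an unambiguous form
        # that can be handed over to the AIA.
        return [{"context": e.context, "rationale": e.rationale}
                for e in self.expectations]

    def verify(self, agent_decision: Callable, probes: list) -> bool:
        # Sub-task 3: check that the AIA's verdicts match the principal's rules
        # on probe actions drawn from the foreseen contexts.
        return all(agent_decision(p) == e.rule(p)
                   for e in self.expectations
                   for p in probes
                   if p.get("context") == e.context)
```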

In combination, fulfilling these requirements makes it possible to establish a conceptual link between what AIAs do and the moral responsibility of their human principals for the AIAs’ behaviour: for a wide range of cases, the traditional conditions of control and epistemic access are fulfilled in the form of indirect control and responsibility. Achieving this, however, requires justifiability as a special kind of explainability.

The Role of XAI and Justifiability 

We argue that XAI technologies are a necessary foundation for meeting two of the three requirements outlined above. In particular, iterative XAI processes—likely based on contrastive and counterfactual methods—are crucial for the co-creation of a human principal’s normative expectations, especially in light of the potential consequences such expectations may entail once articulated.
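As a hypothetical illustration of such an iterative process, the sketch below stress-tests a freshly articulated rule against minimally perturbed ‘counterfactual twins’ of candidate actions; every contrastive pair is handed back to the principal for confirmation or revision. The toy budget rule and all function names are assumptions made for this example:

```python
def co_creation_loop(rule, candidate_actions, perturb):
    """Collect contrastive cases: pairs where an action and its minimally
    changed counterfactual twin receive different verdicts under the rule."""
    flagged = []
    for action in candidate_actions:
        twin = perturb(action)          # a minimally changed counterfactual variant
        if rule(action) != rule(twin):  # contrastive pair found
            flagged.append((action, twin))
    return flagged  # presented back to the principal for revision or confirmation

# Hypothetical usage: a budget rule and a perturbation nudging the amount.
rule = lambda a: a["amount"] <= 50
actions = [{"item": "snacks", "amount": 49}, {"item": "snacks", "amount": 10}]
perturb = lambda a: {**a, "amount": a["amount"] + 2}

for action, twin in co_creation_loop(rule, actions, perturb):
    print(f"Why is {action} permitted but {twin} not?")
```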

Moreover, we believe that justifications, a currently underexplored class of explanation techniques, will be central to verifying whether an AI system has correctly grasped the intent behind those expectations.

While explanations, broadly speaking, provide answers to why-questions, justifications explain (typically in terms of reasons) why something is right, appropriate, or acceptable according to a given normative standard. Justifications (or, at least, explanations from which the human principal can reasonably infer such justifications) are essential for enabling a human principal to assess whether an AIA has correctly ‘understood’ the principal’s normative expectations. Such justifications are, moreover, critical for assessing the agent’s trustworthiness and, thus, also for fostering appropriately calibrated, justified, and potentially even warranted trust.
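The contrast between explanation and justification can be made tangible with a small hypothetical sketch: instead of reporting causal factors behind a decision, the routine below cites the principal’s own rationales as reasons for why an action is or is not permissible relative to their explicated standard. The dict-based representation of expectations is again a simplifying assumption of ours:

```python
def justify(action, expectations):
    """Answer not merely *why* the AIA chose `action`, but why the action is
    (im)permissible relative to the principal's normative standard."""
    violated = [e for e in expectations if not e["rule"](action)]
    if violated:
        return {"permissible": False,
                "reasons": [e["rationale"] for e in violated]}
    return {"permissible": True,
            "reasons": [e["rationale"] for e in expectations]}

# Hypothetical standard: two expectations for a shopping agent.
expectations = [
    {"rule": lambda a: a["amount"] <= 50, "rationale": "stay within the weekly budget"},
    {"rule": lambda a: a["vendor"] != "X", "rationale": "avoid purchases from vendor X"},
]
print(justify({"amount": 60, "vendor": "X"}, expectations))
# -> both rationales are cited as reasons against the action
```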

Indirect Responsibility and Forward-Looking Human Control and Oversight

We claim that if the above conditions are met and all relevant application contexts have been taken into account, the result is a successfully personally normatively aligned AIA. We claim further that, as a result, at least some traditional conditions for moral responsibility are met: the epistemic condition is satisfied once the human principal has sufficient anticipatory understanding (through explanations and/or justifications) of how and why the AIA will or would act in specific contexts; the control condition may often be met indirectly through clearly communicated normative expectations, their assessment, and contextually sensitive anticipatory authorization. Therefore, personal normative alignment allows for appropriate responsibility attributions and offers a plausible account of indirect, anticipatory human control and oversight.

In other words: given the conditions above, all of an AIA A’s actions that take place in foreseen application contexts will ceteris paribus (especially in the absence of malfunction) be permissible according to all of the normative expectations of a human principal H that have been correctly ‘transferred’ to A. Thus, the AIA’s actions are (more or less) explicitly yet anticipatorily authorized and sanctioned by H in these foreseen contexts. To this extent, H bears (at least some kind of indirect) backward-looking responsibility for A’s actions, while bearing forward-looking responsibility via the direct responsibility to give the AIA normative guardrails through personal normative alignment. In this respect, the human principal becomes the locus of responsibility and the appropriate object of blame (and, of course, praise), because AIAs may be seen as their (metaphorical or actual) extension of action; this, in turn, allows for an indirect version of meaningful and effective forward-looking human control. (In sufficiently rich application domains, however, one cannot hope to consider all relevant contexts in advance; approximate measures of personal normative alignment and the safe exploration of new contexts are therefore another key issue for the overall approach.)
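Under the simplifying assumption that foreseen contexts and the principal’s rules can be enumerated, one could approximate the degree of personal normative alignment and guard against unforeseen contexts roughly as in the sketch below; this illustrates the idea only and is not a proposed metric:

```python
def alignment_score(agent_decision, expectations, probes):
    """Approximate personal normative alignment: the fraction of probe actions
    in foreseen contexts where the AIA's verdict matches the principal's rule."""
    matches = total = 0
    for context, rule in expectations.items():
        for probe in probes.get(context, []):
            total += 1
            matches += int(agent_decision(context, probe) == rule(probe))
    return matches / total if total else 0.0

def guard(context, expectations):
    """Safe-exploration guard: defer to the principal in unforeseen contexts."""
    if context not in expectations:
        raise PermissionError(f"Context {context!r} was not foreseen; ask the principal.")

# Hypothetical usage with one foreseen context and a slightly mislearned rule:
expectations = {"shopping": lambda a: a["amount"] <= 50}
probes = {"shopping": [{"amount": 30}, {"amount": 55}]}
agent = lambda ctx, a: a["amount"] <= 60
print(alignment_score(agent, expectations, probes))  # 0.5: one probe disagrees
```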

Presentation “The Principal’s Principles: Actionable (Personalized) AI Alignment as Underexplored XAI Application Context”, held at the 3rd TRR 318 Conference: Contextualizing Explanations on 17 June 2025 in Bielefeld, Germany
