The instrumental role of explanations for trustworthy AI
Do we need explanations of AI outputs in order to obtain trustworthy AI? In recent years there has been an active philosophical debate on this question, with a range of authors arguing that explanations are in fact not needed for justified belief in AI outputs (Durán & Jongsma, 2021; London, 2019) or even for the broadly ethical use of AI (Kawamleh, 2023; Krishnan, 2020). In most cases, the underlying reasoning is that justification can come from the AI system being sufficiently accurate, and that other goals we might have (such as bias detection) can be achieved through means other than explanations.
I argue that this misses important aspects of how AI is embedded in socio-technical systems. Empirical research shows that decision-makers fail to calibrate their reliance on AI systems without explanations (Schoeffer et al., 2023). Furthermore, feedback loops to improve overall decision-making, in the form of accountability mechanisms and procedures to establish legitimacy, are far harder to set up when explanations are lacking.
The role of explanations should therefore not be seen in isolation from the broader decision-making procedures of which an AI system is a part. As such, there are outcome-based arguments for why explanations are an important part of trustworthy AI: good explanations are needed to improve these systems over the longer term and thus to ensure just outcomes. There are also more procedural arguments for good explanations, as just procedures are often thought to require showing that decisions are made for the right reasons. Furthermore, maintaining responsibility over actions arguably requires the ability to recognize and respond to the appropriate reasons for those actions, and here too explainability is important for ensuring that we can continue to do so despite the introduction of AI systems. I thus argue for an instrumental, but nonetheless important, role for explanations in ensuring trustworthy AI.
References
Durán, J. M., & Jongsma, K. R. (2021). Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. Journal of Medical Ethics, 47(5), 329-335.
London, A. J. (2019). Artificial intelligence and black‐box medical decisions: accuracy versus explainability. Hastings Center Report, 49(1), 15-21.
Kawamleh, S. (2023). Against explainability requirements for ethical artificial intelligence in health care. AI and Ethics, 3(3), 901-916.
Krishnan, M. (2020). Against interpretability: a critical examination of the interpretability problem in machine learning. Philosophy & Technology, 33(3), 487-502.
Schoeffer, J., Jakubik, J., Voessing, M., Kuehl, N., & Satzger, G. (2023). On the interdependence of reliance behavior and accuracy in AI-assisted decision-making. arXiv preprint arXiv:2304.08804.