Explainable Artificial Intelligence (XAI) aims to provide insight into the inner workings and the outputs of AI systems. Recently, there has been growing recognition that explainability is inherently human-centric, tied to how people perceive explanations. Despite this, there is no consensus in the research community on whether user evaluation is crucial in XAI and, if so, what exactly needs to be evaluated and how. This systematic literature review addresses this gap by providing a detailed overview of the current state of human-centered XAI evaluation. We reviewed 73 papers across various domains in which XAI was evaluated with users. These studies assessed what makes an explanation “good” from a user’s perspective, i.e., what makes an explanation meaningful to a user of an AI system. We identified 30 components of meaningful explanations that were evaluated in the reviewed papers and categorized them into a taxonomy of human-centered XAI evaluation, based on: (a) the contextualized quality of the explanation, (b) the contribution of the explanation to human-AI interaction, and (c) the contribution of the explanation to human-AI performance. Our analysis also revealed a lack of standardization in the methodologies applied in XAI user studies: only 19 of the 73 papers applied an evaluation framework used by at least one other study in the sample. These inconsistencies hinder cross-study comparison and broader insight. Our findings contribute to understanding what makes explanations meaningful to users and how to measure this, guiding the XAI community toward a more unified approach to human-centered explainability.
From the article: The ethics guidelines put forward by the AI High Level Expert Group (AI-HLEG) present a list of seven key requirements that human-centered, trustworthy AI systems should meet. These guidelines are useful for the evaluation of AI systems but can be complemented by applied methods and tools for developing trustworthy AI systems in practice. In this position paper we propose a framework for translating the AI-HLEG ethics guidelines into the specific context within which an AI system operates. This approach aligns well with a set of Agile principles commonly employed in software engineering. http://ceur-ws.org/Vol-2659/
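The paper itself does not prescribe an implementation, but a rough illustration of the idea is sketched below: the seven AI-HLEG requirements are recorded as context-specific, testable backlog items that can be revisited each Agile iteration. The Python data structure, field names, and example entry are hypothetical assumptions for illustration only, not the framework proposed in the paper.

```python
# Hypothetical sketch: AI-HLEG requirements as contextualized, testable backlog items.
from dataclasses import dataclass

# The seven key requirements from the AI-HLEG Ethics Guidelines for Trustworthy AI.
AI_HLEG_REQUIREMENTS = [
    "Human agency and oversight",
    "Technical robustness and safety",
    "Privacy and data governance",
    "Transparency",
    "Diversity, non-discrimination and fairness",
    "Societal and environmental well-being",
    "Accountability",
]

@dataclass
class ContextualizedRequirement:
    requirement: str           # one of the seven AI-HLEG requirements
    system_context: str        # the concrete setting in which the AI system operates
    acceptance_criterion: str  # a verifiable, sprint-level check
    satisfied: bool = False

# Illustrative backlog entry for a hypothetical system.
backlog = [
    ContextualizedRequirement(
        requirement="Transparency",
        system_context="triage model used by medical staff",
        acceptance_criterion="every recommendation ships with an explanation",
    ),
]

# Items left unsatisfied can be reviewed at the end of each iteration.
open_items = [item for item in backlog if not item.satisfied]
print(f"{len(open_items)} trustworthy-AI requirement(s) still open")
```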
Because of both the shortcomings of existing risk assessment methodologies and the newly available tools to predict hazard and risk with machine learning approaches, there has been an emerging emphasis on probabilistic risk assessment. Increasingly sophisticated AI models can be applied to a plethora of exposure and hazard data to obtain not only predictions for particular endpoints but also estimates of the uncertainty of the risk assessment outcome. This provides the basis for a shift from deterministic to more probabilistic approaches, but it comes at the cost of an increased complexity of the process, as it requires more resources and human expertise. There are still challenges to overcome before a probabilistic paradigm is fully embraced by regulators. Based on an earlier white paper (Maertens et al., 2022), a workshop discussed the prospects, challenges, and path forward for implementing such AI-based probabilistic hazard assessment. Moving forward, we will see the transition from categorical to probabilistic and dose-dependent hazard outcomes, the application of internal thresholds of toxicological concern for data-poor substances, the acknowledgement of user-friendly open-source software, a rise in the expertise of toxicologists required to understand and interpret artificial intelligence models, and the honest communication of uncertainty in risk assessment to the public.
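As a rough illustration of the kind of AI-based probabilistic output discussed above, the sketch below trains an ensemble model on synthetic descriptor data and reports, per substance, a hazard probability together with a crude uncertainty proxy (disagreement among the ensemble's members). The random-forest choice, the synthetic data, and the uncertainty proxy are all illustrative assumptions, not the method prescribed by the workshop.

```python
# Minimal sketch: probabilistic hazard prediction with an uncertainty proxy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for chemical exposure/hazard descriptors and a binary endpoint.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Probabilistic output: a hazard probability rather than a hard category.
p_hazard = model.predict_proba(X_test)[:, 1]

# Crude uncertainty proxy: disagreement among the ensemble's individual trees.
per_tree = np.stack([tree.predict(X_test) for tree in model.estimators_])
uncertainty = per_tree.std(axis=0)

for p, u in list(zip(p_hazard, uncertainty))[:5]:
    print(f"P(hazard) = {p:.2f}, tree disagreement = {u:.2f}")
```

Communicating the probability and its uncertainty together, rather than a single categorized outcome, is the shift from deterministic to probabilistic assessment that the abstract describes.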