Project C01: Explanations for healthy distrust in large language models

Machine learning models have limitations. It is therefore important to critically question the output of applications based on large language models (LLMs), such as ChatGPT, rather than trusting it blindly. In the project's first funding phase, a shared vocabulary for trust and distrust was developed, and a study showed that simple warnings (disclaimers) are not sufficient to foster a healthy level of skepticism. The researchers from psychology and computer science therefore developed new approaches for explaining model uncertainty in an understandable way. Building on this, the project team is now developing targeted measures to promote healthy distrust, particularly when LLMs are used to support scientific writing. New forms of explanation, dubbed "perplexing explanations," play a central role here: they are intended to demonstrate the unreliability of LLMs to students and will be deployed as an automatic method within TRR 318 to strengthen human autonomy in dealing with AI systems.

Research areas: Computer science, Psychology

Project Leaders

Prof. Barbara Hammer

Prof. Benjamin Paaßen

Prof. Dr. Ingrid Scharlau

Staff

Support Staff

Valeska Behr, Paderborn University

Oliver Debernitz, Paderborn University

Former Members

Roelof Visser, Research associate

Publications

Assessing healthy distrust in human-AI interaction: interpreting changes in visual attention

T.M. Peters, K. Biermeier, I. Scharlau, Frontiers in Psychology 16 (2026).


Explaining Outliers using Isolation Forest and Shapley Interactions

R. Visser, F. Fumagalli, E. Hüllermeier, B. Hammer, in: Proceedings of the European Symposium on Artificial Neural Networks (ESANN), 2025.


Healthy Distrust in AI systems

B. Paaßen, S. Alpsancar, T. Matzner, I. Scharlau, arXiv (2025).


The Importance of Distrust in AI

T.M. Peters, R.W. Visser, in: Communications in Computer and Information Science, Springer Nature Switzerland, Cham, 2023.


Explaining Reject Options of Learning Vector Quantization Classifiers

A. Artelt, J. Brinkrolf, R. Visser, B. Hammer, in: Proceedings of the 14th International Joint Conference on Computational Intelligence, SCITEPRESS - Science and Technology Publications, 2022.


Model Agnostic Local Explanations of Reject

A. Artelt, R. Visser, B. Hammer, in: ESANN 2022 Proceedings, Ciaco - i6doc.com, 2022.

