Researchers from the INF project have developed a model that predicts optimal prompts, with the goal of improving the output generated by language models. Prompts are crucial in the application of language models, as they influence the direction and focus of the generated content, for example, when creating explanations. Maximilian Spliethöver presented the TRR 318 study at the NAACL 2025 conference in New Mexico, USA.
Many people are currently experimenting with prompting in ChatGPT: which text input leads to the desired result? Several techniques exist for this. For example, the language model behind ChatGPT can be prompted to take on a specific role, or the task can be described in detail, including definitions.
The computer scientists Professor Henning Wachsmuth and Maximilian Spliethöver from Leibniz University Hannover set out to train a model that automatically suggests the best composition of prompt techniques, such as definitions or specific roles. The task for the language models was to identify stereotypes in three different text collections, a highly context-dependent task that requires semantic understanding. “To do this, we could give the language model an example of a stereotype or explain exactly what we mean by stereotypes,” explains Spliethöver. “In our model, the different techniques were automatically combined with each other.”
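Such a composition can be thought of as selecting and concatenating reusable prompt building blocks. The following Python sketch illustrates the idea in a simplified form; the technique names, wording, and helper function are hypothetical and do not reproduce the authors’ actual implementation:

```python
# Hypothetical prompt building blocks; the wording is illustrative
# only and not taken from the study.
TECHNIQUES = {
    "role": "You are an expert in analyzing social bias in text.",
    "definition": "A stereotype is a generalized belief about a social group.",
    "example": 'Example of a stereotype: "Group X is bad at math."',
}

def compose_prompt(selected: list[str], text: str) -> str:
    """Concatenate the chosen technique snippets with the task and the input."""
    parts = [TECHNIQUES[name] for name in selected]
    parts.append("Task: Does the following text contain a stereotype? Answer yes or no.")
    parts.append(f"Text: {text}")
    return "\n".join(parts)

# One possible composition; a predictor model would choose which
# techniques to include for a given input.
print(compose_prompt(["role", "definition"], "All engineers are men."))
```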
The researchers applied their model to predict the optimal prompt for three large language models (LLMs) of different sizes. “We found that our approach works better for some LLMs and datasets than for others, but in many cases we saw an improvement over other methods for all three,” says Spliethöver, summarizing the results.
In addition, the team from project C03 tested an alternative prediction method using so-called Shapley values. Here, the different compositions of techniques were treated as features in order to calculate how, for example, adding a technique affected the result. “There is no need to train a model for this,” explains Spliethöver. However, both methods depend on the dataset and the LLMs. They are also very computation-intensive overall, which makes them rather impractical for everyday users.
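To make the Shapley-value idea concrete: each prompt technique can be viewed as a “player,” and its Shapley value is its average marginal contribution to the score across all possible combinations of the other techniques. The sketch below computes exact Shapley values for a toy setting; the scoring numbers are invented for illustration and do not come from the study:

```python
from itertools import combinations
from math import factorial

def shapley_values(players: list[str], score) -> dict[str, float]:
    """Exact Shapley values: the weighted average marginal contribution
    of each technique over all subsets of the other techniques."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (score(set(subset) | {p}) - score(set(subset)))
        values[p] = total
    return values

# Hypothetical scores for each composition (e.g. task accuracy);
# these numbers are made up for illustration.
scores = {
    frozenset(): 0.50,
    frozenset({"role"}): 0.55,
    frozenset({"definition"}): 0.60,
    frozenset({"role", "definition"}): 0.68,
}
print(shapley_values(["role", "definition"], lambda s: scores[frozenset(s)]))
```

This also shows why the approach is computation-intensive: the score must be evaluated for every possible composition of techniques, which grows exponentially with their number.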
“The aim should rather be to develop a language model that can also deal with suboptimal prompts,” says Spliethöver. “Until then, however, we need automatic systems to improve prompts. This is because a prompting strategy does not currently work equally well for all situations, and better prompts also lead to better explanations, as we are researching in TRR 318.”
Publication: Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection
Conference: NAACL 2025, Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics