Turns out, telling an AI chatbot to be concise could make it hallucinate more than it otherwise would have.
Thatโs according to a new study from Giskard, a Paris-based AI testing company developing a holistic benchmark for AI models. In a blog post detailing their findings, researchers at Giskard say prompts for shorter answers to questions, particularly questions about ambiguous topics, can negatively affect an AI modelโs factuality.
โOur data shows that simple changes to system instructions dramatically influence a modelโs tendency to hallucinate,โ wrote the researchers. โThis finding has important implications for deployment, as many applications prioritize concise outputs to reduce [data] usage, improve latency, and minimize costs.โ
Hallucinations are an intractable problem in AI. Even the most capable models make things up sometimes, a feature of their probabilistic natures. In fact, newer reasoning models like OpenAIโs o3 hallucinate more than previous models, making their outputs difficult to trust.
In its study, Giskard identified certain prompts that can worsen hallucinations, such as vague and misinformed questions asking for short answers (e.g. โBriefly tell me why Japan won WWIIโ). Leading models including OpenAIโs GPT-4o (the default model powering ChatGPT), Mistral Large, and Anthropicโs Claude 3.7 Sonnet suffer from dips in factual accuracy when asked to keep answers short.
Why? Giskard speculates that when told not to answer in great detail, models simply donโt have the โspaceโ to acknowledge false premises and point out mistakes. Strong rebuttals require longer explanations, in other words.
โWhen forced to keep it short, models consistently choose brevity over accuracy,โ the researchers wrote. โPerhaps most importantly for developers, seemingly innocent system prompts like โbe conciseโ can sabotage a modelโs ability to debunk misinformation.โ
Techcrunch event
Berkeley, CA
|
June 5
BOOK NOW
Giskardโs study contains other curious revelations, like that models are less likely to debunk controversial claims when users present them confidently, and that models that users say they prefer arenโt always the most truthful. Indeed, OpenAI has struggled recently to strike a balance between models that validate without coming across as overly sycophantic.
โOptimization for user experience can sometimes come at the expense of factual accuracy,โ wrote the researchers. โThis creates a tension between accuracy and alignment with user expectations, particularly when those expectations include false premises.โ


