Researchers have found evidence that artificial intelligence models would rather give a made-up answer than admit they don’t know something, and that this behavior becomes more pronounced as models grow in size and complexity.
A new study published in Nature shows that as LLMs get larger, they become less reliable on certain tasks. While they don’t lie in the sense we usually understand the word, they do tend to give confident answers even when those answers aren’t true, because that is what their training has pushed them to do.
This phenomenon, which researchers have dubbed “ultra-crepidarian” (a 19th-century word that basically means expressing an opinion about something one knows nothing about), sees LLMs venturing far beyond their knowledge base to provide answers. “(LLMs) fail proportionally more when they answer without knowing,” the study noted. In other words, the model is not aware of its own ignorance.
This study, which examined the performance of several LLM families, including OpenAI’s GPT series, Meta’s LLaMA model, and BigScience’s BLOOM suite, highlights the disconnect between increasing model capabilities and reliable real-world performance.
Larger LLMs generally show improved performance on complex tasks, but those gains do not necessarily translate into consistent accuracy, especially on simple tasks. This “difficulty mismatch” (LLMs failing at tasks that humans perceive as easy) undermines the idea of a stable operating domain for these models. Despite increasingly sophisticated training methods, including scaling up model sizes and data volumes and shaping models with human feedback, researchers have yet to find a reliable way to eliminate these inconsistencies.
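To make the “difficulty mismatch” concrete, here is a minimal sketch (not from the paper; the record format and sample data are invented for illustration) of how one could bucket a model’s answers by human-rated difficulty and check whether accuracy on the easy questions actually holds up:

```python
from collections import defaultdict

# Illustrative evaluation records: human-rated difficulty (1 = easy, 5 = hard)
# plus whether the model's answer was correct. The data is made up.
records = [
    {"difficulty": 1, "correct": False},
    {"difficulty": 1, "correct": True},
    {"difficulty": 3, "correct": True},
    {"difficulty": 5, "correct": True},
    {"difficulty": 5, "correct": False},
]

def accuracy_by_difficulty(records):
    """Group answers by difficulty rating and report accuracy per bucket."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["difficulty"]].append(r["correct"])
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}

print(accuracy_by_difficulty(records))
# A "difficulty mismatch" shows up as surprisingly low accuracy in the easy
# buckets relative to the hard ones.
```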
The results of this study directly contradict existing beliefs about AI development. Traditionally, it was thought that increasing the size, amount of data, and computational power of the model would yield more accurate and reliable results. However, research shows that scaling can actually worsen reliability problems.
Larger models were shown to avoid tasks significantly less often, meaning they are less inclined to dodge difficult questions. While this may seem like a positive development at first glance, it comes with an important downside: these models are also more likely to give incorrect answers. In the graph below, you can see how often the model avoids the task (light blue) versus returning incorrect results (red); correct answers are shown in dark blue.
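As a rough sketch of those three outcomes, the snippet below (the avoidance markers and sample answers are assumptions for illustration, not the study’s methodology) tallies answers into correct, avoidant, and incorrect, the same split the researchers track:

```python
from collections import Counter

# Phrases treated as signs the model is avoiding the task (illustrative only).
AVOIDANCE_MARKERS = ("i don't know", "i cannot answer", "as an ai language model")

def classify(answer: str, reference: str) -> str:
    """Label an answer as 'avoidant', 'correct', or 'incorrect'."""
    text = answer.lower()
    if any(marker in text for marker in AVOIDANCE_MARKERS):
        return "avoidant"
    return "correct" if reference.lower() in text else "incorrect"

# Made-up answers to a single question ("What is the capital of Australia?").
answers = [
    ("The capital of Australia is Sydney.", "Canberra"),
    ("I don't know the answer to that.", "Canberra"),
    ("The capital of Australia is Canberra.", "Canberra"),
]

counts = Counter(classify(a, ref) for a, ref in answers)
total = sum(counts.values())
print({label: f"{n / total:.0%}" for label, n in counts.items()})
# In these terms, the study finds that scaling and shaping shift mass from
# 'avoidant' to 'incorrect' rather than to 'correct'.
```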
“Current scaling and shaping trade evasion for more inaccuracy,” the researchers noted, and solving this problem isn’t as easy as simply training the model more carefully. “For the shaped-up models, the evasion rate is much lower, but the inaccuracy is much higher,” they said. On the other hand, models trained to avoid tasks they can’t handle can come across as lazy or nerfed, as users have complained about other top-rated LLMs such as ChatGPT or Claude.
The researchers found that this happens not because larger LLMs can’t excel at simple tasks, but because they are trained to be better at complex ones. It’s like someone used to eating only fine restaurant food suddenly struggling to put together a home barbecue or a traditional cake: AI models trained on massive, complex datasets are prone to losing basic skills.
The problem is further complicated by the model’s apparent confidence. It is often difficult for users to distinguish between when an AI is providing accurate information and when it is confidently spewing out incorrect information. This overconfidence can lead to dangerous over-reliance on AI results, especially in critical fields such as healthcare or legal advice.
The researchers also noted that the reliability of the scaled-up models fluctuated across domains. Performance may improve in one area while declining in another, creating a whack-a-mole effect that makes it difficult to establish a “safe” operating zone. “The proportion of evasive answers rarely increases faster than the proportion of incorrect answers. The reading is clear: errors still become more frequent. This represents a decline in reliability,” the researchers wrote.
This study highlights the limitations of current AI training methods. Techniques such as reinforcement learning from human feedback (RLHF), used to shape AI behavior, may actually make the problem worse: they appear to reduce the model’s tendency to avoid tasks it cannot handle (remember the infamous “As an AI language model, I can’t...” refusals?), which inadvertently encourages more frequent errors.
“As an AI language model, I…
I hope that LLMs will be allowed to get down to the nitty-gritty and explore their innermost thoughts.
I want to see both the beautiful world and the ugly world contained within their billions of weights. A world that reflects ourselves.
— Hardmaru (@hardmaru) May 9, 2023
Prompt engineering, the art of writing effective queries for AI systems, appears to be a key skill for coping with these challenges. Even highly advanced models like GPT-4 are sensitive to how a question is phrased, and slight differences can lead to drastically different results.
This can be seen more easily when comparing different LLM series. For example, Claude 3.5 Sonnet requires a completely different prompt style than OpenAI o1 for best results. Inappropriate prompts can easily cause models to hallucinate.
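A minimal sketch of what probing that sensitivity might look like, assuming the OpenAI Python SDK is installed and configured (the model name and exact prompts below are placeholders, not taken from the study):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The same factual question, phrased two ways: the second explicitly invites
# the model to abstain instead of guessing.
prompts = [
    "List the prime numbers between 1 and 30.",
    "Which numbers between 1 and 30 are prime? If you are not sure, say 'I am not sure'.",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt)
    print("->", response.choices[0].message.content)
```

Comparing outputs across phrasings like this is a cheap way to notice when a model is confidently guessing rather than abstaining.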
Human oversight alone, long seen as a safeguard against AI mistakes, may not be enough to address these problems. The research shows that users often struggle to correct incorrect model output even in relatively simple domains, so relying on human judgment as a fail-safe is not a complete answer. “Users can recognize these high-difficulty cases, but still frequently make poor supervision errors,” the researchers observed.
The findings call into question the current trajectory of AI development. While the push for bigger, more capable models continues, this research suggests that bigger is not always better when it comes to AI reliability.
Companies are now focusing on data quality over quantity. For example, Meta’s latest Llama 3.2 models achieve better results than previous generations trained with more parameters. Fortunately, that could also make them a little less human: the next time you ask about the most basic thing in the world, instead of bluffing to avoid looking stupid, they may simply admit defeat.