Understanding Hallucinations in Generative AI: A Growing Concern
Hallucinations have long posed a significant challenge for generative AI systems. The same qualities that let these models craft fluent text and visuals also leave them prone to fabricating information. Alarmingly, the problem appears to be getting worse, not better.
A recent technical report by OpenAI, highlighted in The New York Times, reveals that the company's newest models, o3 and o4-mini, hallucinate 51% and 79% of the time, respectively, on a benchmark known as SimpleQA. The earlier o1 model had a hallucination rate of 44% on the same test.
Those numbers are alarming, and they point to a troubling trend: even though these are billed as reasoning models, built to think through their responses and answer more deliberately, they appear more likely, not less, to produce errors in their outputs.
The inaccuracies aren't exclusive to OpenAI's ChatGPT, either. In experiments with Google's AI Overviews search feature, it didn't take long to encounter mistakes, confirming the well-documented trouble AI has with accurately retrieving information from the web. In another example, a support bot for the AI coding app Cursor announced a policy change that hadn't actually happened.
Despite these failures, AI companies tend to gloss over hallucinations in their product announcements, and the issue is often overshadowed by debates about energy consumption and copyright.
In everyday use, the error rate you encounter with AI tools probably won't approach 79%, but inaccuracies do occur. And the problem may remain unresolved, because researchers are still grappling with why hallucinations happen in the first place.
Tests run by the AI platform developer Vectara paint a more encouraging, though still imperfect, picture: many models show hallucination rates between one and three percent. OpenAI's o3 model comes in at 6.8%, while the more recent o4-mini reports 4.6%. Those figures are closer to what typical users experience, but even low hallucination rates can pose significant problems as reliance on AI systems grows.
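For a sense of what these percentages measure, here is a minimal, purely illustrative Python sketch of how a hallucination rate might be tallied on a short-answer benchmark: count the questions a model answers confidently but incorrectly out of those it attempts. This is an assumption about the general approach, not OpenAI's SimpleQA grader or Vectara's leaderboard methodology, both of which use more careful judging.

```python
# Illustrative only: a toy estimate of a hallucination rate on a
# short-answer QA benchmark. Not the actual SimpleQA or Vectara method.

def hallucination_rate(results):
    """results: list of dicts like {"attempted": bool, "correct": bool}.
    Counts confidently wrong answers among attempted questions."""
    attempted = [r for r in results if r["attempted"]]
    if not attempted:
        return 0.0
    wrong = sum(1 for r in attempted if not r["correct"])
    return wrong / len(attempted)

# Toy data: 10 questions; the model declines 2 and gets 3 of the rest wrong.
sample = (
    [{"attempted": True, "correct": True}] * 5
    + [{"attempted": True, "correct": False}] * 3
    + [{"attempted": False, "correct": False}] * 2
)
print(f"{hallucination_rate(sample):.0%}")  # ~38%
```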
Identifying the Roots of Hallucinations

The underlying causes of hallucinations remain elusive, with no clear fix in sight. These models generate responses based on statistical patterns rather than strict, predefined rules, which means they can go wrong in unpredictable ways. Amr Awadallah, CEO of Vectara, told The New York Times that hallucinations are an inherent trait of AI systems and are unlikely to ever be completely eliminated.
According to Hannaneh Hajishirzi, a professor at the University of Washington who is working on ways to deconstruct how AI systems produce their responses, exactly how these models work is still only partially understood. It's like troubleshooting a car or a computer: you can't reliably fix a problem you don't understand.
Neil Chowdhury of the AI analysis lab Transluce suggests that the design of reasoning models may inadvertently make hallucinations worse. The type of reinforcement learning used for o-series models "may intensify problems that conventional post-training methods typically help mitigate, but do not completely resolve," he told TechCrunch.
What do you think of the current situation?