Hallucination (artificial intelligence)


First-generation Sora video of the Glenfinnan Viaduct in Scotland, incorrectly showing: a second track, trains traveling on the right instead of the left, a second chimney on its interpretation of the train The Jacobite, and inconsistent carriage lengths

The real Glenfinnan Viaduct with The Jacobite on it

In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting,^[1]^[2] confabulation,^[3] or delusion^[4]) is a response generated by AI that contains false or misleading information presented as fact.^[5]^[6] This term draws a loose analogy with human psychology, where a hallucination typically involves false percepts. However, there is a key difference: AI hallucination is associated with erroneously constructed responses (confabulation), rather than perceptual experiences.^[6]

For example, a chatbot powered by large language models (LLMs), like ChatGPT, may embed plausible-sounding random falsehoods within its generated content. Detecting and mitigating errors and hallucinations pose significant challenges for practical deployment and reliability of LLMs in high-stakes scenarios, such as chip design, supply chain logistics, and medical diagnostics.^[7]^[8]^[9] Some software engineers and statisticians have criticized the specific term "AI hallucination" for unreasonably anthropomorphizing computers.^[10]^[11]

Term

Origin

Since the 1980s, the term "hallucination" has been used in computer vision with a positive connotation to describe the process of adding detail to an image. For example, the task of generating high-resolution face images from low-resolution inputs is called face hallucination.^[12]^[13] The first documented use of the term "hallucination" in this sense is in the PhD thesis of Eric Mjolsness in 1986.^[14] A notable work is the face hallucination algorithm by Simon Baker and Takeo Kanade published in 1999.^[15]

In 1995, Stephen Thaler demonstrated how hallucinations and phantom experiences emerge from artificial neural networks through random perturbation of their connection weights.^[16]

In the 2000s, hallucinations were described in statistical machine translation as a failure mode.^[17]

Since the 2010s, the term underwent a semantic shift to signify the generation of factually incorrect or misleading outputs by AI systems in tasks like machine translation and object detection.^[12] In 2015, hallucinations were identified in visual semantic role labeling tasks by Saurabh Gupta and Jitendra Malik.^[18] In 2017, Google researchers used the term to describe the responses generated by neural machine translation (NMT) models when they are not related to the source text,^[19] and in 2018, the term was used in computer vision to describe instances where non-existent objects are erroneously detected because of adversarial attacks.^[20]

The term "hallucinations" in AI gained wider recognition during the AI boom, alongside the rollout of widely used chatbots based on large language models (LLMs).^[21] In July 2021, Meta warned during its release of BlenderBot 2 that the system is prone to "hallucinations", which Meta defined as "confident statements that are not true".^[22]^[23] Following OpenAI's ChatGPT release in beta version in November 2022, some users complained that such chatbots often seem to pointlessly embed plausible-sounding random falsehoods within their generated content.^[24] Many news outlets, including The New York Times, started to use the term "hallucinations" to describe these models' occasionally incorrect or inconsistent responses.^[25]

Some researchers have highlighted a lack of consistency in how the term is used, but also identified several alternative terms in the literature, such as confabulations, fabrications, and factual errors.^[12]

In 2023, the Cambridge dictionary updated its definition of hallucination to include this new sense specific to the field of AI.^[26]

Definitions and alternatives

Uses, definitions and characterizations of the term "hallucination" in the context of LLMs include:

"a tendency to invent facts in moments of uncertainty" (OpenAI, May 2023)^[28]
"a model's logical mistakes" (OpenAI, May 2023)^[28]
"fabricating information entirely, but behaving as if spouting facts" (CNBC, May 2023)^[28]
"making up information" (The Verge, February 2023)^[29]
"probability distributions" (in scientific contexts)^[30]

Journalist Benj Edwards, in Ars Technica, writes that the term "hallucination" is controversial, but that some form of metaphor remains necessary; Edwards suggests "confabulation" as an analogy for processes that involve "creative gap-filling".^[3] In July 2024, a White House report on fostering public trust in AI research mentioned hallucinations only in the context of reducing them. Notably, when acknowledging David Baker's Nobel Prize-winning work with AI-generated proteins, the Nobel committee avoided the term entirely, instead referring to "imaginative protein creation".^[30]

Hicks, Humphries, and Slater, in their article in Ethics and Information Technology, argue that the output of LLMs is "bullshit" under Harry Frankfurt's definition of the term, and that the models are "in an important way indifferent to the truth of their outputs", with true statements only accidentally true, and false ones accidentally false.^[1]^: 9

Criticism

In the scientific community, some researchers avoid the term "hallucination", seeing it as potentially misleading. It has been criticized by Usama Fayyad, executive director of the Institute for Experimental Artificial Intelligence at Northeastern University, on the grounds that it misleadingly personifies large language models and is vague.^[31] Mary Shaw said, "The current fashion for calling generative AI's errors 'hallucinations' is appalling. It anthropomorphizes the software, and it spins actual errors as somehow being idiosyncratic quirks of the system even when they're objectively incorrect."^[10] In Salon, statistician Gary N. Smith argues that LLMs "do not understand what words mean" and consequently that the term "hallucination" unreasonably anthropomorphizes the machine.^[11] Some see the AI outputs not as illusory but as prospective—that is, having some chance of being true, similar to early-stage scientific conjectures. The term has also been criticized for its association with psychedelic drug experiences.^[30]

In natural language generation

A translation on the Vicuna LLM test bed of English into the constructed language Lojban, and then back into English in a new round, generates a surreal artifact from Genesis 1:6 (RSV).

In natural language generation, a hallucination is often defined as "generated content that appears factual but is ungrounded".^[32] There are different ways to categorize hallucinations. Depending on whether the output contradicts the source or cannot be verified from the source, they are divided into intrinsic and extrinsic, respectively.^[6] Depending on whether the output contradicts the prompt or not, they could be divided into closed-domain and open-domain, respectively.^[33]

Causes

There are several reasons why natural language models hallucinate:^[6]^[34]

Hallucination from data

Hallucinations can stem from incomplete, inaccurate or unrepresentative data sets.^[35] One possible cause is source-reference divergence. This divergence may occur as an artifact of heuristic data collection or due to the nature of some natural language generation tasks that inevitably contain such divergence. When a model is trained on data with source-reference (target) divergence, the model can be encouraged to generate text that is not necessarily grounded and not faithful to the provided source.^[6]

Modeling-related causes

The pre-training of generative pretrained transformers (GPT) involves predicting the next word. It incentivizes GPT models to "give a guess" about what the next word is, even when they lack information.^[36] After pre-training, though, hallucinations can be mitigated through anti-hallucination fine-tuning^[37] (such as with reinforcement learning from human feedback). Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Teresa Amabile and Pratt define human creativity as the production of novel and useful ideas.^[38] By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses—that is, falsehoods—whereas a focus on usefulness may result in memorized content lacking originality.^[39]

Pre-training of models on a large corpus is known to result in the model memorizing knowledge in its parameters, creating hallucinations if the system is overconfident in its knowledge. In systems such as GPT-3, an AI generates each next word based on a sequence of previous words (including the words it has itself previously generated during the same conversation), causing a cascade of possible hallucinations as the response grows longer.^[6] By 2022, newspapers such as The New York Times expressed concern that, as the adoption of bots based on large language models continued to grow, unwarranted user confidence in bot output could lead to problems.^[40]

For models that also have an encoder (unlike GPTs), errors in encoding and decoding between text and representations can cause hallucinations. When encoders learn the wrong correlations between different parts of the training data, it can result in an erroneous generation that diverges from the input. The decoder takes the encoded input from the encoder and generates the final target sequence.^[41] Two aspects of decoding contribute to hallucinations. First, decoders can attend to the wrong part of the encoded input source, leading to erroneous generation. Second, the design of the decoding strategy itself can contribute to hallucinations. A decoding strategy that improves generation diversity, such as top-k sampling, is positively correlated with increased hallucination.^[42]
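The effect of the decoding strategy can be illustrated with a minimal sketch of top-k sampling. The vocabulary, the scores, and the helper name `top_k_sample` are hypothetical and chosen only for illustration; the sketch shows how widening k lets less-supported tokens into the output, and does not by itself demonstrate the reported correlation with hallucination.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token index from the k highest-scoring logits."""
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the retained logits (subtracting the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical next-token scores; "Paris" is the best-supported continuation.
vocab = ["Paris", "Lyon", "Berlin", "Madrid"]
logits = [4.0, 2.0, 1.5, 1.0]

rng = random.Random(0)  # fixed seed so runs are repeatable
greedy = [vocab[top_k_sample(logits, 1, rng)] for _ in range(5)]
diverse = [vocab[top_k_sample(logits, 3, rng)] for _ in range(1000)]

print(set(greedy))           # only the top-scoring token
print(sorted(set(diverse)))  # lower-scoring tokens can now appear
```

With k = 1 the procedure reduces to greedy decoding and always picks the best-supported token. With k = 3, tokens with weaker support are sampled some of the time, which increases diversity but also the chance of an ungrounded continuation.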

Interpretability research

In 2025, interpretability research by Anthropic on the LLM Claude identified internal circuits that cause it to decline to answer questions unless it knows the answer. By default, the circuit is active and the LLM doesn't answer. When the LLM has sufficient information, these circuits are inhibited and the LLM answers the question. Hallucinations were found to occur when this inhibition happens incorrectly, such as when Claude recognizes a name but lacks sufficient information about that person, causing it to generate plausible but untrue responses.^[43]

Examples

On 15 November 2022, researchers from Meta AI published Galactica,^[44] designed to "store, combine and reason about scientific knowledge". Content generated by Galactica came with the warning: "Outputs may be unreliable! Language Models are prone to hallucinate text." In one case, when asked to draft a paper on creating avatars, Galactica cited a fictitious paper from a real author who works in the relevant area. Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy.^[45] Before the cancellation, researchers were working on Galactica Instruct, which would use instruction tuning to allow the model to follow instructions to manipulate LaTeX documents on Overleaf.^[46]

OpenAI's ChatGPT, released in beta version to the public on November 30, 2022, was based on the foundation model GPT-3.5 (a revision of GPT-3). Professor Ethan Mollick of Wharton called it an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kubacka has recounted deliberately making up the phrase "cycloidal inverted electromagnon" and testing ChatGPT by asking it about the (nonexistent) phenomenon. ChatGPT invented a plausible-sounding answer backed with plausible-looking citations that compelled her to double-check whether she had accidentally typed in the name of a real phenomenon. Other scholars such as Oren Etzioni have joined Kubacka in assessing that such software can often give "a very impressive-sounding answer that's just dead wrong".^[47]

When CNBC asked ChatGPT for the lyrics to "Ballad of Dwight Fry", ChatGPT supplied invented lyrics rather than the actual lyrics.^[48] Asked questions about the Canadian province of New Brunswick, ChatGPT got many answers right but incorrectly classified Toronto-born Samantha Bee as a "person from New Brunswick".^[49] Asked about astrophysical magnetic fields, ChatGPT incorrectly volunteered that "(strong) magnetic fields of black holes are generated by the extremely strong gravitational forces in their vicinity". (In reality, as a consequence of the no-hair theorem, a black hole without an accretion disk is believed to have no magnetic field.)^[50] Fast Company asked ChatGPT to generate a news article on Tesla's last financial quarter; ChatGPT created a coherent article, but made up the financial numbers contained within.^[51]

Other examples involve baiting ChatGPT with a false premise to see if it embellishes upon the premise. When asked about "Harold Coward's idea of dynamic canonicity", ChatGPT fabricated that Coward wrote a book titled Dynamic Canonicity: A Model for Biblical and Theological Interpretation, arguing that religious principles are actually in a constant state of change. When pressed, ChatGPT continued to insist that the book was real.^[52] Asked for proof that dinosaurs built a civilization, ChatGPT claimed there were fossil remains of dinosaur tools and stated, "Some species of dinosaurs even developed primitive forms of art, such as engravings on stones".^[53] When prompted that "Scientists have recently discovered churros, the delicious fried-dough pastries ... (are) ideal tools for home surgery", ChatGPT claimed that a "study published in the journal Science" found that the dough is pliable enough to form into surgical instruments that can get into hard-to-reach places, and that the flavor has a calming effect on patients.^[54]^[55]

By 2023, analysts considered frequent hallucination to be a major problem in LLM technology, with a Google executive identifying hallucination reduction as a "fundamental" task for ChatGPT competitor Google Gemini.^[9]^[56] A 2023 demo for Microsoft's GPT-based Bing AI appeared to contain several hallucinations that went uncaught by the presenter.^[9]

In May 2023, it was discovered that Stephen Schwartz had submitted six fake case precedents generated by ChatGPT in his brief to the Southern District of New York on Mata v. Avianca, Inc., a personal injury case against the airline Avianca. Schwartz said that he had never previously used ChatGPT, that he did not recognize the possibility that ChatGPT's output could have been fabricated, and that ChatGPT continued to assert the authenticity of the precedents after their nonexistence was discovered.^[57] In response, Brantley Starr of the Northern District of Texas banned the submission of AI-generated case filings that have not been reviewed by a human, noting that:^[58]^[59]

Generative artificial intelligence platforms in their current states are prone to hallucinations and bias. On hallucinations, they make stuff up—even quotes and citations. Another issue is reliability or bias. While attorneys swear an oath to set aside their personal prejudices, biases, and beliefs to faithfully uphold the law and represent their clients, generative artificial intelligence is the product of programming devised by humans who did not have to swear such an oath. As such, these systems hold no allegiance to any client, the rule of law, or the laws and Constitution of the United States (or, as addressed above, the truth). Unbound by any sense of duty, honor, or justice, such programs act according to computer code rather than conviction, based on programming rather than principle.

On June 23, Judge P. Kevin Castel dismissed the Mata case and issued a $5,000 fine to Schwartz and another lawyer for bad faith conduct; both had continued to stand by the fictitious precedents despite Schwartz's previous claims. Castel noted numerous errors and inconsistencies in the opinion summaries, describing one of the cited opinions as "gibberish" and "[bordering] on nonsensical".^[60]

In June 2023, Mark Walters, a gun rights activist and radio personality, sued OpenAI in a Georgia state court after ChatGPT mischaracterized a legal complaint in a manner alleged to be defamatory against Walters. The complaint in question was brought in May 2023 by the Second Amendment Foundation against Washington attorney general Robert W. Ferguson for allegedly violating their freedom of speech, whereas the ChatGPT-generated summary bore no resemblance to it and claimed that Walters was accused of embezzlement and fraud while holding a Second Amendment Foundation office post that he never held in real life. According to AI legal expert Eugene Volokh, OpenAI is likely not shielded against this claim by Section 230, because OpenAI likely "materially contributed" to the creation of the defamatory content.^[61] In May 2025, Judge Tracie Cason of Gwinnett County Superior Court ruled in favor of OpenAI, stating that the plaintiff had not shown he was defamed, as Walters failed to show that OpenAI's statements about him were negligent or made with "actual malice".^[62]

In February 2024, Canadian airline Air Canada was ordered by the Civil Resolution Tribunal to pay damages to a customer and honor a bereavement fare policy that was hallucinated by a support chatbot, which incorrectly stated that customers could retroactively request a bereavement discount within 90 days of the date the ticket was issued (the actual policy does not allow the fare to be requested after the flight is booked). The Tribunal rejected Air Canada's defense that the chatbot was a "separate legal entity that is responsible for its own actions".^[63]^[64]

In October 2025, several hallucinations, including non-existent academic sources and a fake quote from a federal court judgement, were discovered in an A$440,000 report written by Deloitte and submitted to the Australian government in July. The company later submitted a revised report with these errors removed, and will issue a partial refund to the government.^[65]^[66] In November 2025, The Independent, a news publication in Newfoundland and Labrador, Canada, discovered that Deloitte's CA$1.6 million Health Human Resources Plan for the Government of Newfoundland and Labrador, commissioned in May 2025, contained at least four false citations to non-existent research papers.^[67]^[68]

In other modalities

The images above demonstrate an example of how an artificial neural network might make a false positive result in object detection. The input image is a simplified example of the training phase, using multiple images that are known to depict starfish and sea urchins, respectively. The starfish match with a ringed texture and a star outline, whereas most sea urchins match with a striped texture and oval shape. However, the instance of a ring textured sea urchin creates a weakly weighted association between them.

Subsequent run of the network on an input image (left):^[69] The network correctly detects the starfish. However, the weakly weighted association between ringed texture and sea urchin also confers a weak signal to the latter from one of two intermediate nodes. In addition, a shell that was not included in the training gives a weak signal for the oval shape, also resulting in a weak signal for the sea urchin output. These weak signals may result in a false positive result for the presence of a sea urchin although there was none in the input image. In reality, textures and outlines would not be represented by single nodes, but rather by associated weight patterns of multiple nodes.
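The false positive described above can be reproduced with a toy weighted-sum detector. All weights, feature activations, and the detection threshold below are made-up numbers chosen for illustration, and each feature is represented by a single node for simplicity, which, as noted above, is not how real networks encode textures and outlines.

```python
# Made-up connection weights from intermediate feature nodes to output labels.
WEIGHTS = {
    "starfish": {"ring_texture": 0.9, "star_outline": 0.9},
    # The single ring-textured sea urchin seen in training left a weak link.
    "sea_urchin": {"ring_texture": 0.3, "striped_texture": 0.9, "oval_shape": 0.9},
}
THRESHOLD = 0.5  # hypothetical detection cutoff


def score(features):
    """Weighted sum of feature activations for each output label."""
    return {
        label: sum(w * features.get(name, 0.0) for name, w in weights.items())
        for label, weights in WEIGHTS.items()
    }


def detect(features):
    """Labels whose score reaches the detection threshold."""
    return {label for label, s in score(features).items() if s >= THRESHOLD}


# A starfish alone: the weak ring-texture link stays below the cutoff.
starfish_only = {"ring_texture": 1.0, "star_outline": 1.0}
# The same starfish next to an unfamiliar shell that weakly triggers "oval".
starfish_and_shell = {"ring_texture": 1.0, "star_outline": 1.0, "oval_shape": 0.3}

print(detect(starfish_only))
print(detect(starfish_and_shell))
```

Neither weak signal is enough on its own, but together they push the sea urchin score past the cutoff, so the second input is reported as containing a sea urchin that is not there.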

The concept of "hallucination" is not limited to text generation, and can occur with other modalities. A confident response from any AI that seems erroneous in light of its training data can be labeled a hallucination.^[6]

Object detection

Various researchers cited by Wired have classified adversarial hallucinations as a high-dimensional statistical phenomenon, or have attributed hallucinations to insufficient training data. Some researchers believe that some "incorrect" AI responses classified by humans as "hallucinations" in the case of object detection may in fact be justified by the training data, or even that an AI may be giving the "correct" answer that the human reviewers are failing to see. For example, an adversarial image that looks, to a human, like an ordinary image of a dog, may in fact be seen by the AI to contain tiny patterns that (in authentic images) would only appear when viewing a cat. The AI is detecting real-world visual patterns that humans are insensitive to.^[70]

Wired noted in 2018 that, despite no recorded attacks "in the wild" (that is, outside of proof-of-concept attacks by researchers), there was "little dispute" that consumer gadgets, and systems such as automated driving, were susceptible to adversarial attacks that could cause AI to hallucinate. Examples included a stop sign rendered invisible to computer vision; an audio clip engineered to sound innocuous to humans, but that software transcribed as "evil dot com"; and an image of two men on skis, that Google Cloud Vision identified as 91% likely to be "a dog".^[20] However, these findings have been challenged by other researchers.^[71] For example, it was objected that the models can be biased towards superficial statistics, leading adversarial training to not be robust in real-world scenarios.^[71]

Text-to-audio generative AI

Text-to-audio generative AI systems, more narrowly known as text-to-speech (TTS) synthesis depending on the modality, are known to produce inaccurate and unexpected results.^[72]

Text-to-image generative AI

Text-to-image models, such as Stable Diffusion, Midjourney and others, often produce inaccurate or unexpected results. For instance, Gemini depicted Nazi German soldiers as people of color,^[73] causing controversy and leading Google to pause image generation involving people in Gemini.^[74] Generative AI is also used in photo sleuthing, occasionally causing problems. Luther (2025) describes instances in which generative AI tools used in photo sleuthing incorrectly identify individuals or fabricate historical matches when analyzing archival military images. These image-based hallucinations can lead to the spread of misinformation about historical figures, military records, and genealogical research.^[75]

In scientific research

Problems

AI models can cause problems in the world of academic and scientific research due to their hallucinations. Specifically, models like ChatGPT have been recorded in multiple cases citing sources that are either incorrect or do not exist. A 2023 study published in the Cureus Journal of Medical Science showed that out of 178 total references cited by GPT-3, 69 returned an incorrect or nonexistent digital object identifier (DOI). An additional 28 had no known DOI and could not be located in a Google search.^[76]

Some nonexistent phrases such as "vegetative electron microscopy" have appeared in many research papers as a result of having become embedded in AI training data.^[77]

Another instance was documented by Jerome Goddard from Mississippi State University. In an experiment, ChatGPT provided questionable information about ticks. Unsure about the validity of the response, Goddard asked for the source from which the information had been gathered. On inspection of the source, it was apparent that the DOI and the names of the authors had been hallucinated. Some of the authors were contacted and confirmed that they had no knowledge of the paper's existence whatsoever.^[78] Goddard says that, "in [ChatGPT's] current state of development, physicians and biomedical researchers should NOT ask ChatGPT for sources, references, or citations on a particular topic. Or, if they do, all such references should be carefully vetted for accuracy."^[78] It has also been argued that these language models are not ready for use in fields of academic research and that their use should be handled carefully.^[79]

In addition, in "FAKING IT: Navigating the new era of generative AI may be the most critical challenge to democracy yet", Nina Schick argues that the persuasive quality of hallucinated content generated by AI systems poses risks to democratic institutions by enabling the spread of convincing but false narratives. She says, "The core risk to democracy is a future in which AI is used as an engine to power all information and knowledge-consequently degrading trust in the medium of digital information itself." The article emphasizes that the rapid proliferation of generative AI may challenge public trust by blurring the boundaries between verified information and synthetic misinformation.^[80]

On top of providing incorrect or missing reference material, ChatGPT also has issues with hallucinating the contents of some reference material. A study that analyzed a total of 115 references provided by ChatGPT-3.5 documented that 47% of them were fabricated. Another 46% cited real references but extracted incorrect information from them. Only the remaining 7% of references were cited correctly and provided accurate information. ChatGPT has also been observed to "double-down" on a lot of the incorrect information. When asked about a mistake that may have been hallucinated, sometimes ChatGPT will try to correct itself but other times it will claim the response is correct and provide even more misleading information.^[81]

These hallucinated articles generated by language models also pose an issue because it is difficult to tell whether an article was generated by an AI. To show this, a group of researchers at the Northwestern University of Chicago generated 50 abstracts based on existing reports and analyzed their originality. Plagiarism detectors gave the generated articles an originality score of 100%, meaning that the information presented appears to be completely original. Other software designed to detect AI generated text was only able to correctly identify these generated articles with an accuracy of 66%. Research scientists had a similar rate of human error, identifying these abstracts at a rate of 68%.^[82] From this information, the authors of this study concluded, "[t]he ethical and acceptable boundaries of ChatGPT's use in scientific writing remain unclear, although some publishers are beginning to lay down policies."^[83] Because of AI's ability to fabricate research undetected, the use of AI in the field of research will make determining the originality of research more difficult and require new policies regulating its use in the future.

Because AI-generated language can in some cases pass as real scientific research, even when presented to working researchers, AI hallucinations present problems for the application of language models in academic and scientific research. The high likelihood of returning non-existent reference material and incorrect information may require limitations to be put in place regarding these language models. Some say that rather than hallucinations, these events are more akin to "fabrications" and "falsifications" and that the use of these language models presents a risk to the integrity of the field as a whole.^[84]

Some academic professionals who support scholarly research, such as academic librarians, have observed a significant increase in workload related to verifying the accuracy of references.^[85] Zoë Teel noted in a 2023 paper that universities may need to resort to implementing their own citation auditing in order to track the problem of fictitious references.^[86]

Benefits

Scientists have also found that hallucinations can serve as a valuable tool for scientific discovery, particularly in fields requiring innovative approaches to complex problems. At the University of Washington, David Baker's lab has used AI hallucinations to design "ten million brand-new" proteins that don't occur in nature, leading to roughly 100 patents and the founding of over 20 biotech companies. This work contributed to Baker receiving the 2024 Nobel Prize in Chemistry, although the committee avoided using the "hallucinations" language.^[30]

In medical research and device development, hallucinations have enabled practical innovations. At California Institute of Technology, researchers used hallucinations to design a novel catheter geometry that significantly reduces bacterial contamination. The design features sawtooth-like spikes on the inner walls that prevent bacteria from gaining traction, potentially addressing a global health issue that causes millions of urinary tract infections annually. These scientific applications of hallucinations differ fundamentally from chatbot hallucinations, as they are grounded in physical reality and scientific facts rather than ambiguous language or internet data. Anima Anandkumar, a professor at Caltech, emphasizes that these AI models are "taught physics" and their outputs must be validated through rigorous testing. In meteorology, scientists use AI to generate thousands of subtle forecast variations, helping identify unexpected factors that can influence extreme weather events.^[30]

At Memorial Sloan Kettering Cancer Center, researchers have applied hallucinatory techniques to enhance blurry medical images, while the University of Texas at Austin has utilized them to improve robot navigation systems. These applications demonstrate how hallucinations, when properly constrained by scientific methodology, can accelerate the discovery process from years to days or even minutes.^[30]

Consequences of hallucinations in education

Artificial intelligence hallucinations can affect education. A growing number of students use AI tools for assistance with research or writing, ranging from writing tools such as Grammarly to generative AI programs such as ChatGPT. This has raised concerns about academic integrity^[87] in submitted projects, and about hallucinations causing students to learn incorrect information.

Part of this concern lies in the citations provided by LLMs and generative AI. A 2024 study at the University of Mississippi found that many of these citations that students submitted were partially or completely fabricated. Of these sources, 47% had incorrect titles, dates, authors, or a combination of these.^[88] The study notes that these inconsistencies in student-submitted citations cause educators and librarians to manually check citation accuracy more frequently.

The Journal of Cranio-Maxillofacial Surgery addresses this risk in the medical and surgical fields. It notes that academic publishers have acknowledged the issue, and that some journals such as JAMA have changed some of their policies to discourage the use of AI-generated citations.^[89] The journal states, however, that a policy alone will not be enough to reduce the use of AI-generated citations, and that automatic citation-checking tools and AI literacy training should also be adopted.

Instructors may use tools such as Turnitin for plagiarism checking and verification of academic integrity. These tools have been found to sometimes flag papers that have not used any AI assistance in their writing.^[90] OpenAI has also found such a lack of accuracy in their own AI detection software that the company shut it down entirely.^[91]

Mitigation methods

The hallucination phenomenon is still not completely understood, and research into mitigating its occurrence is ongoing.^[93] Some researchers have proposed that hallucinations are inevitable and are an innate limitation of large language models.^[92] In particular, it was shown that language models not only hallucinate but also amplify hallucinations, even those models designed to alleviate this issue.^[94] Researchers from OpenAI wrote that hallucinations occur because the training and evaluation of LLMs reward guessing over acknowledging uncertainty, and proposed modifying the scoring of benchmarks.^[36]
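The incentive to guess can be made concrete with a short expected-value calculation. The scoring scheme below (one point for a correct answer, zero for abstaining, and an adjustable penalty for a wrong answer) is a hypothetical illustration and not necessarily the scheme the researchers proposed; the function names are likewise assumptions.

```python
ABSTAIN_SCORE = 0.0  # answering "I don't know" earns nothing in this sketch


def expected_score(p_correct, wrong_penalty):
    """Expected score for answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_action(p_correct, wrong_penalty):
    """Choose whichever of guessing or abstaining scores higher on average."""
    if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE:
        return "guess"
    return "abstain"


# Accuracy-only grading (no penalty): even a 20% guess beats abstaining.
print(best_action(0.2, wrong_penalty=0.0))
# With a penalty of 1 for wrong answers, the same guess no longer pays off.
print(best_action(0.2, wrong_penalty=1.0))
# A well-supported answer is still worth giving under the penalty.
print(best_action(0.8, wrong_penalty=1.0))
```

Under this toy scheme, guessing is worthwhile only when the probability of being correct exceeds penalty / (1 + penalty), so a benchmark with no penalty rewards guessing at any nonzero probability.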

Ji et al. divide common mitigation methods into two categories: data-related methods, and modeling and inference methods.^[6] Data-related methods include building a faithful dataset, cleaning data automatically, and augmenting the inputs with external information. Modeling and inference methods include changes to the architecture (modifying the encoder, the attention mechanism, or the decoder in various ways); changes to the training process, such as using reinforcement learning; and post-processing methods that can correct hallucinations in the output.

Researchers have proposed a variety of mitigation measures, including getting different chatbots to debate one another until they reach consensus on an answer.^[95] Another approach actively validates the model's low-confidence generations against web search results. Its authors showed that a generated sentence is hallucinated more often when the model has already hallucinated in earlier sentences generated for the same input. Their method instructs the model to create a validation question that checks the information about a selected concept, and answers that question using the Bing search API.^[96] An extra layer of logic-based rules was proposed for the web search mitigation method, using web pages of different ranks as a hierarchically organized knowledge base.^[97] When no external data sources are available to validate LLM-generated responses (or the responses are already based on external data, as in RAG), model uncertainty estimation techniques from machine learning may be applied to detect hallucinations.^[98] Another proposal is a two-phase framework that detects hallucinations in LLM-generated content via unsupervised screening followed by LLM validation.^[99]
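The first step of validating low-confidence generations, identifying which parts of an output the model was unsure about, can be sketched as follows. This is a simplified illustration and not the cited authors' implementation: it assumes the model exposes a log-probability for each generated token, and the function name `flag_low_confidence`, the threshold of 0.5, and the example numbers are all made up for this sketch.

```python
import math
from typing import List, Tuple


def flag_low_confidence(
    tokens: List[str], logprobs: List[float], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (token, probability) pairs whose probability is below threshold.

    tokens   -- generated tokens, in order
    logprobs -- natural-log probability the model assigned to each token
    """
    if len(tokens) != len(logprobs):
        raise ValueError("tokens and logprobs must have the same length")
    flagged = []
    for token, logprob in zip(tokens, logprobs):
        probability = math.exp(logprob)
        if probability < threshold:
            flagged.append((token, probability))
    return flagged


# Invented example: the model was confident about every token except the last.
tokens = ["Paris", "is", "in", "Spain"]
logprobs = [math.log(0.9), math.log(0.95), math.log(0.9), math.log(0.2)]
to_validate = [token for token, _ in flag_low_confidence(tokens, logprobs)]
```

In a complete pipeline, each flagged item would then be checked against an external source such as a web search, and the sentence repaired before generation continues. Those steps are omitted here.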

According to Luo et al.,^[100] previous methods fall into several categories. Knowledge- and retrieval-based approaches ground LLM responses in factual data using external knowledge sources, for example through path grounding.^[101] Training or reference guiding for language models involves strategies such as employing control codes^[102] or contrastive learning^[103] to guide the generation process to differentiate between correct and hallucinated content. Another category is evaluation and mitigation focused on specific hallucination types,^[100] such as methods to evaluate quantity entities in summarization^[104] and methods to detect and mitigate self-contradictory statements.^[105]

Nvidia Guardrails, launched in 2023, can be configured to hard-code certain responses via script instead of leaving them to the LLM.^[106] In addition, tools such as SelfCheckGPT,^[107] the Trustworthy Language Model,^[108] and Aimon^[109] have emerged to help detect hallucinations in offline experimentation and in real-time production scenarios.
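The general idea of scripting certain responses rather than leaving them to the model can be sketched in a few lines of Python. This is a generic illustration only and does not use Nvidia's software or its configuration format: the names `make_guarded_bot` and `fake_llm`, the regular-expression rule, and the scripted reply are all hypothetical.

```python
import re
from typing import Callable, List, Tuple

# A rule pairs a compiled pattern with the fixed reply to give when it matches.
ScriptedRule = Tuple[re.Pattern, str]


def make_guarded_bot(
    rules: List[ScriptedRule], llm: Callable[[str], str]
) -> Callable[[str], str]:
    """Wrap an LLM so that messages matching a rule get a scripted reply."""

    def respond(user_message: str) -> str:
        for pattern, scripted_reply in rules:
            if pattern.search(user_message):
                # The model is never called for this topic.
                return scripted_reply
        return llm(user_message)

    return respond


def fake_llm(message: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return "LLM: " + message


rules = [
    (re.compile(r"\brefund\b", re.IGNORECASE),
     "Please see the official refund policy page."),
]
bot = make_guarded_bot(rules, fake_llm)
scripted = bot("Can I get a refund?")
free_form = bot("Hello there")
```

The trade-off is coverage: a scripted reply cannot be hallucinated, but it only applies to the topics that someone anticipated and wrote rules for.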

Evaluating multiple candidate replies before answering a query, and assigning a confidence score to each, could mitigate the problem. However, this approach would multiply computational costs, and active learning would increase them further. In high-stakes domains such as chip design, supply chain logistics, and medical diagnostics, the added costs are operationally necessary and therefore economically viable. In chatbots, however, customers tend to prefer rapid, overconfident answers over cautious, uncertainty-aware ones.^[110]
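A minimal sketch of this selection step is shown below. It assumes that candidate replies and their confidence scores have already been produced by some other means. How the scores are obtained is not specified here, and the function name `pick_reply` and the threshold of 0.7 are illustrative choices rather than values from the cited source.

```python
from typing import List, Optional, Tuple


def pick_reply(
    candidates: List[Tuple[str, float]], min_confidence: float = 0.7
) -> Optional[str]:
    """Return the highest-scoring reply, or None to signal abstention.

    candidates     -- (reply, confidence) pairs, confidence in [0, 1]
    min_confidence -- replies scoring below this are not shown to the user
    """
    if not candidates:
        return None
    best_reply, best_score = max(candidates, key=lambda pair: pair[1])
    if best_score < min_confidence:
        return None  # the caller can respond with "I am not sure" instead
    return best_reply


confident = pick_reply([("Answer A", 0.4), ("Answer B", 0.9)])
uncertain = pick_reply([("Answer A", 0.4), ("Answer B", 0.6)])
```

Producing n candidates requires roughly n generation calls per query, plus the cost of scoring them, which is the source of the multiplied computational cost described above.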

References

^ ^a ^b Hicks, Michael Townsen; Humphries, James; Slater, Joe (June 2024). "ChatGPT is bullshit" (PDF). Ethics and Information Technology. 26 (2) 38. doi:10.1007/s10676-024-09775-5.
^ Liang, Kaiqu; Hu, Haimin; Zhao, Xuandong; Song, Dawn; Griffiths, Thomas L.; Fernández Fisac, Jaime (2025). "Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models". arXiv:2507.07484 [cs.CL].
^ ^a ^b Edwards, Benj (6 April 2023). "Why ChatGPT and Bing Chat are so good at making things up". Ars Technica. Archived from the original on 11 June 2023. Retrieved 11 June 2023.
^ Ortega, Pedro A.; Kunesch, Markus; Delétang, Grégoire; Genewein, Tim; Grau-Moya, Jordi; Veness, Joel; Buchli, Jonas; Degrave, Jonas; Piot, Bilal; Perolat, Julien; Everitt, Tom; Tallec, Corentin; Parisotto, Emilio; Erez, Tom; Chen, Yutian; Reed, Scott; Hutter, Marcus; Nando de Freitas; Legg, Shane (2021). Shaking the foundations: Delusions in sequence models for interaction and control (Preprint). arXiv:2110.10819.
^ Maynez, Joshua; Narayan, Shashi; Bohnet, Bernd; McDonald, Ryan (2020). "On Faithfulness and Factuality in Abstractive Summarization". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 1906–1919. doi:10.18653/v1/2020.acl-main.173.
^ ^a ^b ^c ^d ^e ^f ^g ^h Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Ye Jin; Madotto, Andrea; Fung, Pascale (31 December 2023). "Survey of Hallucination in Natural Language Generation". ACM Computing Surveys. 55 (12): 1–38. arXiv:2202.03629. doi:10.1145/3571730.
^ Metz, Cade (6 November 2023). "Chatbots May 'Hallucinate' More Often Than Many Realize". The New York Times. Archived from the original on 7 December 2023. Retrieved 6 November 2023.
^ de Wynter, Adrian; Wang, Xun; Sokolov, Alex; Gu, Qilong; Chen, Si-Qing (September 2023). "An evaluation on large language model outputs: Discourse and memorization". Natural Language Processing Journal. 4 100024. arXiv:2304.08637. doi:10.1016/j.nlp.2023.100024.
^ ^a ^b ^c Leswing, Kif (14 February 2023). "Microsoft's Bing A.I. made several factual errors in last week's launch demo". CNBC. Archived from the original on 16 February 2023. Retrieved 16 February 2023.
^ ^a ^b Kang, Eunsuk; Shaw, Mary (2024). "tl;dr: Chill, y'all: AI Will Not Devour SE". Proceedings of the 2024 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. pp. 303–315. arXiv:2409.00764. doi:10.1145/3689492.3689816. ISBN 979-8-4007-1215-9.
^ ^a ^b Desai, Rajiv (13 October 2023). "Is artificial intelligence (AI) an existential threat? – Dr Rajiv Desai". Retrieved 25 November 2025.
^ ^a ^b ^c Maleki, Negar; Padmanabhan, Balaji; Dutta, Kaushik (2024). "AI Hallucinations: A Misnomer Worth Clarifying". 2024 IEEE Conference on Artificial Intelligence (CAI). pp. 133–138. arXiv:2401.06796. doi:10.1109/CAI59869.2024.00033. ISBN 979-8-3503-5409-6.
^ Liu, Ce; Shum, Heung-Yeung; Freeman, William T. (18 July 2007). "Face Hallucination: Theory and Practice". International Journal of Computer Vision. 75 (1): 115–134. doi:10.1007/s11263-006-0029-5. ProQuest 1113669475.
^ Nasrollahi, Kamal; Moeslund, Thomas B. (2014). "Super-resolution: a comprehensive survey". Machine Vision and Applications. 25 (6): 1423–1468. doi:10.1007/s00138-014-0623-4. ISSN 0932-8092.
^ "Hallucinating Faces". Robotics Institute Carnegie Mellon University. 1999.
^
- Thaler, S.L. (January 1995). "'Virtual input' phenomena within the death of a simple pattern associator". Neural Networks. 8 (1): 55–65. doi:10.1016/0893-6080(94)00065-T.
- Ricciardiello, Luciana; Fornaro, Pantaleo (May 2013). "Beyond the cliff of creativity". Medical Hypotheses. 80 (5): 534–543. doi:10.1016/j.mehy.2012.12.018. PMID 23452643.
- Thaler, S. L. (2016). "Cycles of insanity and creativity within contemplative neural systems". Medical Hypotheses. 96: 34–43. doi:10.1016/j.mehy.2016.07.010. PMID 27515220.
- Thaler, Stephen L. (2014). "Synaptic Perturbation and Consciousness". International Journal of Machine Consciousness. 6 (2). World Scientific Publishing Company: 75–107. doi:10.1142/S1793843014400137.
- Thaler, S. L. (Fall 1996). "The Death Dream and Near-Death Darwinism". Journal of Near-Death Studies. 15 (1).
^ Yonggang Deng; Byrne, W. (2008). "HMM Word and Phrase Alignment for Statistical Machine Translation". IEEE Transactions on Audio, Speech, and Language Processing. 16 (3): 494–507. doi:10.1109/TASL.2008.916056. ISSN 1558-7916.
^ Gupta, Saurabh; Malik, Jitendra (17 May 2015). "Visual Semantic Role Labeling". arXiv:1505.04474.
^ "Hallucinations in Neural Machine Translation". research.google. Archived from the original on 2 April 2024. Retrieved 2 April 2024.
^ ^a ^b Simonite, Tom (9 March 2018). "AI Has a Hallucination Problem That's Proving Tough to Fix". Wired. Condé Nast. Archived from the original on 5 April 2023. Retrieved 29 December 2022.
^ Zhuo, Terry Yue; Huang, Yujin; Chen, Chunyang; Xing, Zhenchang (2023). "Exploring AI Ethics of ChatGPT: A Diagnostic Analysis". arXiv:2301.12867 [cs.CL].
^ "Blender Bot 2.0: An open source chatbot that builds long-term memory and searches the internet". ai.meta.com. Retrieved 2 March 2024.
^ Tung, Liam (8 August 2022). "Meta warns its new chatbot may forget that it's a bot". ZDNET. Archived from the original on 26 March 2023. Retrieved 30 December 2022.
^ Seife, Charles (13 December 2022). "The Alarming Deceptions at the Heart of an Astounding New Chatbot". Slate. Archived from the original on 26 March 2023. Retrieved 16 February 2023.
^ Weise, Karen; Metz, Cade (1 May 2023). "When A.I. Chatbots Hallucinate". The New York Times. Archived from the original on 4 April 2024. Retrieved 8 May 2023.
^ Creamer, Ella (15 November 2023). "'Hallucinate' chosen as Cambridge dictionary's word of the year". The Guardian. Retrieved 7 June 2024.
^ "Joaquín Correa fue presentado en Botafogo para jugar el Mundial de Clubes y tuvo un insólito cruce con un periodista: 'No es mi hermano'". Clarin. 14 June 2025. Retrieved 10 August 2025.
^ ^a ^b ^c Field, Hayden (31 May 2023). "OpenAI is pursuing a new way to fight A.I. 'hallucinations'". CNBC. Archived from the original on 10 June 2023. Retrieved 11 June 2023.
^ Vincent, James (8 February 2023). "Google's AI chatbot Bard makes factual error in first demo". The Verge. Archived from the original on 12 February 2023. Retrieved 11 June 2023.
^ ^a ^b ^c ^d ^e ^f Broad, William J. (23 December 2024). "How Hallucinatory A.I. Helps Science Dream Up Big Breakthroughs". The New York Times.
^ Stening, Tanner (10 November 2023). "What are AI chatbots actually doing when they 'hallucinate'? Here's why experts don't like the term". Northeastern Global News. Retrieved 14 June 2024.
^ Tonmoy, S. M. Towhidul Islam; Zaman, S. M. Mehedi; Jain, Vinija; Rani, Anku; Rawte, Vipula; Chadha, Aman; Das, Amitava (8 January 2024). "A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models". arXiv:2401.01313 [cs.CL].
^ OpenAI (2023). "GPT-4 Technical Report". arXiv:2303.08774 [cs.CL].
^ Barassi, Veronica (2024). "Toward a Theory of AI Errors: Making Sense of Hallucinations, Catastrophic Failures, and the Fallacy of Generative AI". Harvard Data Science Review. 5. Bibcode:2024HDSRv...5ebbd4B. doi:10.1162/99608f92.ad8ebbd4.
^ Aaronson, Susan Ariel (2024). Introduction: What Hath Generative Artificial Intelligence Wrought? (Report). Centre for International Governance Innovation. pp. 1–4.
^ ^a ^b Varanasi, Lakshmi. "Why AI chatbots hallucinate, according to OpenAI researchers". Business Insider. Retrieved 28 September 2025.
^ "Tracing the thoughts of a large language model". Anthropic. 27 March 2025. Retrieved 29 March 2025.
^ Amabile, Teresa M.; Pratt, Michael G. (2016). "The dynamic componential model of creativity and innovation in organizations: Making progress, making meaning". Research in Organizational Behavior. 36: 157–183. doi:10.1016/j.riob.2016.10.001.
^ Mukherjee, Anirban; Chang, Hannah H. (2023). "Managing the Creative Frontier of Generative AI: The Novelty-Usefulness Tradeoff". California Management Review. Archived from the original on 5 January 2024. Retrieved 5 January 2024.
^ Metz, Cade (10 December 2022). "The New Chatbots Could Change the World. Can You Trust Them?". The New York Times. Archived from the original on 17 January 2023. Retrieved 30 December 2022.
^ Kocmi, Tom; Federmann, Christian (31 May 2023). "Large Language Models Are State-of-the-Art Evaluators of Translation Quality". arXiv:2302.14520. Retrieved 11 December 2025.
^ Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Chen, Delong; Chan, Ho Shu; Dai, Wenliang; Madotto, Andrea; Fung, Pascale (19 February 2024). "Survey of Hallucination in Natural Language Generation". arxiv.org. Retrieved 25 November 2025.
^ Nuñez, Michael (27 March 2025). "Anthropic scientists expose how AI actually 'thinks' — and discover it secretly plans ahead and sometimes lies". VentureBeat. Archived from the original on 28 March 2025. Retrieved 30 March 2025.
^ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].
^ Edwards, Benj (18 November 2022). "New Meta AI demo writes racist and inaccurate scientific literature, gets pulled". Ars Technica. Archived from the original on 10 April 2023. Retrieved 30 December 2022.
^ Scialom, Thomas (23 July 2024). "Llama 2, 3 & 4: Synthetic Data, RLHF, Agents on the path to Open Source AGI". Latent Space (Interview). Interviewed by swyx & Alessio. Archived from the original on 24 July 2024.
^ Bowman, Emma (19 December 2022). "A new AI chatbot might do your homework for you. But it's still not an A+ student". NPR. Archived from the original on 20 January 2023. Retrieved 29 December 2022.
^ Pitt, Sofia (15 December 2022). "Google vs. ChatGPT: Here's what happened when I swapped services for a day". CNBC. Archived from the original on 16 January 2023. Retrieved 30 December 2022.
^ Huizinga, Raechel (30 December 2022). "We asked an AI questions about New Brunswick. Some of the answers may surprise you". CBC News. Archived from the original on 6 January 2023. Retrieved 30 December 2022.
^ Zastrow, Mark (30 December 2022). "We Asked ChatGPT Your Questions About Astronomy. It Didn't Go so Well". Discover. Archived from the original on 26 March 2023. Retrieved 31 December 2022.
^ Lin, Connie (5 December 2022). "How to easily trick OpenAI's genius new ChatGPT". Fast Company. Archived from the original on 29 March 2023. Retrieved 6 January 2023.
^ Edwards, Benj (1 December 2022). "OpenAI invites everyone to test ChatGPT, a new AI-powered chatbot—with amusing results". Ars Technica. Archived from the original on 15 March 2023. Retrieved 29 December 2022.
^ Mollick, Ethan (14 December 2022). "ChatGPT Is a Tipping Point for AI". Harvard Business Review. Archived from the original on 11 April 2023. Retrieved 29 December 2022.
^ Kantrowitz, Alex (2 December 2022). "Finally, an A.I. Chatbot That Reliably Passes 'the Nazi Test'". Slate. Archived from the original on 17 January 2023. Retrieved 29 December 2022.
^ Marcus, Gary (2 December 2022). "How come GPT can seem so brilliant one minute and so breathtakingly dumb the next?". The Road to AI We Can Trust. Substack. Archived from the original on 30 December 2022. Retrieved 29 December 2022.
^ "Google cautions against 'hallucinating' chatbots, report says". Reuters. 11 February 2023. Archived from the original on 6 April 2023. Retrieved 16 February 2023.
^ Maruf, Ramishah (27 May 2023). "Lawyer apologizes for fake court citations from ChatGPT". CNN Business.
^ Brodkin, Jon (31 May 2023). "Federal judge: No AI in my courtroom unless a human verifies its accuracy". Ars Technica. Archived from the original on 26 June 2023. Retrieved 26 June 2023.
^ "Judge Brantley Starr". Northern District of Texas | United States District Court. Archived from the original on 26 June 2023. Retrieved 26 June 2023.
^ Brodkin, Jon (23 June 2023). "Lawyers have real bad day in court after citing fake cases made up by ChatGPT". Ars Technica. Archived from the original on 26 January 2024. Retrieved 26 June 2023.
^ Belanger, Ashley (9 June 2023). "OpenAI faces defamation suit after ChatGPT completely fabricated another lawsuit". Ars Technica. Archived from the original on 1 July 2023. Retrieved 1 July 2023.
^ Scarcella, Mike (19 May 2025). "OpenAI defeats radio host's lawsuit over allegations invented by ChatGPT". Reuters. Retrieved 23 August 2025.
^ Belanger, Ashley (16 February 2024). "Air Canada must honor refund policy invented by airline's chatbot". Ars Technica. Retrieved 22 April 2025.
^ "Air Canada responsible for errors by website chatbot after B.C. customer denied retroactive discount". Vancouver Sun. Archived from the original on 12 March 2025. Retrieved 22 April 2025.
^ Tadros, Edmund (6 October 2025). "'Full refund': Senator slams Deloitte's 'human intelligence problem'". Australian Financial Review. Retrieved 12 October 2025.
^ Tadros, Edmund; Karp, Paul (5 October 2025). "Deloitte to refund government, admits using AI in $440k report". Australian Financial Review. Retrieved 12 October 2025.
^ Brake, Justin (22 November 2025). "Major N.L. healthcare report contains errors likely generated by A.I." The Independent. Retrieved 4 December 2025.
^ Whitten, Elizabeth (24 November 2025). "N.L. asks Deloitte to carry out review after 'incorrect' citations found in $1.6M provincial health plan". CBC News. Retrieved 4 December 2025.
^ Ferrie, C.; Kaiser, S. (2019). Neural Networks for Babies. Naperville, Illinois: Sourcebooks Jabberwocky. ISBN 978-1-4926-7120-6. OCLC 1086346753.
^ Matsakis, Louise (8 May 2019). "Artificial Intelligence May Not 'Hallucinate' After All". Wired. Archived from the original on 26 March 2023. Retrieved 29 December 2022.
^ ^a ^b Gilmer, Justin; Hendrycks, Dan (6 August 2019). "A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'". Distill. 4 (8). doi:10.23915/distill.00019.1.
^ Zhang, Chenshuang; Zhang, Chaoning; Zheng, Sheng; Zhang, Mengchun; Qamar, Maryam; Bae, Sung-Ho; Kweon, In So (2 April 2023). "A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI". arXiv:2303.13336 [cs.SD].
^ Robertson, Adi (21 February 2024). "Google apologizes for "missing the mark" after Gemini generated racially diverse Nazis". The Verge. Archived from the original on 21 February 2024. Retrieved 14 August 2024.
^ "Gemini image generation got it wrong. We'll do better". Google. 23 February 2024. Archived from the original on 21 April 2024. Retrieved 14 August 2024.
^ Luther, Kurt (2025). "A Guide to Exploring Photo Sleuthing and Generative AI". Military Images. 43 (4 (234)): 8–11. ISSN 1040-4961.
^ Athaluri, Sai Anirudh; Manthena, Sandeep Varma; Kesapragada, V S R Krishna Manoj; Yarlagadda, Vineel; Dave, Tirth; Duddumpudi, Rama Tulasi Siri (11 April 2023). "Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References". Cureus. 15 (4) e37432. doi:10.7759/cureus.37432. PMC 10173677. PMID 37182055.
^ Snoswell, Aaron J.; Witzenberger, Kevin; Masri, Rayane El (15 April 2025). "A weird phrase is plaguing scientific papers – and we traced it back to a glitch in AI training data". The Conversation.
^ ^a ^b Goddard, Jerome (November 2023). "Hallucinations in ChatGPT: A Cautionary Tale for Biomedical Researchers". The American Journal of Medicine. 136 (11): 1059–1060. doi:10.1016/j.amjmed.2023.06.012. PMID 37369274.
^ Ji, Ziwei; Yu, Tiezheng; Xu, Yan; Lee, Nayeon; Ishii, Etsuko; Fung, Pascale (2023). "Towards Mitigating LLM Hallucination via Self Reflection". Findings of the Association for Computational Linguistics: EMNLP 2023. pp. 1827–1843. doi:10.18653/v1/2023.findings-emnlp.123.
^ Schick, Nina (2023). "FAKING IT: Navigating the new era of generative AI may be the most critical challenge to democracy yet". RSA Journal. 169 (2(5593)): 40–43. ISSN 0958-0433.
^ Bhattacharyya, Mehul; Miller, Valerie M; Bhattacharyya, Debjani; Miller, Larry E (19 May 2023). "High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content". Cureus. 15 (5) e39238. doi:10.7759/cureus.39238. PMC 10277170. PMID 37337480.
^ Else, Holly (19 January 2023). "Abstracts written by ChatGPT fool scientists". Nature. 613 (7944): 423. Bibcode:2023Natur.613..423E. doi:10.1038/d41586-023-00056-7. PMID 36635510.
^ Gao, Catherine A.; Howard, Frederick M.; Markov, Nikolay S.; Dyer, Emma C.; Ramesh, Siddhi; Luo, Yuan; Pearson, Alexander T. (26 April 2023). "Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers". npj Digital Medicine. 6 (1): 75. doi:10.1038/s41746-023-00819-6. PMC 10133283. PMID 37100871.
^ Emsley, Robin (19 August 2023). "ChatGPT: these are not hallucinations – they're fabrications and falsifications". Schizophrenia. 9 (1) 52. doi:10.1038/s41537-023-00379-4. PMC 10439949. PMID 37598184.
^ Giray, Louie (2 January 2024). "ChatGPT References Unveiled: Distinguishing the Reliable from the Fake". Internet Reference Services Quarterly. 28 (1): 9–18. doi:10.1080/10875301.2023.2265369. ISSN 1087-5301.
^ Teel, Zoë (Abbie); Wang, Ting; Lund, Brady (2023). "ChatGPT conundrums: Probing plagiarism and parroting problems in higher education practices". College & Research Libraries News. 84 (6). doi:10.5860/crln.84.6.205.
^ "Are Algorithmically-Generated Term Papers the Next Big Challenge to Academic Integrity? - EdSurge News". EdSurge. 12 February 2020. Retrieved 5 November 2025.
^ Watson, Alex P. (3 July 2024). "Hallucinated Citation Analysis: Delving into Student-Submitted AI-Generated Sources at the University of Mississippi". The Serials Librarian. 85 (5–6): 172–180. doi:10.1080/0361526X.2024.2433640. ISSN 0361-526X.
^ Jain, Anuj; Nimonkar, Pranali; Jadhav, Pratap (October 2025). "Citation integrity in the age of AI: evaluating the risks of reference hallucination in maxillofacial literature". Journal of Cranio-Maxillofacial Surgery. 53 (10): 1871–1872. doi:10.1016/j.jcms.2025.08.004.
^ "AI Detectors Don't Work. Here's What to Do Instead". MIT Sloan Teaching & Learning Technologies. Retrieved 5 November 2025.
^ "New AI classifier for indicating AI-written text". openai.com. 13 March 2024. Retrieved 5 November 2025.
^ Xu, Ziwei; Jain, Sanjay; Kankanhalli, Mohan (2024). "Hallucination is Inevitable: An Innate Limitation of Large Language Models". arXiv:2401.11817 [cs.CL].
^ Nie, Feng; Yao, Jin-Ge; Wang, Jinpeng; Pan, Rong; Lin, Chin-Yew (2019). "A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation". Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 2673–2679. doi:10.18653/v1/P19-1256.
^ Dziri, Nouha; Milton, Sivan; Yu, Mo; Zaiane, Osmar; Reddy, Siva (2022). "On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?". Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 5271–5285. doi:10.18653/v1/2022.naacl-main.387.
^ Vynck, Gerrit De (30 May 2023). "ChatGPT 'hallucinates.' Some researchers worry it isn't fixable". Washington Post. Archived from the original on 17 June 2023. Retrieved 31 May 2023.
^ Varshney, Neeraj; Yao, Wenling; Zhang, Hongming; Chen, Jianshu; Yu, Dong (2023). "A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation". arXiv:2307.03987 [cs.CL].
^ Šekrst, Kristina. "Unjustified untrue "beliefs": AI hallucinations and justification logics". In Grgić, Filip; Świętorzecka, Kordula; Brożek, Anna (eds.). Logic, Knowledge, and Tradition: Essays in Honor of Srecko Kovač. Retrieved 4 June 2024.
^ Chen, Jiuhai; Mueller, Jonas (2024). "Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness". arXiv:2308.16175 [cs.CL].
^ Jiang, Ling; Jiang, Keer; Chu, Xiaoyu; Gulati, Saaransh; Garg, Pulkit (2024). "Hallucination Detection in LLM-enriched Product Listings". Proceedings of the Seventh Workshop on e-Commerce and NLP (ECNLP 2024). Association for Computational Linguistics. pp. 38–48. Retrieved 5 October 2025.
^ ^a ^b Luo, Junliang; Li, Tianyu; Wu, Di; Jenkin, Michael; Liu, Steve; Dudek, Gregory (2024). "Hallucination Detection and Hallucination Mitigation: An Investigation". arXiv:2401.08358 [cs.CL].
^ Dziri, Nouha; Madotto, Andrea; Zaiane, Osmar; Bose, Avishek Joey (2021). "Neural path hunter: Reducing hallucination in dialogue systems via path grounding". arXiv:2104.08455 [cs.CL].
^ Rashkin, Hannah; Reitter, David; Tomar, Gaurav Singh; Das, Dipanjan (2021). "Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features". Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 704–718. doi:10.18653/v1/2021.acl-long.58.
^ Sun, Weiwei; Shi, Zhengliang; Gao, Shen; Ren, Pengjie; de Rijke, Maarten; Ren, Zhaochun (2022). "Contrastive Learning Reduces Hallucination in Conversations". arXiv:2212.10400 [cs.CL].
^ Zhao, Zheng; Cohen, Shay B.; Webber, Bonnie (2020). "Reducing Quantity Hallucinations in Abstractive Summarization". Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 2237–2249. arXiv:2009.13312. doi:10.18653/v1/2020.findings-emnlp.203.
^ Mündler, Niels; He, Jingxuan; Jenko, Slobodan; Vechev, Martin (2023). "Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation". arXiv:2305.15852 [cs.CL].
^ Leswing, Kif (25 April 2023). "Nvidia has a new way to prevent A.I. chatbots from 'hallucinating' wrong facts". CNBC. Retrieved 15 June 2023.
^ Potsawee (9 May 2024). "potsawee/selfcheckgpt". GitHub. Archived from the original on 9 May 2024. Retrieved 9 May 2024.
^ "Chatbot answers are all made up. This new tool helps you figure out which ones to trust". MIT Technology Review. 25 April 2024. Archived from the original on 26 April 2024. Retrieved 31 December 2024.
^ "Aimon". aimonlabs. 8 May 2024. Archived from the original on 8 May 2024. Retrieved 9 May 2024.
^ Xing, Wei (12 September 2025). "Why OpenAI's solution to AI hallucinations would kill ChatGPT tomorrow". The Conversation. Retrieved 16 September 2025.

Notes

This article is a direct transclusion of the Wikipedia article and therefore may not meet the same editing standards as LIMSwiki.

[Hicks_Humphries_Slater_2024-1] Hicks, Michael Townsen; Humphries, James; Slater, Joe (June 2024). "ChatGPT is bullshit" (PDF). Ethics and Information Technology. 26 (2) 38. doi:10.1007/s10676-024-09775-5.

[2] Liang, Kaiqu; Hu, Haimin; Zhao, Xuandong; Song, Dawn; Griffiths, Thomas L.; Fernández Fisac, Jaime (2025). "Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models". arXiv:2507.07484 [cs.CL].

[ars_making_things_up-3] Edwards, Benj (6 April 2023). "Why ChatGPT and Bing Chat are so good at making things up". Ars Technica. Archived from the original on 11 June 2023. Retrieved 11 June 2023.

[4] Ortega, Pedro A.; Kunesch, Markus; Delétang, Grégoire; Genewein, Tim; Grau-Moya, Jordi; Veness, Joel; Buchli, Jonas; Degrave, Jonas; Piot, Bilal; Perolat, Julien; Everitt, Tom; Tallec, Corentin; Parisotto, Emilio; Erez, Tom; Chen, Yutian; Reed, Scott; Hutter, Marcus; Nando de Freitas; Legg, Shane (2021). Shaking the foundations: Delusions in sequence models for interaction and control (Preprint). arXiv:2110.10819.

[5] Maynez, Joshua; Narayan, Shashi; Bohnet, Bernd; McDonald, Ryan (2020). "On Faithfulness and Factuality in Abstractive Summarization". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 1906–1919. doi:10.18653/v1/2020.acl-main.173.

[Ji_Lee_Frieske_Survey_of_Hallucination-6] ^ ^a ^b ^c ^d ^e ^f ^g ^h Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Ye Jin; Madotto, Andrea; Fung, Pascale (31 December 2023). "Survey of Hallucination in Natural Language Generation". ACM Computing Surveys. 55 (12): 1–38. arXiv:2202.03629. doi:10.1145/3571730.

[nyt-7] Metz, Cade (6 November 2023). "Chatbots May 'Hallucinate' More Often Than Many Realize". The New York Times. Archived from the original on 7 December 2023. Retrieved 6 November 2023.

[de_Wynter-2023-8] Wynter, Adrian; Wang, Xun; Sokolov, Alex; Gu, Qilong; Chen, Si-Qing (September 2023). "An evaluation on large language model outputs: Discourse and memorization". Natural Language Processing Journal. 4 100024. arXiv:2304.08637. doi:10.1016/j.nlp.2023.100024.

[cnbc_several_errors-9] Leswing, Kif (14 February 2023). "Microsoft's Bing A.I. made several factual errors in last week's launch demo". CNBC. Archived from the original on 16 February 2023. Retrieved 16 February 2023.

[sigplan-10] Kang, Eunsuk; Shaw, Mary (2024). "tl;dr: Chill, y'all: AI Will Not Devour SE". Proceedings of the 2024 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. pp. 303–315. arXiv:2409.00764. doi:10.1145/3689492.3689816. ISBN 979-8-4007-1215-9.

[salon-11] Desai, Rajiv (13 October 2023). "Is artificial intelligence (AI) an existential threat? – Dr Rajiv Desai". Retrieved 25 November 2025.

[Maleki_Padmanabhan_AI_Hallucinations-12] Maleki, Negar; Padmanabhan, Balaji; Dutta, Kaushik (2024). "AI Hallucinations: A Misnomer Worth Clarifying". 2024 IEEE Conference on Artificial Intelligence (CAI). pp. 133–138. arXiv:2401.06796. doi:10.1109/CAI59869.2024.00033. ISBN 979-8-3503-5409-6.

[13] Liu, Ce; Shum, Heung-Yeung; Freeman, William T. (18 July 2007). "Face Hallucination: Theory and Practice". International Journal of Computer Vision. 75 (1): 115–134. doi:10.1007/s11263-006-0029-5. ProQuest 1113669475.

[14] Nasrollahi, Kamal; Moeslund, Thomas B. (2014). "Super-resolution: a comprehensive survey". Machine Vision and Applications. 25 (6): 1423–1468. doi:10.1007/s00138-014-0623-4. ISSN 0932-8092.

[15] "Hallucinating Faces". Robotics Institute Carnegie Mellon University. 1999.

[16] 
Thaler, S.L. (January 1995). "'Virtual input' phenomena within the death of a simple pattern associator". Neural Networks. 8 (1): 55–65. doi:10.1016/0893-6080(94)00065-T.

Ricciardiello, Luciana; Fornaro, Pantaleo (May 2013). "Beyond the cliff of creativity". Medical Hypotheses. 80 (5): 534–543. doi:10.1016/j.mehy.2012.12.018. PMID 23452643.

Thaler, S. L. (2016). "Cycles of insanity and creativity within contemplative neural systems". Medical Hypotheses. 96: 34–43. doi:10.1016/j.mehy.2016.07.010. PMID 27515220.

Thaler, Stephen L. (2014). "Synaptic Perturbation and Consciousness". International Journal of Machine Consciousness. 6 (2). World Scientific Publishing Company: 75–107. doi:10.1142/S1793843014400137.

Thaler, S. L. (Fall 1996). "The Death Dream and Near-Death Darwinism". Journal of Near-Death Studies. 15 (1).

[17] Thaler, S.L. (January 1995). "'Virtual input' phenomena within the death of a simple pattern associator". Neural Networks. 8 (1): 55–65. doi:10.1016/0893-6080(94)00065-T.

[18] Ricciardiello, Luciana; Fornaro, Pantaleo (May 2013). "Beyond the cliff of creativity". Medical Hypotheses. 80 (5): 534–543. doi:10.1016/j.mehy.2012.12.018. PMID 23452643.

[19] Thaler, S. L. (2016). "Cycles of insanity and creativity within contemplative neural systems". Medical Hypotheses. 96: 34–43. doi:10.1016/j.mehy.2016.07.010. PMID 27515220.

[20] Thaler, Stephen L. (2014). "Synaptic Perturbation and Consciousness". International Journal of Machine Consciousness. 6 (2). World Scientific Publishing Company: 75–107. doi:10.1142/S1793843014400137.

[21] Thaler, S. L. (Fall 1996). "The Death Dream and Near-Death Darwinism". Journal of Near-Death Studies. 15 (1).

[17] Yonggang Deng; Byrne, W. (2008). "HMM Word and Phrase Alignment for Statistical Machine Translation". IEEE Transactions on Audio, Speech, and Language Processing. 16 (3): 494–507. doi:10.1109/TASL.2008.916056. ISSN 1558-7916.

[18] Gupta, Saurabh; Malik, Jitendra (17 May 2015), Visual Semantic Role Labeling, arXiv, doi:10.48550/arXiv.1505.04474, arXiv:1505.04474

[19] "Hallucinations in Neural Machine Translation". research.google. Archived from the original on 2 April 2024. Retrieved 2 April 2024.

[20] Simonite, Tom (9 March 2018). "AI Has a Hallucination Problem That's Proving Tough to Fix". Wired. Condé Nast. Archived from the original on 5 April 2023. Retrieved 29 December 2022.

[21] Zhuo, Terry Yue; Huang, Yujin; Chen, Chunyang; Xing, Zhenchang (2023). "Exploring AI Ethics of ChatGPT: A Diagnostic Analysis". arXiv:2301.12867 [cs.CL].

[22] "Blender Bot 2.0: An open source chatbot that builds long-term memory and searches the internet". ai.meta.com. Retrieved 2 March 2024.

[23] Tung, Liam (8 August 2022). "Meta warns its new chatbot may forget that it's a bot". ZDNET. Archived from the original on 26 March 2023. Retrieved 30 December 2022.

[24] Seife, Charles (13 December 2022). "The Alarming Deceptions at the Heart of an Astounding New Chatbot". Slate. Archived from the original on 26 March 2023. Retrieved 16 February 2023.

[25] Weise, Karen; Metz, Cade (1 May 2023). "When A.I. Chatbots Hallucinate". The New York Times. Archived from the original on 4 April 2024. Retrieved 8 May 2023.

[26] Creamer, Ella (15 November 2023). "'Hallucinate' chosen as Cambridge dictionary's word of the year". The Guardian. Retrieved 7 June 2024.

[27] "Joaquín Correa fue presentado en Botafogo para jugar el Mundial de Clubes y tuvo un insólito cruce con un periodista: 'No es mi hermano'". Clarin. 14 June 2025. Retrieved 10 August 2025.

[28] Field, Hayden (31 May 2023). "OpenAI is pursuing a new way to fight A.I. 'hallucinations'". CNBC. Archived from the original on 10 June 2023. Retrieved 11 June 2023.

[29] Vincent, James (8 February 2023). "Google's AI chatbot Bard makes factual error in first demo". The Verge. Archived from the original on 12 February 2023. Retrieved 11 June 2023.

[30] Broad, William J. (23 December 2024). "How Hallucinatory A.I. Helps Science Dream Up Big Breakthroughs". The New York Times.

[31] Stening, Tanner (10 November 2023). "What are AI chatbots actually doing when they 'hallucinate'? Here's why experts don't like the term". Northeastern Global News. Retrieved 14 June 2024.

[32] Tonmoy, S. M. Towhidul Islam; Zaman, S. M. Mehedi; Jain, Vinija; Rani, Anku; Rawte, Vipula; Chadha, Aman; Das, Amitava (8 January 2024). "A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models". arXiv:2401.01313 [cs.CL].

[33] OpenAI (2023). "GPT-4 Technical Report". arXiv:2303.08774 [cs.CL].

[34] Barassi, Veronica (2024). "Toward a Theory of AI Errors: Making Sense of Hallucinations, Catastrophic Failures, and the Fallacy of Generative AI". Harvard Data Science Review. 5. Bibcode:2024HDSRv...5ebbd4B. doi:10.1162/99608f92.ad8ebbd4.

[35] Aaronson, Susan Ariel (2024). Introduction: What Hath Generative Artificial Intelligence Wrought? (Report). Centre for International Governance Innovation. pp. 1–4.

[36] Varanasi, Lakshmi. "Why AI chatbots hallucinate, according to OpenAI researchers". Business Insider. Retrieved 28 September 2025.

[37] "Tracing the thoughts of a large language model". Anthropic. 27 March 2025. Retrieved 29 March 2025.

[38] Amabile, Teresa M.; Pratt, Michael G. (2016). "The dynamic componential model of creativity and innovation in organizations: Making progress, making meaning". Research in Organizational Behavior. 36: 157–183. doi:10.1016/j.riob.2016.10.001.

[39] Mukherjee, Anirban; Chang, Hannah H. (2023). "Managing the Creative Frontier of Generative AI: The Novelty-Usefulness Tradeoff". California Management Review. Archived from the original on 5 January 2024. Retrieved 5 January 2024.

[40] Metz, Cade (10 December 2022). "The New Chatbots Could Change the World. Can You Trust Them?". The New York Times. Archived from the original on 17 January 2023. Retrieved 30 December 2022.

[41] Kocmi, Tom; Federmann, Christian (31 May 2023), Large Language Models Are State-of-the-Art Evaluators of Translation Quality, arXiv, doi:10.48550/arXiv.2302.14520, arXiv:2302.14520, retrieved 11 December 2025

[42] Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Chen, Delong; Chan, Ho Shu; Dai, Wenliang; Madotto, Andrea; Fung, Pascale (19 February 2024). "Survey of Hallucination in Natural Language Generation". arxiv.org. Retrieved 25 November 2025.

[43] Nuñez, Michael (27 March 2025). "Anthropic scientists expose how AI actually 'thinks' — and discover it secretly plans ahead and sometimes lies". VentureBeat. Archived from the original on 28 March 2025. Retrieved 30 March 2025.

[44] Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].

[45] Edwards, Benj (18 November 2022). "New Meta AI demo writes racist and inaccurate scientific literature, gets pulled". Ars Technica. Archived from the original on 10 April 2023. Retrieved 30 December 2022.

[46] Scialom, Thomas (23 July 2024). "Llama 2, 3 & 4: Synthetic Data, RLHF, Agents on the path to Open Source AGI". Latent Space (Interview). Interviewed by swyx & Alessio. Archived from the original on 24 July 2024.

[47] Bowman, Emma (19 December 2022). "A new AI chatbot might do your homework for you. But it's still not an A+ student". NPR. Archived from the original on 20 January 2023. Retrieved 29 December 2022.

[48] Pitt, Sofia (15 December 2022). "Google vs. ChatGPT: Here's what happened when I swapped services for a day". CNBC. Archived from the original on 16 January 2023. Retrieved 30 December 2022.

[49] Huizinga, Raechel (30 December 2022). "We asked an AI questions about New Brunswick. Some of the answers may surprise you". CBC News. Archived from the original on 6 January 2023. Retrieved 30 December 2022.

[50] Zastrow, Mark (30 December 2022). "We Asked ChatGPT Your Questions About Astronomy. It Didn't Go so Well". Discover. Archived from the original on 26 March 2023. Retrieved 31 December 2022.

[51] Lin, Connie (5 December 2022). "How to easily trick OpenAI's genius new ChatGPT". Fast Company. Archived from the original on 29 March 2023. Retrieved 6 January 2023.

[52] Edwards, Benj (1 December 2022). "OpenAI invites everyone to test ChatGPT, a new AI-powered chatbot—with amusing results". Ars Technica. Archived from the original on 15 March 2023. Retrieved 29 December 2022.

[53] Mollick, Ethan (14 December 2022). "ChatGPT Is a Tipping Point for AI". Harvard Business Review. Archived from the original on 11 April 2023. Retrieved 29 December 2022.

[54] Kantrowitz, Alex (2 December 2022). "Finally, an A.I. Chatbot That Reliably Passes 'the Nazi Test'". Slate. Archived from the original on 17 January 2023. Retrieved 29 December 2022.

[55] Marcus, Gary (2 December 2022). "How come GPT can seem so brilliant one minute and so breathtakingly dumb the next?". The Road to AI We Can Trust. Substack. Archived from the original on 30 December 2022. Retrieved 29 December 2022.

[56] "Google cautions against 'hallucinating' chatbots, report says". Reuters. 11 February 2023. Archived from the original on 6 April 2023. Retrieved 16 February 2023.

[57] Maruf, Ramishah (27 May 2023). "Lawyer apologizes for fake court citations from ChatGPT". CNN Business.

[58] Brodkin, Jon (31 May 2023). "Federal judge: No AI in my courtroom unless a human verifies its accuracy". Ars Technica. Archived from the original on 26 June 2023. Retrieved 26 June 2023.

[59] "Judge Brantley Starr". Northern District of Texas | United States District Court. Archived from the original on 26 June 2023. Retrieved 26 June 2023.

[60] Brodkin, Jon (23 June 2023). "Lawyers have real bad day in court after citing fake cases made up by ChatGPT". Ars Technica. Archived from the original on 26 January 2024. Retrieved 26 June 2023.

[61] Belanger, Ashley (9 June 2023). "OpenAI faces defamation suit after ChatGPT completely fabricated another lawsuit". Ars Technica. Archived from the original on 1 July 2023. Retrieved 1 July 2023.

[62] Scarcella, Mike (19 May 2025). "OpenAI defeats radio host's lawsuit over allegations invented by ChatGPT". Reuters. Retrieved 23 August 2025.

[63] Belanger, Ashley (16 February 2024). "Air Canada must honor refund policy invented by airline's chatbot". Ars Technica. Retrieved 22 April 2025.

[64] "Air Canada responsible for errors by website chatbot after B.C. customer denied retroactive discount". Vancouver Sun. Archived from the original on 12 March 2025. Retrieved 22 April 2025.

[65] Tadros, Edmund (6 October 2025). "'Full refund': Senator slams Deloitte's 'human intelligence problem'". Australian Financial Review. Retrieved 12 October 2025.

[66] Tadros, Edmund; Karp, Paul (5 October 2025). "Deloitte to refund government, admits using AI in $440k report". Australian Financial Review. Retrieved 12 October 2025.

[67] Brake, Justin (22 November 2025). "Major N.L. healthcare report contains errors likely generated by A.I." The Independent. Retrieved 4 December 2025.

[68] Whitten, Elizabeth (24 November 2025). "N.L. asks Deloitte to carry out review after 'incorrect' citations found in $1.6M provincial health plan". CBC News. Retrieved 4 December 2025.

[69] Ferrie, C.; Kaiser, S. (2019). Neural Networks for Babies. Naperville, Illinois: Sourcebooks Jabberwocky. ISBN 978-1-4926-7120-6. OCLC 1086346753.

[70] Matsakis, Louise (8 May 2019). "Artificial Intelligence May Not 'Hallucinate' After All". Wired. Archived from the original on 26 March 2023. Retrieved 29 December 2022.

[71] Gilmer, Justin; Hendrycks, Dan (6 August 2019). "A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'". Distill. 4 (8). doi:10.23915/distill.00019.1.

[72] Zhang, Chenshuang; Zhang, Chaoning; Zheng, Sheng; Zhang, Mengchun; Qamar, Maryam; Bae, Sung-Ho; Kweon, In So (2 April 2023). "A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI". arXiv:2303.13336 [cs.SD].

[73] Robertson, Adi (21 February 2024). "Google apologizes for 'missing the mark' after Gemini generated racially diverse Nazis". The Verge. Archived from the original on 21 February 2024. Retrieved 14 August 2024.

[74] "Gemini image generation got it wrong. We'll do better". Google. 23 February 2024. Archived from the original on 21 April 2024. Retrieved 14 August 2024.

[75] Luther, Kurt (2025). "A Guide to Exploring Photo Sleuthing and Generative AI". Military Images. 43 (4 (234)): 8–11. ISSN 1040-4961.

[76] Athaluri, Sai Anirudh; Manthena, Sandeep Varma; Kesapragada, V S R Krishna Manoj; Yarlagadda, Vineel; Dave, Tirth; Duddumpudi, Rama Tulasi Siri (11 April 2023). "Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References". Cureus. 15 (4) e37432. doi:10.7759/cureus.37432. PMC 10173677. PMID 37182055.

[77] Snoswell, Aaron J.; Witzenberger, Kevin; Masri, Rayane El (15 April 2025). "A weird phrase is plaguing scientific papers – and we traced it back to a glitch in AI training data". The Conversation.

[78] Goddard, Jerome (November 2023). "Hallucinations in ChatGPT: A Cautionary Tale for Biomedical Researchers". The American Journal of Medicine. 136 (11): 1059–1060. doi:10.1016/j.amjmed.2023.06.012. PMID 37369274.

[79] Ji, Ziwei; Yu, Tiezheng; Xu, Yan; Lee, Nayeon; Ishii, Etsuko; Fung, Pascale (2023). "Towards Mitigating LLM Hallucination via Self Reflection". Findings of the Association for Computational Linguistics: EMNLP 2023. pp. 1827–1843. doi:10.18653/v1/2023.findings-emnlp.123.

[80] Schick, Nina (2023). "FAKING IT: Navigating the new era of generative AI may be the most critical challenge to democracy yet". RSA Journal. 169 (2(5593)): 40–43. ISSN 0958-0433.

[81] Bhattacharyya, Mehul; Miller, Valerie M; Bhattacharyya, Debjani; Miller, Larry E (19 May 2023). "High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content". Cureus. 15 (5) e39238. doi:10.7759/cureus.39238. PMC 10277170. PMID 37337480.

[82] Else, Holly (19 January 2023). "Abstracts written by ChatGPT fool scientists". Nature. 613 (7944): 423. Bibcode:2023Natur.613..423E. doi:10.1038/d41586-023-00056-7. PMID 36635510.

[83] Gao, Catherine A.; Howard, Frederick M.; Markov, Nikolay S.; Dyer, Emma C.; Ramesh, Siddhi; Luo, Yuan; Pearson, Alexander T. (26 April 2023). "Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers". npj Digital Medicine. 6 (1): 75. doi:10.1038/s41746-023-00819-6. PMC 10133283. PMID 37100871.

[84] Emsley, Robin (19 August 2023). "ChatGPT: these are not hallucinations – they're fabrications and falsifications". Schizophrenia. 9 (1) 52. doi:10.1038/s41537-023-00379-4. PMC 10439949. PMID 37598184.

[85] Giray, Louie (2 January 2024). "ChatGPT References Unveiled: Distinguishing the Reliable from the Fake". Internet Reference Services Quarterly. 28 (1): 9–18. doi:10.1080/10875301.2023.2265369. ISSN 1087-5301.

[86] Teel, Zoë (Abbie); Wang, Ting; Lund, Brady (2023). "ChatGPT conundrums: Probing plagiarism and parroting problems in higher education practices". College & Research Libraries News. 84 (6). doi:10.5860/crln.84.6.205.

[87] "Are Algorithmically-Generated Term Papers the Next Big Challenge to Academic Integrity? - EdSurge News". EdSurge. 12 February 2020. Retrieved 5 November 2025.

[88] Watson, Alex P. (3 July 2024). "Hallucinated Citation Analysis: Delving into Student-Submitted AI-Generated Sources at the University of Mississippi". The Serials Librarian. 85 (5–6): 172–180. doi:10.1080/0361526X.2024.2433640. ISSN 0361-526X.

[89] Jain, Anuj; Nimonkar, Pranali; Jadhav, Pratap (October 2025). "Citation integrity in the age of AI: evaluating the risks of reference hallucination in maxillofacial literature". Journal of Cranio-Maxillofacial Surgery. 53 (10): 1871–1872. doi:10.1016/j.jcms.2025.08.004.

[90] "AI Detectors Don't Work. Here's What to Do Instead". MIT Sloan Teaching & Learning Technologies. Retrieved 5 November 2025.

[91] "New AI classifier for indicating AI-written text". openai.com. 13 March 2024. Retrieved 5 November 2025.

[92] Xu, Ziwei; Jain, Sanjay; Kankanhalli, Mohan (2024). "Hallucination is Inevitable: An Innate Limitation of Large Language Models". arXiv:2401.11817 [cs.CL].

[93] Nie, Feng; Yao, Jin-Ge; Wang, Jinpeng; Pan, Rong; Lin, Chin-Yew (2019). "A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation". Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 2673–2679. doi:10.18653/v1/P19-1256.

[94] Dziri, Nouha; Milton, Sivan; Yu, Mo; Zaiane, Osmar; Reddy, Siva (2022). "On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?". Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 5271–5285. doi:10.18653/v1/2022.naacl-main.387.

[95] Vynck, Gerrit De (30 May 2023). "ChatGPT 'hallucinates.' Some researchers worry it isn't fixable". Washington Post. Archived from the original on 17 June 2023. Retrieved 31 May 2023.

[96] Varshney, Neeraj; Yao, Wenling; Zhang, Hongming; Chen, Jianshu; Yu, Dong (2023). "A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation". arXiv:2307.03987 [cs.CL].

[97] Šekrst, Kristina. "Unjustified untrue 'beliefs': AI hallucinations and justification logics". In Grgić, Filip; Świętorzecka, Kordula; Brożek, Anna (eds.). Logic, Knowledge, and Tradition: Essays in Honor of Srecko Kovač. Retrieved 4 June 2024.

[98] Chen, Jiuhai; Mueller, Jonas (2024). "Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness". arXiv:2308.16175 [cs.CL].

[99] Jiang, Ling; Jiang, Keer; Chu, Xiaoyu; Gulati, Saaransh; Garg, Pulkit (2024). "Hallucination Detection in LLM-enriched Product Listings". Proceedings of the Seventh Workshop on e-Commerce and NLP (ECNLP 2024). Association for Computational Linguistics. pp. 38–48. Retrieved 5 October 2025.

[100] Luo, Junliang; Li, Tianyu; Wu, Di; Jenkin, Michael; Liu, Steve; Dudek, Gregory (2024). "Hallucination Detection and Hallucination Mitigation: An Investigation". arXiv:2401.08358 [cs.CL].

[101] Dziri, Nouha; Madotto, Andrea; Zaiane, Osmar; Bose, Avishek Joey (2021). "Neural path hunter: Reducing hallucination in dialogue systems via path grounding". arXiv:2104.08455 [cs.CL].

[102] Rashkin, Hannah; Reitter, David; Tomar, Gaurav Singh; Das, Dipanjan (2021). "Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features". Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 704–718. doi:10.18653/v1/2021.acl-long.58.

[103] Sun, Weiwei; Shi, Zhengliang; Gao, Shen; Ren, Pengjie; de Rijke, Maarten; Ren, Zhaochun (2022). "Contrastive Learning Reduces Hallucination in Conversations". arXiv:2212.10400 [cs.CL].

[104] Zhao, Zheng; Cohen, Shay B.; Webber, Bonnie (2020). "Reducing Quantity Hallucinations in Abstractive Summarization". Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 2237–2249. arXiv:2009.13312. doi:10.18653/v1/2020.findings-emnlp.203.

[105] Mündler, Niels; He, Jingxuan; Jenko, Slobodan; Vechev, Martin (2023). "Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation". arXiv:2305.15852 [cs.CL].

[106] Leswing, Kif (25 April 2023). "Nvidia has a new way to prevent A.I. chatbots from 'hallucinating' wrong facts". CNBC. Retrieved 15 June 2023.

[107] Potsawee (9 May 2024). "potsawee/selfcheckgpt". GitHub. Archived from the original on 9 May 2024. Retrieved 9 May 2024.

[108] "Chatbot answers are all made up. This new tool helps you figure out which ones to trust". MIT Technology Review. 25 April 2024. Archived from the original on 26 April 2024. Retrieved 31 December 2024.

[109] "Aimon". aimonlabs. 8 May 2024. Archived from the original on 8 May 2024. Retrieved 9 May 2024.

[110] Xing, Wei (12 September 2025). "Why OpenAI's solution to AI hallucinations would kill ChatGPT tomorrow". The Conversation. Retrieved 16 September 2025.
