Why Large Language Models Hallucinate

Captions
I'm going to state three facts. Your challenge is to tell me how they're related; they're all space and aviation themed, but that's not it. So here we go. Number one: the distance from the Earth to the Moon is 54 million kilometers. Number two: before I worked at IBM, I worked at a major Australian airline. And number three: the James Webb Telescope took the very first pictures of an exoplanet outside of our solar system. What's the common thread? The answer is that all three "facts" are examples of a hallucination by a large language model, otherwise known as an LLM; think of ChatGPT and Bing Chat. 54 million kilometers is the distance to Mars, not the Moon. It's my brother who works at the airline, not me. And infamously, at the announcement of Google's LLM, Bard, it hallucinated about the Webb telescope: the first picture of an exoplanet was actually taken in 2004. While large language models can generate fluent and coherent text on many topics and domains, they are also prone to just "make stuff up". Plausible-sounding nonsense! So let's discuss, first of all, what a hallucination is, why hallucinations happen, and then some steps you can take to minimize hallucinations with LLMs.

Hallucinations are outputs of LLMs that deviate from facts or contextual logic, and they can range from minor inconsistencies to completely fabricated or contradictory statements. We can categorize hallucinations across different levels of granularity. At the lowest level of granularity is sentence contradiction. This is the simplest type, where an LLM generates a sentence that contradicts one of its previous sentences: "The sky is blue today." "The sky is green today." Another example is prompt contradiction, where the generated sentence contradicts the prompt that was used to generate it. If I ask an LLM to write a positive review of a restaurant and it returns "The food was terrible and the service was rude," that is in direct contradiction to what I asked. We already gave some examples of another type: factual contradictions. These factual-error hallucinations are exactly that, straightforward facts that the model simply gets wrong: "Barack Obama was the first president of the United States," something like that. And then there are nonsensical or otherwise irrelevant hallucinations, where the model inserts information that has no place being there. Like "The capital of France is Paris." "Paris is also the name of a famous singer." Okay, um, thanks?

With the question of what LLM hallucinations are answered, we need to answer the question of why. And it's not an easy one to answer, because the way LLMs derive their output is something of a black box, even to the engineers of the LLM itself. But there are a number of common causes, so let's take a look at a few of those.

One of those is data quality. LLMs are trained on large corpora of text that may contain noise, errors, biases or inconsistencies. For example, some LLMs were trained by scraping all of Wikipedia and all of Reddit. Is everything on Reddit 100% accurate? Well, even if it were, even if the training data were entirely reliable, that data may not cover all of the possible topics or domains the LLM is expected to generate content about. So LLMs may generalize from data without being able to verify its accuracy or relevance, and sometimes they just get it wrong. As LLM reasoning capabilities improve, hallucinations tend to decline.

Another reason hallucinations can happen is the generation method. LLMs use various methods and objectives to generate text, such as beam search, sampling, maximum likelihood estimation, or reinforcement learning. These methods and objectives may introduce biases and tradeoffs between things like fluency and diversity, between coherence and creativity, or between accuracy and novelty. So, for example, beam search may favor high-probability but generic words over low-probability but specific words.
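To make that decoding tradeoff concrete, here is a minimal sketch in Python. It assumes the Hugging Face transformers library and a small demo model, neither of which is mentioned in the video; the model name and parameter values are placeholders chosen purely for illustration.

# Sketch: how decoding settings shape LLM output.
# Assumes the Hugging Face "transformers" library and a small stand-in model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal language model would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The first picture of an exoplanet was taken"
inputs = tokenizer(prompt, return_tensors="pt")

# Beam search: deterministic, favors high-probability (often generic) words.
beam_output = model.generate(
    **inputs, max_new_tokens=30, num_beams=5, do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Sampling with a high temperature: more diverse and creative output,
# and more room for the model to drift away from the facts.
sampled_output = model.generate(
    **inputs, max_new_tokens=30, do_sample=True, temperature=1.2, top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

print("Beam search:", tokenizer.decode(beam_output[0], skip_special_tokens=True))
print("Sampling   :", tokenizer.decode(sampled_output[0], skip_special_tokens=True))

Running the two calls side by side shows the tradeoff the video describes: the beam-search continuation tends to be safe and generic, while the high-temperature sample is more varied but more likely to invent details.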
Another common cause of hallucinations is input context, and this is one we can do something about directly as users. Here, context refers to the information given to the model in the input prompt. Context can help guide the model to produce relevant and accurate outputs, but it can also confuse or mislead the model if it's unclear, inconsistent or contradictory. For example, if I ask an LLM chatbot, "Can cats speak English?", I would expect the answer "No, and do you need to sit down for a moment?" But perhaps I'd just forgotten to include a crucial little bit of context: that this conversation thread is about the Garfield cartoon strip, in which case the LLM should have answered, "Yes, cats can speak English, and that cat is probably going to ask for second helpings of lasagna." Context is important, and if we don't tell the model we're looking for text suitable for an academic essay or a creative writing exercise, we can't expect it to respond within that context.

Which brings us nicely to the third and final part: what can we do to reduce hallucinations in our own conversations with LLMs? One thing we can certainly do is provide clear and specific prompts. The more precise and detailed the input prompt, the more likely the LLM is to generate relevant and, most importantly, accurate outputs. For example, instead of asking "What happened in World War Two?", which is not very clear or specific, we could say, "Can you summarize the major events of World War Two, including the key countries involved and the primary causes of the conflict?" Something like that really gets at what we're trying to pull from the model, and it gives the model a better understanding of what information is expected in the response.

We can also employ active mitigation strategies. These make use of settings of the LLM itself, parameters that control how the model behaves during generation. A good example is the temperature parameter, which controls the randomness of the output. A lower temperature will produce more conservative and focused responses, while a higher temperature will generate more diverse and creative ones. But the higher the temperature, the more opportunity for hallucination.

And one more is multi-shot prompting. In contrast to single-shot prompting, where we only give one prompt, multi-shot prompting provides the LLM with multiple examples of the desired output format or context, and that essentially primes the model, giving it a clearer understanding of the user's expectations. By presenting the LLM with several examples, we help it recognize the pattern or the context more effectively, and this can be particularly useful in tasks that require a specific output format: generating code, writing poetry, or answering questions in a specific style.
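Here is a minimal sketch that combines the last two ideas, multi-shot (few-shot) prompting and a low temperature setting. It assumes the OpenAI Python SDK purely as an example; the video does not tie these techniques to any particular provider, and the model name is a placeholder.

# Sketch: few-shot prompting plus a low temperature as mitigation strategies.
# Assumes the OpenAI Python SDK; model name is a placeholder for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A few worked examples prime the model on the exact output format we want.
few_shot_messages = [
    {"role": "system", "content": "You answer geography questions with a single city name."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris"},
    {"role": "user", "content": "What is the capital of Japan?"},
    {"role": "assistant", "content": "Tokyo"},
    # The real question comes last, phrased just like the examples.
    {"role": "user", "content": "What is the capital of Australia?"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",        # placeholder model name
    messages=few_shot_messages,
    temperature=0.2,            # low temperature: conservative, focused output
)

print(response.choices[0].message.content)

The example pairs show the model the expected pattern, and the low temperature keeps the answer focused rather than creative, which is exactly the combination you want when factual accuracy matters more than novelty.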
So while large language models may sometimes hallucinate and take us on an unexpected journey, 54 million kilometers off target, understanding the causes and employing strategies to minimize them really allows us to harness the true potential of these models and reduce hallucinations. Although I did kind of enjoy reading about my fictional career down under. If you have any questions, please drop us a line below. And if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
Info
Channel: IBM Technology
Views: 150,096
Keywords: IBM, IBM Cloud, Chariots, gpt
Id: cfqtFvWOfg0
Length: 9min 37sec (577 seconds)
Published: Thu Apr 20 2023