Understanding Retrieval Augmented Generation

May 10, 2024

Ever wished you had a super-smart friend who could answer your questions, pull up information on the fly, and reply with the most recent information available in seconds? Well, good news! You can now harness that kind of power with Retrieval Augmented Generation (RAG for short).

But what exactly is RAG, and why is everyone so excited about it? Let's explore this question and find out.

What is RAG?

Retrieval Augmented Generation is like a secret sauce that supercharges AI applications. Think of it as two separate systems, retrieval and generation, that combine to create something extraordinary.

RAG is not limited to AI applications or Large Language Models (LLMs); it can be used independently in various use cases (more on this below). At its core, it is an architectural approach, one that, in the case of LLMs, improves their efficacy.

This post focuses on RAG in the context of LLMs, but the approach can be applied to many other applications and problems.

Let's delve into more detail. Imagine you are on a website chatting with a virtual assistant about last night's game. Here, RAG acts as an intermediary, supplying the virtual assistant with the latest stats and game information so you get answers that are relevant to your query.

In AI applications, for example, as of writing this post, OpenAI's GPT-4 Turbo model has been trained on data up to April 2023. So if you ask a question that falls outside the LLM's training data, it will often produce unhelpful output or, worse, confidently provide wrong information. To bridge this gap, we supply the model with context via RAG.

Another way to picture it: suppose you have a 200-page PDF that the LLM knows nothing about. How are you supposed to extract key information or get specific answers to your questions without going through the entire PDF? Yeah, you got it right; that's when RAG comes into the spotlight. And that's exactly how you build a chat-with-a-PDF application.
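To make a 200-page PDF searchable, RAG systems typically split the document into smaller, overlapping chunks first, so that only the relevant passages need to be retrieved later. Here is a minimal sketch; the character-based splitting and the size/overlap values are illustrative assumptions (real systems often split by tokens, sentences, or pages):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` characters, carrying `overlap`
    characters over between consecutive chunks so no sentence is cut
    off without context."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Each chunk can then be indexed and retrieved independently, instead of sending all 200 pages to the model at once.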

Now, you might think: why not simply retrain the LLMs on the most recent data and keep the models up to date? Well, fair point, but that's not an easy task to carry out. It requires lots of processing power, data that must be pre-processed and structured, and much more goes into training a model.

How RAG Works

What RAG does is retrieve data from external resources (databases, documents, APIs, or anything else that holds information relevant to the user's query) and use it to generate a response.

For example, if you ask it to list all the team members who played in yesterday's match, the relevant information is first fetched from an external source. After that, both your query and the retrieved information are passed to the LLM for analysis and output.

And that's RAG in action!
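The flow above can be sketched in a few lines of Python. Everything here is an illustrative assumption, not a specific library's API: the documents, the naive word-overlap scoring (a stand-in for real vector search over embeddings), and the prompt format.

```python
import re

# Toy "external source" standing in for a database, document store, or API.
DOCUMENTS = [
    "Final score: Lakers 112, Celtics 108, with 35 points from the top scorer.",
    "Yesterday the starting lineup was James, Davis, Russell, Reaves, and Hachimura.",
    "Ticket prices for next season go on sale in June.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query.
    Production systems use embedding-based similarity search instead."""
    q_words = set(re.findall(r"\w+", query.lower()))
    def score(doc: str) -> int:
        return len(q_words & set(re.findall(r"\w+", doc.lower())))
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved context and the user query before calling the LLM."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query

query = "List all the team members who played in the match yesterday"
prompt = build_prompt(query, retrieve(query, DOCUMENTS))
# `prompt` is what you would now send to the LLM of your choice.
```

The two halves of RAG are visible here: `retrieve` is the retrieval system, and the LLM call that `prompt` feeds into is the generation system.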

Why use RAG?

There are plenty of perks to using RAG. But first, you might have a question about what I have covered so far:

Why not provide the information directly to the LLM instead of going through RAG?

Okay, so the current scenario is like this: GPT-4 Turbo by OpenAI supports a context window of 128k tokens, while Claude 3 by Anthropic supports a 200k-token window, and so on. The point is that LLMs have a limited context window (input size, in simple terms), which limits how much context you can provide to the model.
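One practical consequence of a limited context window is that retrieved context may need trimming to fit a token budget. A hedged sketch, with whitespace-separated words standing in for real tokens (actual systems count tokens with the model's own tokenizer):

```python
def fit_to_budget(chunks: list[str], budget: int) -> list[str]:
    """Keep retrieved chunks, in order, until the token budget is spent.
    Word count approximates token count for illustration only."""
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Because the chunks arrive ranked by relevance, cutting from the tail drops the least useful context first.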

Another problem is that you don't want to provide the whole body of data, most of which is irrelevant to the given user query; you want only the piece of context that is relevant to the query.
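Selecting only the relevant context is essentially a ranking problem. Here is a minimal sketch that scores chunks against the query using cosine similarity over bag-of-words counts; this scoring scheme is an assumption for illustration, as real RAG pipelines typically use learned embeddings instead:

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words vector of lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

chunks = [
    "The player scored 35 points in last night's game.",
    "Our refund policy lasts 30 days.",
]
query = "Who scored the most points last night?"
best = max(chunks, key=lambda c: cosine(vectorize(query), vectorize(c)))
# `best` is the only chunk worth sending to the LLM for this query.
```

Only the top-scoring chunk goes into the prompt, keeping the context small and on-topic.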

Hallucination is one of the major issues with LLMs: if a model has no context or pre-trained knowledge about what the query is based on, it may make up a story of its own, i.e., generate wrong output confidently.

And if you feed such huge amounts of context into an LLM, your account balance may decrease very fast, since most providers bill per token!

These are some of the challenges and problems RAG addresses; beyond these, the RAG architecture can be used in various applications and use cases (discussed below).

Where to Use RAG

The possibilities are endless! From healthcare to finance, RAG can revolutionize industries by providing real-time, personalized information and insights. Get ready to see it pop up in places you never expected!

In short, RAG is useful for tasks that require context awareness.

Let's explore some real-world examples of cool things we can actually do with Retrieval Augmented Generation:

  • Chatbots & Virtual Assistants: Ever had a chatbot conversation that actually felt like talking to a human? That's often RAG behind the scenes, retrieving context for your query and generating responses that feel more personalized.
  • Information Retrieval: With the vast amount of data available, it's increasingly important to find relevant information quickly. This can be very impactful for professionals in fields like healthcare and legal research by streamlining the process of obtaining relevant information.
  • Question-answering Systems: A Q&A system is like a trivia partner that answers your burning questions by retrieving context-aware information from various sources.
  • Recommendation Systems: Recommendation systems are everywhere, and we have all experienced them, whether they're suggesting a new show to binge-watch or the next book to read. With RAG, the recommendation experience can be further tailored and personalized with the most relevant information.
These are just a few of the possible use cases for RAG; there is much more to it, but you get the idea of how RAG can be applied in different scenarios.

To conclude the whole discussion: what RAG does is retrieve the information relevant to a given query, so you get a response that actually satisfies you, and that's what matters most.