
What is Retrieval Augmented Generation, also known as RAG?


Retrieval Augmented Generation (RAG) is a technique to improve the accuracy and reliability of generative AI models with data from external sources.

Explanation using an illustration:

To understand the latest technique (RAG) in the field of generative AI, imagine a Dutch court. 

Judges make decisions based on their general understanding of the law. Sometimes, a case, such as a labor dispute or a tax case, requires specific expertise. Therefore, judges ask their secretary to look up case law and specific cases in a legal library to support their judgment.

Just like a good judge, large language models (LLMs) can respond to a wide range of human questions. But to provide in-depth answers that cite sources, the model needs an assistant that does research.

In the AI world, this assistant is a process called retrieval-augmented generation, or RAG.

What is RAG in the context of generative AI?

RAG, or retrieval augmented generation, is a technique that improves the accuracy and reliability of generative AI models by adding relevant information from external sources.

Generative AI models, such as large language models (LLMs), are essentially neural networks trained on vast amounts of text. This gives them a general understanding of language and enables them to respond to a wide range of questions.

By combining the best of both worlds - the general language skills of LLMs and the specific knowledge from external sources - RAG systems can generate more accurate and reliable answers to complex questions.

Combining internal and external knowledge

RAG allows AI models to cite sources, much like footnotes in a research paper. This gives users confidence, as they can verify the model's claims. This technique also helps clarify ambiguities in user questions and reduces the chances of a model producing incorrect output, which is referred to as 'hallucination.'

With RAG, you do not need to retrain a model with additional datasets, making it a relatively cost-effective solution. Moreover, it allows users to quickly and easily switch between different sources.

The history of RAG

The history of retrieval-augmented generation (RAG) dates back to the early 1970s. Researchers in information retrieval developed the first question-answer systems, applications that used natural language processing (NLP) to access text. 

Over the years, the concepts behind this form of text mining remained fairly constant, but the machine learning engines driving them have grown significantly, enhancing their usability and popularity.

In the mid-1990s, the Ask Jeeves service (in the US), now Ask.com, popularized question answering with its mascot, a well-dressed butler. Today, LLMs are taking these systems to a whole new level.



Image of Ask Jeeves, an early RAG-like web service

How does Retrieval Augmented Generation work?

RAG is a complex process that is widely discussed within generative AI. But how does it actually work, and what exactly is it intended for? The process can be described in three steps: Identify, Retrieve, and Generate.


Simple explanation of RAG (Retrieval Augmented Generation)


  1. Identify: All external documents are identified so that they are easily accessible to the AI language model.

  2. Retrieve: Based on the user's question, the AI language model retrieves the relevant documents, matched on specific terms or keywords.

  3. Generate: The AI language model combines the information from the retrieved documents with its internal knowledge to generate an accurate answer.
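The three steps above can be sketched in a few lines of Python. This is a minimal, illustrative sketch: a toy keyword retriever stands in for a real vector search, and a prompt template stands in for the LLM's generation step. All names (`documents`, `retrieve`, `generate`) are hypothetical, not part of any specific library.

```python
# 1. Identify: make the external documents available to the system.
documents = [
    "RAG combines retrieval with generation to ground answers in sources.",
    "Large language models are neural networks trained on vast text corpora.",
    "Vector databases store embeddings for fast similarity search.",
]

def retrieve(question: str, docs: list[str], top_k: int = 1) -> list[str]:
    """2. Retrieve: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def generate(question: str, context: list[str]) -> str:
    """3. Generate: a real system would pass this prompt to an LLM,
    which combines the context with its internal knowledge."""
    return f"Question: {question}\nContext: {' '.join(context)}"

question = "What does RAG combine?"
prompt = generate(question, retrieve(question, documents))
```

In a production system, the keyword overlap in step 2 would be replaced by embedding similarity, as described below, and step 3 by an actual model call.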

When users ask a question to an AI model, the question is first transformed into a numerical form, also known as an 'embedding' or 'vector.' This is similar to translating the question into a language that machines can understand.

Then, the model searches a large collection of information (a database) for pieces of text that resemble the numerical form of the question. This database has also been converted into numerical values, allowing the model to quickly and efficiently search for relevant information.

If the model finds one or more relevant pieces of text, it retrieves them and combines this with its own knowledge to formulate a suitable answer. The result is an answer to the user's question, possibly supplemented with references to the information found in the database.
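The embedding-and-search idea can be illustrated with a small sketch. Note the simplification: real systems use learned dense embedding models, whereas here a plain word-count vector and cosine similarity show the principle. The names `embed`, `cosine`, and `database` are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a word-count vector.
    Real systems use dense vectors from a trained model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The document collection is embedded once, up front.
database = {
    "doc1": "labor law covers disputes between employers and employees",
    "doc2": "tax law governs how income and assets are taxed",
}
vectors = {doc_id: embed(text) for doc_id, text in database.items()}

# At query time, the question is embedded and compared to each document.
query = embed("how is income taxed")
best = max(vectors, key=lambda doc_id: cosine(query, vectors[doc_id]))
```

Because the documents are embedded ahead of time, only the question needs to be converted at query time, which is what makes the search fast.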

How can you safely implement RAG?

Microsoft Azure provides security measures that ensure your data remains protected at every stage of the RAG process. This environment offers the same data security expected from applications like Outlook or Word, a standard in Dutch business. Besides Microsoft Azure, it is also possible to use Pinecone or PG Vector as a vector database. These tools are more accessible than Azure but considerably less secure and reliable.

Many developers find LangChain, an open-source library, particularly useful for connecting LLMs, embedding models, and knowledge databases. The LangChain community offers its own description of a RAG process.

Finally

RAG presents enormous opportunities for companies to improve their customer service, internal knowledge sharing, and decision-making. By combining powerful language models with company-specific information, organizations can create 'Assistants' capable of providing in-depth, accurate answers to complex questions. 

For companies considering implementing RAG, it is essential to invest in a secure and reliable infrastructure, such as Microsoft Azure. Additionally, it is advisable to experiment with different combinations of language models and knowledge databases to achieve the best results for your specific use case.

The possibilities of RAG are endless, and we are just at the beginning of this exciting era of AI-driven developments. Companies that take the step now and embrace RAG will have a significant advantage over their competitors and be better positioned to meet the ever-increasing expectations of customers and employees.

Author: Silas Muiderman

Co-founder