Dynamic RAG
Dynamic Retrieval-Augmented Generation (RAG) actively decides when and what to retrieve during the text generation process.
The two key elements of Dynamic RAG are:
- Identifying the optimal moment to activate the retrieval module (when to retrieve), and
- Crafting the appropriate query once retrieval is triggered (what to retrieve).
The proposed framework, DRAGIN, consists of two components: RIND and QFS.
RIND
RIND stands for Real-time Information Needs Detection, which takes into consideration:
- The LLM’s uncertainty about its own generated content,
- The importance of each token, and
- The semantic significance of each token (see the sketch below).
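To make the trigger concrete, here is a minimal Python sketch of how these three signals might be combined into a per-token score. The function names, the stopword subset, the multiplicative combination, and the threshold are illustrative assumptions, not the paper’s exact formulation.

```python
import math

# Illustrative stopword subset; a real implementation would use a full list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on", "it"}

def rind_score(token: str, token_probs: list[float], attn_to_token: list[float]) -> float:
    """Combine the three RIND signals for one generated token.

    token_probs   : the model's output distribution at this position
    attn_to_token : attention weights that subsequent tokens place on this token
    """
    # 1. Uncertainty: entropy of the output distribution at this position.
    entropy = -sum(p * math.log(p) for p in token_probs if p > 0)
    # 2. Importance: how strongly later tokens attend back to this one.
    influence = max(attn_to_token, default=0.0)
    # 3. Semantic significance: discount stopwords and punctuation.
    semantic = 0.0 if (token.lower() in STOPWORDS or not token.isalnum()) else 1.0
    # Multiplicative combination is an assumption of this sketch.
    return entropy * influence * semantic

def needs_retrieval(scores: list[float], threshold: float = 1.0) -> bool:
    """Trigger retrieval as soon as any token's score exceeds the threshold."""
    return any(s > threshold for s in scores)
```

The intuition: retrieval is only worth triggering when the model is uncertain about a token that both matters to the rest of the generation and carries real semantic content.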
QFS
For formulating retrieval queries, the framework introduces QFS: Query Formulation based on Self-Attention. QFS reimagines query formulation by leveraging the LLM’s self-attention over the entire context to surface the tokens that best express the model’s current information need.
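A minimal sketch of this idea follows, assuming we can read the most recent token’s attention weights over the context. The function name, the alignment of tokens with attention weights, and the top-n default are assumptions of the sketch.

```python
def formulate_query(tokens: list[str], last_token_attn: list[float], top_n: int = 8) -> str:
    """Build a retrieval query from the context tokens the model attends to most.

    tokens          : the context tokens so far (assumed aligned with the
                      attention vector below)
    last_token_attn : the most recent token's attention weights over `tokens`
    top_n           : how many high-attention tokens to keep (illustrative default)
    """
    # Rank positions by attention weight and keep the strongest top_n ...
    top_positions = sorted(range(len(tokens)),
                           key=lambda i: last_token_attn[i],
                           reverse=True)[:top_n]
    # ... then restore the original word order so the query stays readable.
    return " ".join(tokens[i] for i in sorted(top_positions))
```

For example, given the tokens ["Einstein", "was", "born", "in", "Ulm"] with attention concentrated on "Einstein", "born" and "Ulm", a top-3 query would be "Einstein born Ulm".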
DRAGIN
The framework is specifically designed to decide when and what to retrieve based on the LLM’s real-time information needs during the text generation process.
DRAGIN is described as a lightweight RAG framework that can be incorporated into any Transformer-based LLM without further training, fine-tuning, or prompt engineering.
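Putting the two components together, a DRAGIN-style generation loop might look like the sketch below. It reuses the rind_score and formulate_query helpers sketched above; step and retrieve are hypothetical stand-ins for the LLM forward pass and the retriever, and the loop simplifies the paper in several places (noted in comments).

```python
from typing import Callable

def dragin_generate(
    prompt: str,
    step: Callable[[str], tuple[str, list[float], list[float]]],  # context -> (token, probs, attn)
    retrieve: Callable[[str], str],                               # query -> evidence passage
    threshold: float = 1.0,
    max_tokens: int = 256,
) -> str:
    """DRAGIN-style loop: generate until RIND flags an information need,
    then build a query with QFS, retrieve, and resume with the evidence."""
    context, output = prompt, []
    just_retrieved = False
    while len(output) < max_tokens:
        token, probs, attn = step(context)
        # "When to retrieve": pause once the RIND score crosses the threshold.
        # (Simplified: reuses the new token's attention for both signals.)
        if not just_retrieved and rind_score(token, probs, attn) > threshold:
            # "What to retrieve": query built from high-attention context tokens
            # (assumes attn aligns with whitespace tokens, a simplification).
            query = formulate_query(context.split(), attn)
            context = retrieve(query) + "\n" + context  # inject evidence
            just_retrieved = True  # retrieve at most once per position
            continue
        output.append(token)
        context += " " + token
        just_retrieved = False
    return " ".join(output)
```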
Single-Round Retrieval-Augmented LLM
Large Language Models (LLMs) have proven highly effective across various tasks. Nevertheless, their internal knowledge is often insufficient for knowledge-intensive tasks.
To tackle this, RAG strategies are frequently used to boost LLM performance. The most common strategy is single-round retrieval augmentation: retrieve once, using the user’s initial input as the query, and generate with the retrieved passages in context.
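In code, the single-round pattern is simply retrieve once, then generate. A minimal sketch, with retrieve and llm as hypothetical caller-supplied functions rather than any specific library’s API:

```python
from typing import Callable

def single_round_rag(question: str,
                     retrieve: Callable[[str], str],
                     llm: Callable[[str], str]) -> str:
    """Single-round RAG: one retrieval from the initial question, one generation."""
    passages = retrieve(question)  # retrieval happens exactly once, up front
    prompt = f"Context:\n{passages}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)
```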
Multi-Round Retrieval-Augmented LLM
While single-round retrieval suffices for straightforward tasks or instances with clear user information needs, it falls short for complex tasks like long-form question answering, open-domain summarisation, and chain-of-thought reasoning.
Relying solely on the user’s initial input for retrieval may fail to cover all the external knowledge the model needs.
Consequently, researchers have begun investigating multi-round retrieval augmentation.
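A minimal sketch of the multi-round pattern follows; the query-refinement heuristic and the fixed round count are illustrative choices, and published methods differ in how they decide each follow-up query.

```python
from typing import Callable

def multi_round_rag(question: str,
                    retrieve: Callable[[str], str],
                    llm: Callable[[str], str],
                    max_rounds: int = 3) -> str:
    """Multi-round RAG sketch: each round's draft answer refines the next
    retrieval query, so needs that emerge mid-generation are still covered."""
    evidence: list[str] = []
    answer = ""
    for _ in range(max_rounds):
        # Refine the query with the current draft (one common heuristic;
        # real methods differ in how they form the follow-up query).
        query = question if not answer else f"{question} {answer}"
        evidence.append(retrieve(query))
        context = "\n".join(evidence)
        answer = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return answer
```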
In Conclusion
As I have mentioned before, LLM integrations and applications are steadily growing in complexity, and multi-round querying adds to this.
Every additional round adds cost, latency and dependency on one or more LLMs, which should encourage enterprises to consider open-sourced, locally hosted LLMs.
With the increase in RAG complexity, consideration should be given to RAG Agents, or as LlamaIndex refers to it, Agentic RAG.
Find the full study here.