Here are a few general trends I have observed of late…
Considering the development of the Large Language Model (LLM) application ecosystem, LlamaIndex and LangChain are really at the forefront of establishing de facto application frameworks and standards.
And even if organisations don’t want to use their frameworks, studying their methods lends much insight into how the ecosystem is developing.
Here is a short list of recent market developments and shifts:
The MultiHop-RAG dataset and the implemented RAG system are publicly available.
The two questions below are more complex: the first spans a number of companies, while the second specifies a period of time to which the data needs to be relevant.
Which company among Google, Apple, and Nvidia reported the largest profit margins in their third-quarter reports for 2023?
How does Apple’s sales trend look over the past three years?
From these two example questions, it is clear that elements like a knowledge base, ground-truth answers, supporting evidence, and more are required to accurately answer these questions.
These queries require evidence from multiple documents to formulate an answer. Again, this approach is strongly reminiscent of LlamaIndex’s Agentic RAG approach.
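The multi-document requirement can be illustrated with a minimal sketch; the corpus, figures, and keyword-matching retriever below are toy stand-ins, not the paper's actual system:

```python
# Toy corpus: each Q3 2023 figure lives in a different document, so no
# single retrieval hit can answer a query comparing all three companies.
TOY_CORPUS = {
    "doc_google": "Google reported a profit margin of 24% in Q3 2023.",
    "doc_apple": "Apple reported a profit margin of 26% in Q3 2023.",
    "doc_nvidia": "Nvidia reported a profit margin of 51% in Q3 2023.",
}

def retrieve(query_terms):
    """Return every document whose text mentions any query term."""
    return {
        doc_id: text
        for doc_id, text in TOY_CORPUS.items()
        if any(term.lower() in text.lower() for term in query_terms)
    }

# A query spanning three companies needs evidence from three documents.
evidence = retrieve(["Google", "Apple", "Nvidia"])
print(sorted(evidence))
```

No single document suffices here; an answer is only reachable by combining the retrieved evidence, which is exactly what makes such queries multi-hop.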
The table below gives an example of a multi-hop query. The sources are defined, together with the claim, a bridge-topic, and a bridge-entity; the query is then shown with the final answer.
I have often referred to inspectability and observability as big advantages of a non-gradient approach like RAG. The table below is a very good case in point, where the simple answer of “yes” can be traced back to its supporting evidence.
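As a minimal sketch of what that traceability can look like in practice (the record layout and example values are my own illustration, not the benchmark's schema):

```python
# Illustrative record linking a query's final answer to the claims
# (and their source articles) that support it, so even a one-word
# answer like "yes" remains inspectable after the fact.
answer_record = {
    "query": "Did both sources report on the same product launch?",
    "answer": "yes",
    "supporting_claims": [
        {"source": "article_17", "claim": "Vendor A announced the launch on 2023-10-02."},
        {"source": "article_42", "claim": "The launch was covered again on 2023-10-03."},
    ],
}

def trace(record):
    """Return the (source, claim) pairs that justify the final answer."""
    return [(c["source"], c["claim"]) for c in record["supporting_claims"]]

for source, claim in trace(answer_record):
    print(f"{source}: {claim}")
```

Keeping this mapping alongside each answer is what makes the “yes” auditable, in contrast to a gradient-based approach where the supporting knowledge is baked into model weights.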
The diagram below shows the MultiHop-RAG Pipeline, from the data collection step, to the final step of quality assurance.
The study utilised the mediastack API to download a diverse news dataset covering multiple English-language categories such as entertainment, business, sports, technology, health, and science.
To simulate real-world Retrieval-augmented generation (RAG) scenarios, the selected news articles span from September 26, 2023, to December 26, 2023, extending beyond the knowledge cutoff of widely-used LLMs like ChatGPT and LLaMA. This timeframe ensures potential divergence between the knowledge base data and the LLMs’ training data.
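A collection step along these lines could be sketched as follows; the parameter names follow mediastack's public news endpoint, but the helper function is my own and the access key is a placeholder:

```python
# Sketch of building a mediastack news query for the study's timeframe.
# The request would go to http://api.mediastack.com/v1/news (GET).
def build_news_params(access_key,
                      categories,
                      languages="en",
                      date_range="2023-09-26,2023-12-26",
                      limit=100):
    """Assemble query parameters; mediastack accepts a comma-separated date range."""
    return {
        "access_key": access_key,
        "categories": ",".join(categories),
        "languages": languages,
        "date": date_range,
        "limit": limit,
    }

params = build_news_params(
    "YOUR_ACCESS_KEY",  # placeholder; a real key is required to run this
    ["entertainment", "business", "sports", "technology", "health", "science"],
)
```

The date range matches the study's window, chosen to extend past the knowledge cutoff of models like ChatGPT and LLaMA so the knowledge base and the LLMs' training data can diverge.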
A trained language model was used to extract factual or opinion sentences from each news article. These factual sentences serve as evidence for addressing multi-hop queries. The selection process involves retaining articles with evidence containing overlapping keywords with other articles, facilitating the creation of multi-hop queries with answers drawn from multiple sources.
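The keyword-overlap filter can be sketched with plain token sets; this crude extraction is a stand-in for the trained language model the study actually used:

```python
import re
from itertools import combinations

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}

def keywords(sentence):
    """Crude keyword set: lowercase tokens minus stopwords."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower())) - STOPWORDS

def overlapping_pairs(evidence_by_article, min_shared=2):
    """Return article pairs whose evidence shares at least `min_shared` keywords."""
    pairs = []
    for (a, ev_a), (b, ev_b) in combinations(evidence_by_article.items(), 2):
        shared = keywords(ev_a) & keywords(ev_b)
        if len(shared) >= min_shared:
            pairs.append((a, b, shared))
    return pairs

articles = {
    "a1": "Apple reported record iPhone sales in Q3 2023.",
    "a2": "Analysts linked strong iPhone demand to the Q3 2023 results.",
    "a3": "The local football season opened this weekend.",
}
print(overlapping_pairs(articles))  # only a1/a2 share enough keywords
```

Articles whose evidence overlaps (here a1 and a2, via "iphone", "q3", "2023") are the ones retained, since their shared entities are what make a multi-hop query answerable from multiple sources.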
GPT-4 was used to paraphrase the evidence sentences, given the original evidence and its context; these paraphrased sentences are referred to as claims.
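A paraphrasing call could be set up roughly as below; the prompt wording is my own illustration, since the study's actual GPT-4 prompt is not reproduced here:

```python
# Sketch of prompting an LLM to paraphrase an evidence sentence into a
# standalone "claim". The prompt template is illustrative only.
def build_paraphrase_prompt(evidence, context):
    return (
        "Paraphrase the following evidence sentence into a standalone claim, "
        "preserving its factual content.\n"
        f"Context: {context}\n"
        f"Evidence: {evidence}\n"
        "Claim:"
    )

prompt = build_paraphrase_prompt(
    "Apple reported a 26% profit margin in Q3 2023.",
    "Apple's third-quarter 2023 earnings report.",
)
# `prompt` would then be sent to a chat-completion model such as GPT-4.
```

Supplying the surrounding context alongside the evidence lets the model resolve pronouns and produce a claim that stands on its own.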
A bridge-entity or bridge-topic is used to generate multi-hop queries.
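As an illustrative sketch, two claims sharing a bridge-entity can be joined into a single query; the template and example data here are my own, not the paper's generation method:

```python
# Sketch: a bridge-entity shared by two claims links them into one
# query whose answer requires evidence from both sources.
def multi_hop_query(bridge_entity, claim_a, claim_b):
    return (
        f"Considering the reports from {claim_a['source']} and {claim_b['source']}, "
        f"do both agree on {bridge_entity}?"
    )

claim_a = {"source": "Source A", "text": "Apple launched the product in October."}
claim_b = {"source": "Source B", "text": "Apple's October launch was delayed."}
query = multi_hop_query("Apple's October launch", claim_a, claim_b)
print(query)
```

The bridge-entity ("Apple's October launch") is what forces the hop: answering means retrieving and comparing both underlying claims.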
To ensure dataset quality, the study employed two approaches.
The study acknowledges several limitations for potential improvement in future research.
Previously published on Medium.