A study named Large Language Model Programs explores how LLMs can be implemented in applications.
The basic premise of the study is to abstract the application logic away from a prompt-chaining approach, only accessing LLM APIs as and when required.
This approach also discards, to some degree, model fine-tuning in favour of in-context learning (ICL).
Although not described as RAG, the study’s approach of retrieving and filtering relevant content prior to LLM inference is analogous to RAG.
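To make the analogy concrete, below is a minimal sketch of such a retrieve-and-filter step in Python. The call_llm function is a hypothetical stand-in for whatever LLM API the program uses, and the keyword-overlap scoring is purely illustrative; it is not the study’s method.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call (e.g. an HTTP request)."""
    raise NotImplementedError

def score(question: str, document: str) -> float:
    """Naive relevance score: fraction of question words found in the document."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def retrieve_and_filter(question: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Rank documents by relevance and keep only the top_k before inference."""
    ranked = sorted(corpus, key=lambda doc: score(question, doc), reverse=True)
    return ranked[:top_k]

def answer(question: str, corpus: list[str]) -> str:
    """Build a prompt from the filtered context and make a single LLM call."""
    context = "\n\n".join(retrieve_and_filter(question, corpus))
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```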
Subsequently, on a prompt level, the reasoning steps are created via decomposition; again, this is very much analogous to Chain-Of-Thought prompting.
The third step in the proposed sequence again resembles the Self-Consistency CoT approach.
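As an illustration, a Self-Consistency style step could be sketched as below: sample several completions for the same prompt and keep the most common final answer. It reuses the hypothetical call_llm stub from the earlier sketch.

```python
from collections import Counter

def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    """Sample several reasoning paths and keep the most common final answer.

    A real call would use a non-zero sampling temperature so that the
    reasoning paths actually differ from one another.
    """
    answers = []
    for _ in range(n_samples):
        completion = call_llm(prompt).strip()
        # Assumption: the final line of each completion holds the answer.
        answers.append(completion.splitlines()[-1] if completion else "")
    return Counter(answers).most_common(1)[0][0]
```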
One important distinction between the LLM Programs study and previous implementations is that the LLM (or LLMs) does not form part of a conversational UI per se, but of any application, a website for instance.
Hence the program’s state is not managed and maintained by the LLM; rather, the LLM is only involved as and when needed.
The study sees it as one of the advantages that each LLM call only includes the contextual information from that particular step.
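The sketch below illustrates this idea under assumed step names: the program holds the state, and each LLM call only receives the context for its own step. It reuses the hypothetical call_llm and retrieve_and_filter helpers from the first sketch.

```python
from dataclasses import dataclass, field

@dataclass
class ProgramState:
    """State lives in the program, not in the model's context window."""
    question: str
    retrieved: list[str] = field(default_factory=list)
    partial_answers: list[str] = field(default_factory=list)

def run_program(question: str, corpus: list[str]) -> str:
    state = ProgramState(question=question)

    # Step 1: retrieval is plain application logic; no LLM involved.
    state.retrieved = retrieve_and_filter(question, corpus)

    # Step 2: one LLM call per document, each seeing only that document.
    for doc in state.retrieved:
        state.partial_answers.append(
            call_llm(f"Using only this passage:\n{doc}\n\nAnswer: {question}")
        )

    # Step 3: the final call sees only the partial answers, not the documents.
    joined = "\n".join(state.partial_answers)
    return call_llm(f"Combine these partial answers:\n{joined}\n\nQuestion: {question}")
```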
The study lists some advantages, even though the principle of RAG has been proven many times over, together with the principle of decomposition, coined the Chain-Of-X Phenomenon.
Some of the key findings from the study are:
This approach does necessitate some concessions, especially in terms of flexibility.
The study suggests a framework which embeds an LLM within a program, and which includes a safety mechanism to filter out unwanted LLM responses.
Embedding an LLM within a task-independent program that is responsible for selecting and loading relevant documents into context, or for summarising past text, may serve as a way of overcoming such limitations.
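As a rough illustration of such a safety mechanism, the following sketch filters LLM responses against a hypothetical blocklist; a production-grade filter would obviously go far beyond simple pattern matching.

```python
import re

# Hypothetical blocklist; a real filter would be far more involved.
BLOCKED_PATTERNS = [re.compile(p, re.IGNORECASE)
                    for p in (r"\bpassword\b", r"\bcredit card\b")]

def filtered_llm_call(prompt: str, max_retries: int = 2) -> str:
    """Reject responses matching any blocked pattern, retrying a few times
    before falling back to a safe default answer."""
    for _ in range(max_retries + 1):
        response = call_llm(prompt)
        if not any(p.search(response) for p in BLOCKED_PATTERNS):
            return response
    return "Sorry, I cannot help with that request."
```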
The study states:
It is central to this method to decompose the main problem recursively into subproblems until they can be solved by a single query to the model.
This approach will definitely require a framework of sorts to programmatically perform the recursion, iterating over subproblems until the final answer is reached.
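A bare-bones version of such a recursive loop might look as follows; the prompts and the stopping check are my own assumptions, and call_llm is again the hypothetical stub from the first sketch.

```python
def solve(problem: str, depth: int = 0, max_depth: int = 3) -> str:
    """Recursively decompose a problem until each piece fits one LLM query.

    The prompts and the stopping check here are illustrative assumptions,
    not the study's exact method.
    """
    verdict = call_llm(
        f"Can the following be answered in a single step? Reply YES or NO.\n{problem}"
    )
    if depth >= max_depth or verdict.strip().upper().startswith("YES"):
        return call_llm(f"Answer directly:\n{problem}")

    # Ask the model to split the problem, one subproblem per line.
    subproblems = call_llm(f"Split into simpler subproblems, one per line:\n{problem}")
    sub_answers = [solve(s, depth + 1, max_depth)
                   for s in subproblems.splitlines() if s.strip()]

    # Compose the final answer from the subproblem answers.
    joined = "\n".join(sub_answers)
    return call_llm(
        f"Given these subproblem answers:\n{joined}\n\nAnswer the original problem:\n{problem}"
    )
```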
I read multiple papers on a daily basis; what I find interesting is that one starts to form a general perception of what are becoming accepted norms and standards.
Especially in the case of LLMs, an alignment is starting to solidify around what constitutes best practice for production implementations. This becomes evident when you read older studies and find how many of the ideas raised in the past have since been superseded.
Take for instance the idea of fine-tuning a model. The introduction of new concepts has led to fine-tuning being much more feasible than it was only a few months ago:
Key elements for any LLM implementation are:
Find the study here.