For LLM text-to-text generation, it is important that responses are factually consistent with the source text.
This establishes alignment between truth and the user's expectations and instructions.
The study identifies a number of factors that cause hallucination. This phenomenon poses a significant challenge to the reliability of LLMs in real-world applications.
CoNLI is used for both hallucination detection and hallucination reduction via post-editing.
CoNLI detects hallucination and enhances text quality by rewriting the response.
One of the objectives of CoNLI was to rely only on LLMs, without any fine-tuning or domain-specific prompt engineering.
CoNLI is described as a simple plug-and-play framework which can serve as an effective choice for hallucination detection and reduction, achieving competitive performance across various contexts.
I like to describe ungrounded hallucination as LLM-generated responses which are succinct, highly plausible and believable, but factually incorrect. - Author
One of the objectives of the study was to reduce hallucination in scenarios where the maker does not have full control over the LLM or cannot leverage additional external knowledge.
As the image below shows, the study’s grounding strategy is rooted in four principles, one of which is retrieving a contextual corpus of reference data.
The image below shows the proposed CoNLI framework with a real example.
Each hypothesis in the raw response will first go through sentence-level detection.
If no hallucination is detected, it will go to detailed entity-level detection.
Detection reasonings will be used as mitigation instructions.
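The detection flow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the study's implementation: in CoNLI the judging is done by an LLM, whereas here the judges are toy lexical stand-ins so the example runs without any model. All names (Detection, toy_sentence_judge, toy_entity_judge, extract_entities, detect, mitigation_instructions) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# A judge takes (source, claim) and returns None when the claim looks
# supported, or a short reason string when it does not.
# ASSUMPTION: in the study this role is played by an LLM; the toy judges
# below are simple lexical stand-ins used only to keep the sketch runnable.
Judge = Callable[[str, str], Optional[str]]


@dataclass
class Detection:
    hypothesis: str  # the sentence from the raw response
    level: str       # "sentence" or "entity"
    reason: str      # the detection reasoning, reused for mitigation


def split_sentences(text: str) -> List[str]:
    # Naive sentence splitter, for illustration only.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence: str) -> List[str]:
    # Hypothetical stand-in for entity detection: capitalised words and numbers.
    return re.findall(r"\b(?:[A-Z][a-z]+|\d+)\b", sentence)


def toy_sentence_judge(source: str, sentence: str) -> Optional[str]:
    # Flags a sentence when more than half of its words are absent from the source.
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    missing = words - set(re.findall(r"[a-z]+", source.lower()))
    if len(missing) > len(words) / 2:
        return "most of the sentence is not supported by the source"
    return None


def toy_entity_judge(source: str, entity: str) -> Optional[str]:
    # Flags an entity that does not occur anywhere in the source text.
    if entity.lower() not in source.lower():
        return f"entity '{entity}' does not appear in the source"
    return None


def detect(source: str, response: str,
           sentence_judge: Judge = toy_sentence_judge,
           entity_judge: Judge = toy_entity_judge) -> List[Detection]:
    detections = []
    for hypothesis in split_sentences(response):
        # Step 1: sentence-level detection.
        reason = sentence_judge(source, hypothesis)
        if reason is not None:
            detections.append(Detection(hypothesis, "sentence", reason))
            continue
        # Step 2: only if the sentence passed, run entity-level detection.
        for entity in extract_entities(hypothesis):
            reason = entity_judge(source, entity)
            if reason is not None:
                detections.append(Detection(hypothesis, "entity", reason))
    return detections


def mitigation_instructions(detections: List[Detection]) -> List[str]:
    # Step 3: detection reasonings become instructions for the rewrite step.
    return [f"Rewrite \"{d.hypothesis}\" because {d.reason}." for d in detections]


if __name__ == "__main__":
    source = "Alice joined the company in 2019 as an engineer."
    response = ("Alice joined the company in 2021 as an engineer. "
                "Penguins can fly over tall mountains.")
    for instruction in mitigation_instructions(detect(source, response)):
        print(instruction)
```

Because the judges are passed in as plain callables, the toy versions could be swapped for LLM-backed ones without changing the control flow. The resulting instructions would then be handed to the rewriting step that post-edits the response.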
This is the second study in a short period of time in which sentence-level and entity-level detection are used to check the coherence and truthfulness of generated text.
Making use of entity detection is an effective approach as the range of standard named entities is wide and entity detection can be performed out of the box.
CoNLI was tested on abstractive text summarisation and grounded question-answering, making use of both synthetically generated and human-annotated data.
The study considered text-to-text datasets, which represent the most prevalent use-case for real-time and off-line processing, especially of unstructured data and conversations.
Some key considerations from the study…
Previously published on Medium.