By using AI-accelerated human annotation, this framework reduces uncertainty and introduces reliability, measured against a chain-of-thought (CoT) baseline.

Introduction

Emergent abilities are being discovered in Large Language Models (LLMs). In essence, these abilities are different ways in which in-context learning (ICL) is leveraged within LLMs.

LLMs excel at ICL, and it has recently been shown that, with effective task-specific prompt engineering, LLMs can produce high-quality answers.

Example-based prompting combined with chain-of-thought (CoT) reasoning is particularly effective for question-and-answer tasks.

The concern of a recent study is that these prompt examples might be too rigid and fixed for different tasks.

This approach follows a methodology that is gaining popularity in recent studies and developments. It includes the following elements:

Human-annotated and vetted data.
Leveraging one or more LLMs for intermediate tasks prior to the final inference.
Adding flexibility to the prompt engineering process demands the introduction of complexity and some sort of framework to manage the process.

The Four Steps Of Active Prompting

Uncertainty Estimation
Selection
Annotation
Final Inference

Uncertainty Estimation

The LLM is queried a predefined number of times to generate possible answers for each question in a set of training questions. The answers are generated in a decomposed fashion, with intermediate reasoning steps.

An uncertainty score is then calculated for each question from its set of answers, using an uncertainty metric.
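As a minimal sketch of this step, the snippet below scores questions with two possible uncertainty metrics: disagreement (the share of distinct answers) and entropy of the answer distribution. The text above does not say which metric is used, so both are assumptions chosen for illustration, and the sampled answers are hypothetical stand-ins for real LLM output.

```python
import math
from collections import Counter


def disagreement(answers):
    """Fraction of distinct answers among the sampled answers."""
    return len(set(answers)) / len(answers)


def entropy(answers):
    """Shannon entropy (natural log) of the empirical answer distribution."""
    k = len(answers)
    counts = Counter(answers)
    return sum(-(c / k) * math.log(c / k) for c in counts.values())


# Hypothetical final answers from 5 samples for two training questions.
sampled = {
    "q1": ["12", "12", "12", "12", "12"],  # the model agrees with itself
    "q2": ["7", "9", "7", "11", "8"],      # the model disagrees with itself
}

scores = {q: disagreement(a) for q, a in sampled.items()}
print(scores)  # {'q1': 0.2, 'q2': 0.8}
```

A higher score means the model is less consistent on that question, which makes the question a stronger candidate for human annotation.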

Selection

The questions are ranked by uncertainty, and the most uncertain ones are selected for human inspection and annotation.
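The selection step amounts to a sort followed by a cut-off. A small sketch, assuming each question already has an uncertainty score (the question ids, scores and the function name `select_most_uncertain` are hypothetical):

```python
def select_most_uncertain(scores, n):
    """Return the n question ids with the highest uncertainty, most uncertain first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


# Hypothetical uncertainty scores per training question.
scores = {"q1": 0.2, "q2": 0.8, "q3": 0.6, "q4": 0.4}

to_annotate = select_most_uncertain(scores, n=2)
print(to_annotate)  # ['q2', 'q3']
```

The value of `n` is the annotation budget: the number of questions the human annotators can realistically handle.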

Annotation

Human annotators annotate the selected uncertain questions, turning them into exemplars.

Final Inference

Final inference for each question is performed with the newly annotated exemplars.
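One way to picture the final inference step is as prompt assembly: the annotated exemplars are placed ahead of the new question, and the resulting text is sent to the LLM. The sketch below only builds the prompt. The `Q:`/`A:` layout, the dictionary keys and the example content are assumptions made for illustration, not the format used in the study.

```python
def build_prompt(exemplars, question):
    """Concatenate annotated exemplars, then append the new question."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['rationale']} The answer is {ex['answer']}."
        )
    # Leave the final answer open so the model continues with its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Hypothetical exemplar produced by a human annotator.
exemplars = [
    {
        "question": "Tom has 3 boxes with 4 pens each. How many pens does he have?",
        "rationale": "Each box holds 4 pens and there are 3 boxes, so 3 * 4 = 12.",
        "answer": "12",
    }
]

prompt = build_prompt(exemplars, "A train travels 60 km per hour for 2 hours. How far does it go?")
print(prompt)
```

Because the exemplars come from the questions the model was least certain about, the same prompt prefix can be reused for every test question in the task.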

Considerations

Employing this approach increases inference cost and token use, since an LLM is queried multiple times per question for the uncertainty calculation.
This study again illustrates the importance of a data-centric, RLHF-like approach to applied AI, where training data is created via a human-supervised process which is AI-accelerated.
This is a good example of AI being used to accelerate the human annotation process by removing noise from the data and creating training data with a clear signal.
This approach is moving us towards a data-focussed way of working, with data discovery, data design and data development.
As this type of implementation scales, a data productivity studio or latent space will be required to streamline the data discovery, design and development process.

Contribution

The contribution of the study is three-fold:

A judicious process for identifying the most valuable and relevant question-and-answer sets for annotation, while reducing the human data-related workload via an AI-accelerated process.
The establishment of an effective set of uncertainty metrics.
The proposed method surpasses competitive baseline models by a large margin on multiple reasoning tasks.

Baselines

It is interesting to note that this approach has four methods serving as its main baselines:

Chain-of-thought (CoT) (Wei et al., 2022b): This method employs standard chain-of-thought prompting and offers four to eight human-written exemplars, each comprising a sequence of intermediate reasoning steps.
Self-Consistency (SC) (Wang et al., 2022c): An enhanced version of CoT, SC diverges from greedy decoding. Instead, it samples a set of reasoning paths and selects the most prevalent answer.
Auto-CoT (Zhang et al., 2022b): This technique involves an automatic exemplar construction approach that clusters and generates rationales using zero-shot prompting (Kojima et al., 2022).
Random-CoT: Serving as a baseline for Active-Prompt, Random-CoT follows the same annotation process. The only distinction lies in its utilisation of a random sampling method for questions from the training data during annotation, rather than employing the proposed uncertainty metrics.
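Two of these baselines are simple enough to sketch in a few lines. The first function illustrates the majority vote at the heart of Self-Consistency, and the second illustrates the random question selection that distinguishes Random-CoT from uncertainty-based selection. The function names, sample answers and fixed seed are assumptions for illustration only.

```python
import random
from collections import Counter


def self_consistency_answer(sampled_answers):
    """Return the most frequent final answer across sampled reasoning paths."""
    return Counter(sampled_answers).most_common(1)[0][0]


def random_selection(question_ids, n, seed=0):
    """Random-CoT style selection: pick n questions uniformly at random."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return rng.sample(list(question_ids), n)


# Hypothetical final answers from five sampled reasoning paths.
print(self_consistency_answer(["18", "18", "20", "18", "17"]))  # 18

# Hypothetical pool of training questions.
chosen = random_selection(["q1", "q2", "q3", "q4"], n=2)
print(chosen)
```

Comparing against Random-CoT isolates the effect of the uncertainty metric, since the annotation process is otherwise the same.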