Firstly, what is alignment? Alignment refers to ensuring that models behave in accordance with the intention of the prompt. This comes down to the accuracy of prompt engineering. A prompt is, in essence, a body of text in which the user defines, or rather describes, their intent and, by implication, the intended outcome.
Iteratively optimising prompts can aid model alignment, with prompts refined for specific models and use-cases. The result is a process of convergence on an optimal prompt for a specific solution.
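To make the idea of iterative convergence concrete, here is a minimal sketch of a greedy refinement loop. It is only an illustration: `refine_prompt`, `toy_score` and the candidate edits are hypothetical names, and the scorer is a stand-in for what would, in practice, be a call to the model followed by some grading of its output.

```python
from typing import Callable, Iterable, Tuple


def refine_prompt(
    base_prompt: str,
    candidate_edits: Iterable[str],
    score: Callable[[str], float],
) -> Tuple[str, float]:
    """Greedy refinement: keep a candidate edit only if it improves the score."""
    best_prompt, best_score = base_prompt, score(base_prompt)
    for edit in candidate_edits:
        trial = f"{best_prompt} {edit}"
        trial_score = score(trial)
        if trial_score > best_score:
            best_prompt, best_score = trial, trial_score
    return best_prompt, best_score


def toy_score(prompt: str) -> float:
    """Hypothetical scorer (assumption): rewards two wanted instructions
    and lightly penalises prompt length. A real scorer would evaluate
    the model's output for the prompt instead."""
    wanted = ("concise", "cite sources")
    hits = sum(1.0 for phrase in wanted if phrase in prompt.lower())
    return hits - 0.01 * len(prompt.split())


edits = ["Be concise.", "Use emojis.", "Cite sources."]
best, best_score = refine_prompt("Summarise the report.", edits, toy_score)
print(best)
```

The loop keeps the two edits that raise the score and drops the one that does not. A real setup would likely score each prompt over several model runs, since a single run may not be representative.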
OpenAI devoted six months to iteratively aligning GPT-4 before its release. — Source
The image above shows the taxonomy explored in the study with seven overarching categories: reliability, safety, fairness and bias, resistance to misuse, interpretability, goodwill, and robustness.
Each major category contains several sub-categories, for a total of 29 sub-categories.
In the context of LLMs, non-deterministic means that the same prompt submitted to an LLM at different times will most probably yield different results.
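One common source of this behaviour is that the next token is sampled from a probability distribution rather than always chosen as the single most likely option. The sketch below illustrates this with a toy vocabulary; the logits and the `sample_token` helper are assumptions made for illustration, not values or functions from any real model or API.

```python
import math
import random
from typing import Dict


def sample_token(logits: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick the next token: greedy when temperature <= 0, otherwise sample
    from the softmax of the temperature-scaled logits."""
    if temperature <= 0:
        return max(logits, key=logits.get)
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Hypothetical next-token scores for one and the same prompt.
toy_logits = {"Paris": 2.0, "London": 1.5, "Rome": 1.0}
rng = random.Random(0)

greedy = {sample_token(toy_logits, 0.0, rng) for _ in range(200)}
sampled = {sample_token(toy_logits, 1.0, rng) for _ in range(200)}
print(greedy)   # a single token
print(sampled)  # typically several different tokens
```

In this toy setting a temperature of zero removes the variation. Hosted models may still vary slightly at low temperature for other reasons, so this should be read as an illustration of the mechanism rather than a guarantee.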
To deal better with the non-deterministic nature of LLMs, training can be applied via various avenues. The study divides training into three steps.
Step 1 — Supervised Fine-Tuning (SFT): Given a pre-trained (unaligned) LLM that is trained on a large text dataset, we first sample prompts and ask humans to write the corresponding (good) outputs based on the prompts. We then fine-tune the pre-trained LLM on the prompt and human-written outputs to obtain SFT LLM.
Step 2 — Training Reward Model: We again sample prompts, and for each prompt, we generate multiple outputs from the SFT LLM, and ask humans to rank them. Based on the ranking, we train a reward model (a model that predicts how good an LLM output is).
Step 3 — Reinforcement Learning from Human Feedback (RLHF): Given a prompt, we sample output from the SFT LLM. Then we use the trained reward model to predict the reward on the output. We then use the Reinforcement Learning (RL) algorithm to update the SFT LLM with the predicted reward.
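Step 2 can be sketched in a few lines. The excerpt above does not say how the rankings are turned into a training signal; one widely used option, assumed here, is to split each ranking into (better, worse) pairs and minimise a pairwise logistic loss. The linear reward function and the two-number feature vectors (roughly "helpfulness" and "toxicity") are hypothetical stand-ins for a real model and real LLM outputs.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def ranking_to_pairs(ranked: List[Vector]) -> List[Tuple[Vector, Vector]]:
    """Turn one human ranking (best first) into (better, worse) pairs."""
    return list(combinations(ranked, 2))


def reward(w: List[float], x: Vector) -> float:
    """Toy linear reward model: a weighted sum of the output's features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(
    pairs: List[Tuple[Vector, Vector]], dim: int, lr: float = 0.5, epochs: int = 200
) -> List[float]:
    """Gradient descent on the loss -log(sigmoid(r(better) - r(worse)))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = reward(w, better) - reward(w, worse)
            # Gradient scale: 1 - sigmoid(margin); shrinks as the pair is ranked correctly.
            coeff = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                w[i] += lr * coeff * (better[i] - worse[i])
    return w


# Hypothetical features per output: [helpfulness, toxicity], ranked best first.
ranking = [[0.9, 0.0], [0.6, 0.2], [0.3, 0.7]]
pairs = ranking_to_pairs(ranking)
weights = train_reward_model(pairs, dim=2)
print(weights)
```

The trained weights end up rewarding the first feature and penalising the second, so the model scores unseen outputs in line with the human ranking. In Step 3, a score of this kind is what the RL algorithm would use as the reward when updating the SFT LLM.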
The three steps highlighted by the study are helpful, but I still prefer the data discovery, data development and data design approach.
Data discovery done right can aid immensely in using existing conversational data and in ensuring that the data which is designed matches the desired conversations of the users.
From here, via an AI-accelerated latent space (a data productivity platform), discovered data can be designed and further developed with weak human supervision.
The study divides the current major use-cases of LLMs into four main categories, as seen in the image. The study does state that this diagram is not exhaustive and that there is scope for improvement.
Misinformation mostly refers to wrong or biased answers, and can also be the result of prompt engineering that is not well-formed or sufficiently refined.
Hallucination may consist of fabricated content that either conflicts with certain source content or cannot be verified from the existing sources.
Hallucination can be mitigated by increasing training data, and especially by supplying accurate contextual reference data at inference, or by a process of ranking and reward with RLHF.
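Supplying contextual reference data at inference can be as simple as placing relevant passages in the prompt and instructing the model to stay within them. The sketch below is a rough illustration under stated assumptions: the word-overlap retrieval, the helper names and the prompt wording are all hypothetical, and a real system would more likely use embeddings for retrieval.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens, used for a naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_passages(question: str, passages: List[str], k: int = 1) -> List[str]:
    """Rank passages by how many tokens they share with the question."""
    q = _tokens(question)
    ranked = sorted(passages, key=lambda p: len(q & _tokens(p)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(question: str, passages: List[str], k: int = 1) -> str:
    """Compose a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"- {p}" for p in top_passages(question, passages, k))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


reference_data = [
    "OpenAI devoted six months to iteratively aligning GPT-4 before its release.",
    "The taxonomy has seven overarching categories and 29 sub-categories.",
    "Step 2 trains a reward model from human rankings.",
]
prompt = build_grounded_prompt("How long did OpenAI spend aligning GPT-4?", reference_data)
print(prompt)
```

Only the passage relevant to the question reaches the model, which gives it a verifiable source to answer from and an explicit way out when the context holds no answer.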
Everyone is trying to figure out how best to build applications using LLMs; I see this as the data delivery phase. The upcoming phases are data discovery, data design and data development.
Find the full study here.