Series NLP1: Understanding what’s happening behind ChatGPT, Bard, HuggingChat and others.

Episode 2 – Emergent abilities of Large Language Models

In the preceding episode, we saw that ChatGPT and the others are powered by Large Language Models (LLMs). We stated that LLMs are large-scale Pretrained Language Models (PLMs) that exhibit some emergent abilities. So, what are those abilities?

Emergent abilities are abilities that are not present in small models but arise in large ones, and they are one of the most prominent features distinguishing LLMs from previous PLMs. Three representative abilities are in-context learning, instruction following, and step-by-step reasoning. Each ability will be addressed in a later episode, but let's have a brief overview.

In-context learning was formally introduced with GPT-3. It is the ability to complete tasks without requiring any additional training or gradient updates: the LLM learns a task given only a few examples. During in-context learning, we give the LLM a prompt consisting of a list of input-output pairs that demonstrate the task. At the end of the prompt, we append a test input and let the LLM make a prediction simply by conditioning on the prompt and predicting the next tokens, as in the sketch below.
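To make this concrete, here is a minimal Python sketch of few-shot in-context learning using the Hugging Face transformers library. The sentiment task, the demonstration pairs, and the choice of gpt2 as the model are illustrative assumptions on my part: a model as small as GPT-2 will follow the format far less reliably than an actual LLM, but the prompt structure is the same one used with models like GPT-3.

from transformers import pipeline

# Demonstrations: input-output pairs that define the task inside the prompt.
# No weights are updated anywhere; the "learning" lives entirely in the prompt.
demonstrations = [
    ("The movie was fantastic.", "positive"),
    ("I wasted two hours of my life.", "negative"),
    ("An instant classic.", "positive"),
]
test_input = "The plot was dull and predictable."

# Format the demonstrations, then append the test input with an empty output slot.
prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in demonstrations)
prompt += f"\nReview: {test_input}\nSentiment:"

# gpt2 is a stand-in chosen so the sketch runs locally; in-context learning
# only emerges robustly at much larger scales (GPT-3 and beyond).
generator = pipeline("text-generation", model="gpt2")
completion = generator(prompt, max_new_tokens=2, do_sample=False)

# The prediction is whatever the model generates after the final "Sentiment:".
print(completion[0]["generated_text"][len(prompt):])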

Instruction following is the ability to perform well on unseen tasks that are described purely in the form of instructions. Thanks to this ability, LLMs can carry out new tasks from their natural language descriptions alone, without explicit examples, which gives them improved generalization. This ability usually arises after an instruction tuning process, which consists of fine-tuning the LLM on a mixture of multi-task datasets formatted via natural language descriptions. The sketch below illustrates the idea.
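As a small illustration, the sketch below sends a bare natural language instruction, with no demonstrations at all, to google/flan-t5-small, a small publicly available instruction-tuned model. The model choice and the translation task are assumptions made for the example, not something prescribed by this series.

from transformers import pipeline

# An instruction-tuned model can be driven by a task description alone.
# google/flan-t5-small is a small public instruction-tuned checkpoint,
# picked here purely so the example runs on modest hardware.
model = pipeline("text2text-generation", model="google/flan-t5-small")

# Zero-shot: the task is described in natural language, with no examples,
# in contrast to the few-shot prompt of the previous sketch.
instruction = "Translate the following sentence to French: The weather is nice today."
result = model(instruction, max_new_tokens=30)

print(result[0]["generated_text"])

The contrast with the in-context learning sketch is the point: instruction tuning teaches the model to map task descriptions to behavior, so no demonstrations are needed in the prompt.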

Step-by-step reasoning is the ability to solve complex tasks through the chain-of-thought prompting mechanism, which involves intermediate reasoning steps on the way to the final answer. With chain-of-thought prompting, the language model is prompted to generate a series of short sentences that mimic the reasoning process a person might employ when solving the task, as illustrated below.
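The snippet below only builds and prints a classic chain-of-thought prompt (the arithmetic word problems are the standard illustrative ones from the chain-of-thought prompting literature). It deliberately stops short of calling a model: step-by-step reasoning is an emergent ability, so a sufficiently large LLM is needed for the demonstrated reasoning pattern to actually pay off.

# A chain-of-thought demonstration shows the intermediate reasoning,
# not just the final answer; the test question is then appended so the
# model imitates the same step-by-step pattern.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. \
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. \
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought \
6 more, how many apples do they have?
A:"""

print(cot_prompt)

# A sufficiently large LLM is expected to continue with something like:
# "They used 20 of their 23 apples, leaving 23 - 20 = 3. They bought
#  6 more, so 3 + 6 = 9. The answer is 9."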

So why are these emergent abilities a game-changer compared to previous models in natural language processing?

Well, let's save that question for later. That's all for this post. Make sure to stay tuned for the next episode.

Credit: Wayne Xin Zhao et al., "A Survey of Large Language Models", https://arxiv.org/abs/2303.18223
