LLMs (Large Language Models) are a type of artificial intelligence model designed to understand, generate, and interact with human language.
They are "large" because they are trained on vast amounts of text data, allowing them to capture complex language patterns and nuances.
Examples of LLMs include GPT-4 by OpenAI and BERT by Google.
These are large neural network models made up of layers of interconnected nodes.
Each connection between two nodes carries a numeric value called a parameter, or weight, which encodes what the model has learned about language.
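To make the scale of "large" concrete, here is a minimal sketch that loads a model and counts its parameters (the weights on the connections described above). It assumes the Hugging Face transformers library is installed; the checkpoint name is just an example.

```python
from transformers import AutoModel

# Load an encoder model (BERT by Google, mentioned above); roughly 110M parameters.
model = AutoModel.from_pretrained("bert-base-uncased")

# Sum the number of elements in every weight tensor of the network.
num_params = sum(p.numel() for p in model.parameters())
print(f"bert-base-uncased has {num_params:,} parameters")
```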
Types of LLMs:
- Representation models: encoder-only models that are used for specific tasks.
  Representation models are sequence-to-value models: they take a text (a sequence of tokens) as input and produce a value.
  Example: "The weather today is great!" ==> "1" (for instance, a sentiment label where "1" means positive; see the classification sketch after this list).
- Generative models: decoder-only models that generate text.
  Generative models are sequence-to-sequence models: they take a text (a sequence of tokens) as input and generate text (a sequence of tokens) as output.
  They are not trained for one specific task.
  When given a text, a generative model needs to understand the context and must be given clear instructions about the expected output.
  Example: "What's 1 + 1?" ==> "The answer is 2" (see the generation sketch after this list).
The Transformer architecture is a neural network model primarily used for processing text.
It's widely adopted for various tasks such as machine translation, text summarization, and question-answering.
The Transformer architecture is an encoder-decoder architecture.
Encoder-decoder models are sequence-to-sequence models.
They are pre-trained using masked language modeling, where sets of tokens are masked and the model learns to predict the masked tokens.
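A minimal sketch of the masked language modeling idea: the model predicts a token hidden behind a mask placeholder. It assumes the Hugging Face transformers library and, for simplicity, uses an encoder-only BERT checkpoint rather than a full encoder-decoder model; the masking objective is the same in spirit.

```python
from transformers import pipeline

# Fill-mask: the model scores candidate tokens for the masked position.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The weather today is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```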
Applications of LLMs:
- text generation
- text classification
- text clustering
- semantic search (see the sketch after this list)
- ...
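A minimal sketch of semantic search, one of the applications listed above: texts are encoded into embedding vectors and ranked by similarity to a query. It assumes the sentence-transformers library; the checkpoint name and example texts are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The weather today is great!",
    "Transformers are neural network models for processing text.",
    "1 + 1 equals 2.",
]
query = "What kind of model is a Transformer?"

# Encode documents and query into vectors, then rank documents by cosine similarity.
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]

best = scores.argmax().item()
print(f"Best match: {documents[best]} (score={scores[best]:.3f})")
```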