LLMs | Introduction
  1. Introduction
  2. Representation models
  3. Generative models
  4. Creating a language model

  1. Introduction
    LLMs (Large Language Models) are a type of artificial intelligence model designed to understand, generate, and interact with human language. They are called "large" because they are trained on vast amounts of text data, allowing them to capture complex language patterns and nuances. Examples of LLMs include GPT-4 by OpenAI and BERT by Google.

    These are large neural network models made up of layers of interconnected nodes. Each connection between two nodes carries a numeric value called a weight; these weights are the model's parameters, and together they represent the model's understanding of the language.

    Types of LLMs:
    • Representation models: encoder-only models that are used for specific tasks.
      Representation models are sequence-to-value models; they take a text (sequence of tokens) as input and generate a value (for example, a classification label).
      Example: "The weather today is great!" ==> "1" (positive sentiment)

    • Generative models: decoder-only models that generate text.
      Generative models are sequence-to-sequence models; they take a text (sequence of tokens) as input and generate text (sequence of tokens). They are not trained for specific tasks. When given a text, a generative model needs to understand the context and must be given clear instructions on the expected output (both model types are illustrated in the sketch after this list).
      Example: "What's 1 + 1?" ==> "The answer is 2"

    The Transformer is a neural network architecture primarily used for processing text. It's widely adopted for various tasks such as machine translation, text summarization, and question answering.

    The original Transformer is an encoder-decoder architecture. Encoder-decoder models are sequence-to-sequence models. They are pre-trained using masked language modeling, where sets of tokens in the input are masked and the model learns to predict them.

    Applications of LLMs:
    • text generation
    • text classification
    • text clustering
    • semantic search (see the sketch after this list)
    • ...
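
    As an illustration of semantic search, here is a minimal sketch assuming the sentence-transformers library is installed; the embedding model name is only an example. It embeds a query and a few documents and ranks the documents by cosine similarity:

      from sentence_transformers import SentenceTransformer, util

      # all-MiniLM-L6-v2 is a small open embedding model, used here as an example.
      model = SentenceTransformer("all-MiniLM-L6-v2")

      documents = [
          "The weather today is great!",
          "Transformers are a neural network architecture.",
          "BERT is an encoder-only representation model.",
      ]
      query = "Which model is encoder-only?"

      doc_embeddings = model.encode(documents, convert_to_tensor=True)
      query_embedding = model.encode(query, convert_to_tensor=True)

      # Rank the documents by cosine similarity to the query.
      scores = util.cos_sim(query_embedding, doc_embeddings)[0]
      for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
          print(f"{score:.3f}  {doc}")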
  2. Representation models
    Representation models are used for specific tasks: for example, text classification.

    Example: predict masked token
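
    A minimal sketch of masked-token prediction, assuming the Hugging Face transformers library; bert-base-uncased is used as an example checkpoint:

      from transformers import pipeline

      # Fill-mask uses an encoder-only (representation) model to predict the masked token.
      fill_mask = pipeline("fill-mask", model="bert-base-uncased")

      for prediction in fill_mask("The capital of France is [MASK]."):
          print(prediction["token_str"], round(prediction["score"], 3))
      # The top prediction is expected to be "paris".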

    Examples of representation models:
    • Google BERT large model (open source) consists of 340 million parameters.
      BERT: Bidirectional Encoder Representations from Transformers
  3. Generative models
    The model takes an input (a.k.a. the user prompt, user query) and returns an output that's expected to follow the user prompt.

    Generative models are also called completion models (auto-complete the user prompt).

    Example: predict next token

    Example: auto completion
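
    A minimal sketch of next-token prediction and auto-completion, assuming the transformers library; gpt2 is used as an example decoder-only checkpoint:

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2")

      prompt = "The weather today is"
      inputs = tokenizer(prompt, return_tensors="pt")

      # Predict the single most likely next token from the logits at the last position.
      with torch.no_grad():
          logits = model(**inputs).logits
      next_token_id = logits[0, -1].argmax().item()
      print("next token:", tokenizer.decode(next_token_id))

      # Auto-complete the prompt by generating several tokens, one at a time.
      output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
      print("completion:", tokenizer.decode(output_ids[0], skip_special_tokens=True))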

    Examples of generative models:
    • OpenAI GPT-4 model (proprietary) is estimated to consist of 1.75 trillion parameters (the exact size isn't publicly disclosed)
      GPT: Generative Pre-trained Transformer

    • Google Gemini model (proprietary) consists of 27 billion parameters

    • Meta AI LLaMA 4 model (open source) consists of 2 trillion parameters

    • Google T5 (Text To Text Transfer Transformer) is an encoder-decoder architecture with 11 billion parameters.
  4. Creating a language model
    Creating a language model takes place in two steps: training and fine-tuning.

    Training:
    • The model is trained on a lot of data, allowing it to learn the grammar of the language and understand the semantics, context, and patterns of text.
    • The training allows the model to predict the next token (it doesn't target specific tasks).
    • The trained models (a.k.a. the pre-trained models) are called foundation models or base models.
    • The training takes a lot of computation (GPUs, VRAM).
    • The training takes a lot of training time.
    • The training is very costly.
    • The training requires a lot of data (unsupervised).

    Fine-tuning:
    • Fine-tuning takes the pre-trained model and trains it further on a specific task (for example: a text classification task); see the sketch after this list.
    • It produces fine-tuned models.
    • It requires less computation (GPUs, VRAM).
    • It takes less training time.
    • It requires less data (supervised).
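
    A minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets libraries are installed; the checkpoint (bert-base-uncased), dataset (GLUE SST-2), and hyperparameters are only illustrative:

      from datasets import load_dataset
      from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                                Trainer, TrainingArguments)

      # Start from a pre-trained (foundation) model and fine-tune it for text classification.
      checkpoint = "bert-base-uncased"
      tokenizer = AutoTokenizer.from_pretrained(checkpoint)
      model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

      # A labeled dataset (supervised); SST-2 is a sentiment classification task.
      dataset = load_dataset("glue", "sst2")

      def tokenize(batch):
          return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

      dataset = dataset.map(tokenize, batched=True)

      training_args = TrainingArguments(
          output_dir="bert-sst2-finetuned",
          num_train_epochs=1,
          per_device_train_batch_size=16,
      )

      trainer = Trainer(
          model=model,
          args=training_args,
          train_dataset=dataset["train"],
          eval_dataset=dataset["validation"],
      )
      trainer.train()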

    Training techniques: supervised and unsupervised
    • Supervised training techniques: use labeled data (for example, text classification).
    • Unsupervised training techniques: use data with no prior labeling (for example, text clustering).

    Generative models can be trained to answer questions. They can be fine-tuned to create models that respond to instructions (instruction-tuned models).

    Training representation models uses a technique called masked language modeling: it masks tokens of the input and trains the model to predict them; a minimal sketch follows.
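
    A minimal sketch of masked language modeling, assuming the transformers library; the data collator randomly masks tokens and the model is trained to predict the original tokens:

      import torch
      from transformers import AutoModelForMaskedLM, AutoTokenizer, DataCollatorForLanguageModeling

      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

      # The collator randomly replaces ~15% of the tokens with [MASK] (or a random token)
      # and sets the labels so the model is trained to predict the original tokens.
      collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

      text = ("Masked language modeling randomly hides some tokens of the input text "
              "and trains the model to predict the original tokens at those positions.")
      encoding = tokenizer(text, return_tensors="pt")
      batch = collator([{k: v[0] for k, v in encoding.items()}])

      # The input now contains [MASK] tokens; positions that weren't masked are ignored in the loss.
      # (Masking is random, so with a very short input it can happen that no token is masked.)
      print(tokenizer.decode(batch["input_ids"][0]))
      with torch.no_grad():
          print("masked LM loss:", model(**batch).loss.item())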