LLMs | Running Models
  1. Hugging Face Hub
  2. Run transformer models using Transformers
  3. Run transformer models using Transformers Pipelines
  4. Run transformer models using llama-cpp-python
  5. Key parameters of the transformer models
  6. Run a model with ChatGPT (OpenAI)
  7. Save the model and its associated tokenizer and configuration files
  8. Load the saved model and its associated tokenizer and configuration files

  1. Hugging Face Hub
    The Hugging Face Hub is an open platform with over 1 million models that can be used to process and generate text, images, audio, video, ...
    https://huggingface.co/models

    Selecting a model depends on:
    • The underlying architecture of the model (representation/generative model)
    • The size of the model
    • The performance of the model
    • The task to be executed by the model
    • The languages supported by the model
    • ...

    You can use the Hugging Face MTEB (Massive Text Embedding Benchmark) leaderboard. It compares embedding models across many tasks and languages.
    https://huggingface.co/spaces/mteb/leaderboard
  2. Run transformer models using Transformers
    You can use the Hugging Face CLI to download a model:
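    For example (a minimal sketch; the model ID below is only an illustration, any Hub model ID works):

      pip install -U "huggingface_hub[cli]"
      huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir ./Phi-3-mini-4k-instruct
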
    Sample Python code to generate text:
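    A minimal sketch (the model ID is an example; any causal language model from the Hub works):

      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_name = "microsoft/Phi-3-mini-4k-instruct"  # example model ID

      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModelForCausalLM.from_pretrained(model_name)

      # Tokenize the prompt and generate up to 50 new tokens.
      inputs = tokenizer("The capital of France is", return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=50)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
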
  3. Run transformer models using Transformers Pipelines
    Sample Python code to use a pipeline:
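    A minimal sketch (same example model as above):

      from transformers import pipeline

      # "text-generation" is one of many built-in pipeline tasks.
      generator = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

      output = generator("The capital of France is", max_new_tokens=50)
      print(output[0]["generated_text"])
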
  4. Run transformer models using llama-cpp-python
    Download this model:
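    As an illustration, assume a quantized model in the GGUF format that llama-cpp-python requires (the repository and file names below are examples):

      pip install llama-cpp-python
      huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir .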

    Python code:
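    A minimal sketch, assuming the GGUF file downloaded above is in the current directory:

      from llama_cpp import Llama

      # Path to the downloaded GGUF file (adjust to your local path).
      llm = Llama(model_path="./Phi-3-mini-4k-instruct-q4.gguf", n_ctx=2048)

      output = llm(
          "Q: What is the capital of France? A:",
          max_tokens=50,    # maximum number of tokens to generate
          temperature=0.7,  # sampling temperature
          stop=["Q:"],      # stop when the model starts a new question
      )
      print(output["choices"][0]["text"])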

  5. Key parameters of the transformer models
    There are a few parameters that can affect the output of the model (see the sketch after this list):

    • The model's context length:
      A model has a context length (a.k.a. the context window, context size, token limit):
      • The context length represents the maximum number of tokens the model can process.
      • Generative models are autoregressive: each generated token is appended to the context, so the number of tokens in the context grows as new tokens are generated.

    • return_full_text:
      If set to "False", only the model output is returned.
      Otherwise, the full text is returned; including the user prompt.

    • max_new_tokens:
      It sets the maximum number of tokens the model can generate.

    • do_sample:
      The model computes the probability of every possible next token and ranks the candidate tokens by their probability of being chosen.

      If the "do_sample" parameter is set to "False", the model selects the most probable next token; this leads to a more predictable and consistent response. Otherwise, the model will sample from the probability distribution, leading to more possible tokens that can be chosen by the model.

      When we set the "do_sample" parameter to true, we can also use the "temperature" parameter to make the output more "random". Hence we can get different output for the same prompt.

    • temperature:
      It controls how likely the model is to choose less probable tokens.

      When we set the "temperature" parameter to 0 (deterministic), the model should always generate the same response when given the same prompt.

      The closer the value of the "temperature" parameter is to 1 (high randomness), the more likely we are to get a random output.
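
    A minimal sketch showing these parameters in a Transformers pipeline call (the model ID is an example):

      from transformers import pipeline

      generator = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

      output = generator(
          "Write a one-line slogan for a coffee shop:",
          return_full_text=False,  # return only the generated text, not the prompt
          max_new_tokens=30,       # cap on the number of generated tokens
          do_sample=True,          # sample instead of always picking the top token
          temperature=0.8,         # higher values increase randomness
      )
      print(output[0]["generated_text"])
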
  6. Run a model with ChatGPT (OpenAI)
    ChatGPT (OpenAI) is built on proprietary models. These models can be accessed through OpenAI's API.
    You need to sign up and create an API key here: https://platform.openai.com/api-keys
    The API key will be used to communicate with OpenAI's API.

    To try out your new API key with curl:
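    A minimal sketch (assuming the key is exported in the OPENAI_API_KEY environment variable):

      curl https://api.openai.com/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -d '{
          "model": "gpt-4o-mini",
          "messages": [{"role": "user", "content": "Say hello!"}]
        }'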

    To try out your new API key with Python:

    Install the OpenAI Python SDK:
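      pip install openai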

    Check OpenAI Python SDK installation:
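      pip show openai
      python -c "import openai; print(openai.__version__)"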

    Execute the code below to generate a haiku using the gpt-4o-mini model.
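    A minimal sketch (the client reads the API key from the OPENAI_API_KEY environment variable):

      from openai import OpenAI

      client = OpenAI()  # picks up OPENAI_API_KEY from the environment

      completion = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[{"role": "user", "content": "Write a haiku about AI."}],
      )
      print(completion.choices[0].message.content)
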
  7. Save the model and its associated tokenizer and configuration files
    To save a model, tokenizer, and configuration files, we can use the "save_pretrained" method from the Hugging Face Transformers library.

    Ideally, you will save all related files in the same folder.

    Note that saving the model also saves its configuration file.

    • Save the model and its associated configuration files:
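      A minimal sketch (GPT-2 is used as an example because its tokenizer produces the vocab.json and merges.txt files described below):

        from transformers import AutoModelForCausalLM

        model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")  # example model
        model.save_pretrained("./saved-model")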

      This will create a directory containing:
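        config.json
        generation_config.json
        model.safetensors

      (typical contents for the example above; exact files vary by model)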

    • Save the model tokenizer files:
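      A minimal sketch (same example model, saved to the same folder):

        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")  # example model
        tokenizer.save_pretrained("./saved-model")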

      This will create a directory containing:
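        tokenizer_config.json
        special_tokens_map.json
        vocab.json
        merges.txt
        tokenizer.json

      (typical contents for a GPT-2-style tokenizer; exact files vary)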

    • Save only the model configuration file:
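      A minimal sketch (same example model):

        from transformers import AutoConfig

        config = AutoConfig.from_pretrained("openai-community/gpt2")  # example model
        config.save_pretrained("./saved-model")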

      This will create a directory containing:
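        config.json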

    Files:
    • config.json: The configuration file of the model.

    • tokenizer_config.json: The configuration file of the tokenizer.

    • vocab.json, tokenizer.json: contain the vocabulary and the mapping of tokens to IDs.

    • special_tokens_map.json: contains the mapping of special tokens used by the tokenizer.

    • model.safetensors: contains the model's weights.

    • generation_config.json: contains the default generation parameters of the model.

    • merges.txt: contains the BPE merge rules used by the tokenizer.
  8. Load the saved model and its associated tokenizer and configuration files
    To load the saved model, tokenizer, and configuration files, we can use the "from_pretrained" method from the Hugging Face Transformers library.

    Ideally, you will have saved all related files in the same folder.
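    A minimal sketch, assuming the files were saved to the "./saved-model" folder used in the previous section:

      from transformers import AutoModelForCausalLM, AutoTokenizer

      model = AutoModelForCausalLM.from_pretrained("./saved-model")
      tokenizer = AutoTokenizer.from_pretrained("./saved-model")

      # Quick check that the reloaded model and tokenizer work.
      inputs = tokenizer("Hello, world!", return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))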
