LLMs | Topic Modeling
  1. Topic Modeling

    Topic modeling finds themes (topics) in a collection of text documents. Documents are grouped into clusters, and each topic is a set of labels (keywords) that captures the meaning of its cluster.


    BERTopic is a modular topic modeling technique that extracts topic representations. It works in two steps: text clustering and topic representation. The text clustering step groups semantically similar documents into clusters; the topic representation step then extracts the keywords that describe each cluster.

    See this page for more details about BERTopic:
    https://maartengr.github.io/BERTopic/index.html
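    To make the representation step concrete: BERTopic's documentation describes a class-based TF-IDF (c-TF-IDF) as its default way to pick keywords per cluster. The sketch below is a simplified, standard-library-only version of that idea, not BERTopic's implementation. It assumes the clustering step has already run, so the cluster labels are assigned by hand; the function name ctfidf_keywords and the toy documents are hypothetical.

```python
import math
from collections import Counter


def ctfidf_keywords(docs, labels, top_n=3):
    """Return {cluster_label: [top keywords]} using a simplified c-TF-IDF."""
    # Merge all documents of a cluster into one bag of words.
    cluster_counts = {}
    for doc, label in zip(docs, labels):
        cluster_counts.setdefault(label, Counter()).update(doc.lower().split())

    # Frequency of each word across all clusters.
    total_counts = Counter()
    for counts in cluster_counts.values():
        total_counts.update(counts)

    # Average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(cluster_counts)

    keywords = {}
    for label, counts in cluster_counts.items():
        size = sum(counts.values())
        # Normalized term frequency in the cluster, weighted by how rare
        # the word is across all clusters.
        scores = {
            word: (freq / size) * math.log(1 + avg_words / total_counts[word])
            for word, freq in counts.items()
        }
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[label] = ranked[:top_n]
    return keywords


# Toy data: cluster labels are assigned by hand (assumption).
docs = [
    "cats purr and cats sleep",
    "cats chase mice",
    "stocks rise and stocks fall",
    "investors buy stocks",
]
labels = [0, 0, 1, 1]
topics = ctfidf_keywords(docs, labels, top_n=1)
print(topics)
```

    Words that are frequent inside one cluster but rare elsewhere score highest, which is why a word shared by both clusters ("and") ranks below the cluster-specific ones.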

    Example: Creating topics from a set of words

    Required modules:

    Python code:
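    A minimal sketch of such an example, assuming the bertopic package is installed (for instance with pip install bertopic) and that docs is a list of strings. The helper name fit_topics and the min_topic_size value are illustrative choices; BERTopic, fit_transform, get_topic_info, and get_topic are part of the library's API.

```python
def fit_topics(docs, min_topic_size=10):
    """Fit BERTopic on a list of strings; return (model, topic id per document)."""
    # Imported lazily so this snippet can be loaded without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic(min_topic_size=min_topic_size)
    # fit_transform clusters the documents and extracts the topic keywords.
    topics, _probs = topic_model.fit_transform(docs)
    return topic_model, topics


# Usage (needs a reasonably large corpus; the first run downloads an
# embedding model):
# topic_model, topics = fit_topics(docs)
# print(topic_model.get_topic_info())   # table with one row per topic
# print(topic_model.get_topic(0))       # (keyword, score) pairs for topic 0
```

    The exact topics depend on the corpus and on the embedding model, so no output is shown for this sketch.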

    Output:

    Chart of the topics:
    [Figure: Topics Plot]
    Each topic is represented by its main keywords, concatenated with the underscore character (“_”). A special topic tagged "-1" may also be listed; it collects everything that does not match a specific topic, including outliers, which are candidates that do not fit any of the found topics.
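    The naming convention can be sketched in a few lines. This assumes names follow the <id>_<keyword>_<keyword>... pattern that BERTopic shows in its topic info table; the helpers topic_name and is_outlier_topic and the keyword lists are hypothetical.

```python
OUTLIER_TOPIC = -1  # id of the topic that collects unmatched items


def topic_name(topic_id, keywords, top_n=4):
    """Join the topic id and its first top_n keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:top_n]])


def is_outlier_topic(topic_id):
    """True for the special topic that gathers outliers."""
    return topic_id == OUTLIER_TOPIC


# Toy keywords per topic (assumption, not real BERTopic output).
toy_topics = {
    -1: ["and", "the", "of"],
    0: ["cats", "mice", "purr", "sleep", "chase"],
}
names = {tid: topic_name(tid, kws) for tid, kws in toy_topics.items()}
print(names[0])
print(names[-1])
```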

    BERTopic can also use a language model to generate human-readable labels for the topics.

    To do that, we need to craft a prompt with two parts:
    • A subset of the documents that best represent the topic, which will be inserted at the [DOCUMENTS] tag.
    • The keywords that make up the topic of the cluster, which will be inserted at the [KEYWORDS] tag.
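    The two tags act as placeholders in the prompt text. BERTopic performs the substitution itself; the sketch below only imitates it with plain string replacement to show what the final prompt might look like. The template wording, the fill_prompt helper, and the sample documents and keywords are all assumptions.

```python
PROMPT_TEMPLATE = """I have a topic that contains the following documents:
[DOCUMENTS]

The topic is described by the following keywords: [KEYWORDS]

Based on the documents and keywords, what is this topic about?"""


def fill_prompt(template, documents, keywords):
    """Replace the [DOCUMENTS] and [KEYWORDS] tags with actual content."""
    doc_block = "\n".join(f"- {doc}" for doc in documents)
    keyword_list = ", ".join(keywords)
    return template.replace("[DOCUMENTS]", doc_block).replace("[KEYWORDS]", keyword_list)


# Toy representative documents and keywords for one cluster (assumption).
prompt = fill_prompt(
    PROMPT_TEMPLATE,
    documents=["cats purr and cats sleep", "cats chase mice"],
    keywords=["cats", "mice", "purr"],
)
print(prompt)
```

    Keeping the template separate from the data makes it easy to reuse the same prompt for every topic and to reword it without touching the code.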


    Example: Creating labeled topics from a set of words

    Python code:

    Output:

© 2025  mtitek