Apache Solr | schema.xml
  1. "schema.xml": Structure
  2. Unique Key
  3. Valid attributes for fields
  4. Fields naming convention
  5. Dynamic fields
  6. Fields types
  7. Tokenizers
  8. Filters

  1. "schema.xml": Structure
    Visit the Solr Wiki page for more information: https://cwiki.apache.org/confluence/display/solr/SchemaXml

    For complete examples, see the sample schema files shipped with Solr (paths as of 9.1.1):
    ► ${SOLR_HOME}/configsets/_default/conf/managed-schema.xml
    ► ${SOLR_HOME}/configsets/sample_techproducts_configs/conf/managed-schema.xml
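
    As a rough orientation before opening those files, the overall layout of a schema can be sketched as follows. This is a minimal, hedged outline rather than a copy of any shipped file: the schema name "example" and the version value are assumptions, and real schemas contain many more field types and fields.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- "example" is a hypothetical schema name; the version value is an assumption -->
    <schema name="example" version="1.6">

      <!-- Field used to determine and enforce document uniqueness -->
      <uniqueKey>id</uniqueKey>

      <!-- Field types: referenced by the "type" attribute of fields -->
      <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
      <fieldType name="plong" class="solr.LongPointField" docValues="true"/>

      <!-- Explicitly declared fields -->
      <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
      <field name="_version_" type="plong" indexed="false" stored="false"/>

      <!-- Fields matched by a name pattern -->
      <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

    </schema>
    ```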

  2. Unique Key
    The <uniqueKey> element names the field that Solr uses to determine and enforce document uniqueness.
    That field is required unless it is explicitly marked with required="false".
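
    A minimal sketch of a unique key declaration, assuming the key field is called id and uses a string field type named "string" (both names are conventions from the sample configsets, not requirements):

    ```xml
    <!-- The key field itself: single-valued and indexed -->
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

    <!-- Tells Solr which field identifies a document -->
    <uniqueKey>id</uniqueKey>
    ```

    With this in place, indexing a second document with the same id replaces the first one instead of adding a duplicate.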

  3. Valid attributes for fields

    • name: [mandatory] - the name of the field.

    • type: [mandatory] - a name of a field type from the <fieldType> section.

    • indexed: [default=true] - if this field should be indexed (searchable or sortable).

    • stored: [default=true] - if this field should be retrievable.

    • required: [default=false] - whether a value is mandatory for this field.
      Solr rejects the document with an error if the field has no value at index time.

    • default: a value that should be used if no value is specified when adding a document.

    • multiValued: [default=false] - if this field may contain multiple values per document.

    • termPositions: stores position information with the term vector.
      This will increase storage costs.

    • termOffsets: stores offset information with the term vector.
      This will increase storage costs.

    • docValues: [default=false as of Solr 9.1 unless the field type sets it; the sample configsets enable it on most primitive field types] - if this field should have doc values.
      Doc values are recommended (and required for *PointField fields) for faceting, grouping, sorting and function queries.
      They make the index faster to load, more NRT-friendly and more memory-efficient.
      They are supported by StrField, UUIDField and all *PointField types, among others; depending on the field type, they may require the field to be single-valued, to be required, or to have a default value
      (check the documentation of the field type you're interested in for more information).

    • omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory).
      Only full-text fields or fields that need an index-time boost need norms.
      Norms are omitted for primitive (non-analyzed) types by default.

    • termVectors: [default=false] set to true to store the term vector for a given field.
      When using MoreLikeThis, fields used for similarity should be stored for best performance.
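
    The attributes above can be combined as in the following sketch. The field names (title, price, tags, code) are hypothetical, and the field types (text_general, pfloat, string) are assumed to be defined as in the sample configsets.

    ```xml
    <!-- Full-text field: searchable, retrievable, with term vectors, positions and offsets -->
    <field name="title" type="text_general" indexed="true" stored="true" required="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

    <!-- Numeric field for sorting and faceting; 0.0 is used when a document has no value -->
    <field name="price" type="pfloat" indexed="true" stored="true" docValues="true" default="0.0"/>

    <!-- Field that accepts several values per document -->
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>

    <!-- Searchable but not returned in results; norms explicitly omitted -->
    <field name="code" type="string" indexed="true" stored="false" omitNorms="true"/>
    ```
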
  4. Fields naming convention
    Field names should consist of alphanumeric or underscore characters only and not start with a digit.
    Names with both leading and trailing underscores (e.g. _version_) are reserved.

    Special Names:
    • id

    • _version_

    • _root_

    • _text_
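
    For reference, the special fields are typically declared along these lines. This is a sketch modeled on the _default configset; exact attributes may differ between Solr versions, and the plong, string and text_general field types are assumed to exist.

    ```xml
    <!-- Unique key of the document -->
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

    <!-- Internal version number used for optimistic concurrency and update handling -->
    <field name="_version_" type="plong" indexed="false" stored="false"/>

    <!-- Links nested (child) documents to their root document -->
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>

    <!-- Catch-all full-text field, usually filled through copyField rules -->
    <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
    ```
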
  5. Dynamic fields
    Dynamic field definitions allow using convention over configuration for fields via the specification of patterns to match field names.
    Example: <dynamicField name="*_i" /> will match any field ending in _i (like myid_i, z_i).
    Restriction: the glob-like pattern in the name attribute must have a "*" only at the start or the end.
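
    A few dynamic field definitions, sketched under the assumption that the pint, string and text_general field types are defined elsewhere in the schema (the suffixes and the attr_ prefix follow the sample configsets, but any pattern with a single leading or trailing "*" works):

    ```xml
    <!-- Matches myid_i, z_i, ... and indexes them as integers -->
    <dynamicField name="*_i" type="pint" indexed="true" stored="true"/>

    <!-- Matches any field ending in _s and treats it as an untokenized string -->
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

    <!-- Matches any field ending in _txt and analyzes it as full text -->
    <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

    <!-- The wildcard may also be at the end: matches attr_color, attr_size, ... -->
    <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
    ```

    A document can then carry a field such as quantity_i without quantity_i being declared explicitly.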

  6. Fields types

    • String field types:

    • Boolean field types:

    • Numeric field types (precisionStep="8"):

    • Date field types (precisionStep="6"):

    • Binary field types:

    • Random field types:

    • Generic field types:
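
    The definitions for these categories are not reproduced above. As a hedged sketch, the following declarations are modeled on the _default configset of Solr 9.x and may differ from the ones the author had in mind. Note that precisionStep is an attribute of the legacy Trie* field types and does not apply to the *PointField types shown here. The generic category is left out because the text does not say which types it covers.

    ```xml
    <!-- String: stored and indexed verbatim, not tokenized -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

    <!-- Boolean: true/false values -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <!-- Numeric: point-based types -->
    <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
    <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
    <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
    <fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>

    <!-- Date: point-based date/time values -->
    <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

    <!-- Binary: base64-encoded binary data -->
    <fieldType name="binary" class="solr.BinaryField"/>

    <!-- Random: produces a pseudo-random sort order -->
    <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
    ```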

  7. Tokenizers
    Visit the Solr wiki page for more information: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenizerFactories

  8. Filters
    Visit the Solr wiki page for more information: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories
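
    To show how tokenizers and filters fit together, here is a sketch of a full-text field type modeled on text_general from the sample configsets. The file names stopwords.txt and synonyms.txt are assumed to exist in the same conf directory.

    ```xml
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <!-- Applied to field values when documents are indexed -->
      <analyzer type="index">
        <!-- Exactly one tokenizer: splits the text into tokens -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Zero or more filters, applied in order to each token -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <!-- Applied to the query string at search time -->
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    ```

    Each analyzer has a single tokenizer followed by a chain of filters, so the order of the filter elements matters.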

© 2025  mtitek