Apache Solr - SchemaXml

"schema.xml": Structure
Unique Key
Valid attributes for fields
Field naming convention
Dynamic fields
Field types
Tokenizers
Filters

"schema.xml": Structure

Visit the Solr Wiki page for more information: https://cwiki.apache.org/confluence/display/solr/SchemaXml

See these sample schema files (Solr 9.8.1) for more information:
► ${SOLR_HOME}/configsets/_default/conf/managed-schema.xml
► ${SOLR_HOME}/configsets/sample_techproducts_configs/conf/managed-schema.xml

<schema name="" version="1.6" />

<uniqueKey />

<field />

<dynamicField />

<copyField />

<fieldType />

<fieldType name="" class="">
    <analyzer type="index">
        <tokenizer class=""/>
        <filter class=""/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class=""/>
        <filter class=""/>
    </analyzer>
</fieldType>
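Besides the index/query pair above, a field type may declare a single <analyzer> with no type attribute, which is then used both when indexing and when querying. A sketch modeled on the lowercase type from the sample configsets (compare it with your own managed-schema.xml):

```xml
<!-- Modeled on the "lowercase" type in the sample configsets. -->
<!-- An <analyzer> without a type attribute applies at both index and query time. -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- Keep the whole input as one single token... -->
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <!-- ...then lowercase it, which gives case-insensitive exact matching -->
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```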

Unique Key
Field to use to determine and enforce document uniqueness.
The field will be required, unless it's marked with required="false".
```
<uniqueKey>id</uniqueKey>
```
Valid attributes for fields
```
<field ... />
```
- name: [mandatory] - the name of the field.
- type: [mandatory] - a name of a field type from the <fieldType> section.
- indexed: [default=true] - if this field should be indexed (searchable or sortable).
- stored: [default=true] - if this field should be retrievable.
- required: [default=false] - if this field is required.
  Indexing a document without a value for this field fails with an error.
- default: a value that should be used if no value is specified when adding a document.
- multiValued: [default=false] - if this field may contain multiple values per document (field types can override this; e.g. strings and text_general are multi-valued in the sample schemas).
- termPositions: stores position information with the term vector.
  This will increase storage costs.
- termOffsets: stores offset information with the term vector.
  This will increase storage costs.
- docValues: [default=true] - if this field should have doc values.
  DocValues are recommended (and required if you use *PointField fields) for faceting, grouping, sorting and function queries.
  DocValues make the index faster to load, more NRT-friendly and more memory-efficient.
  They are supported by field types such as StrField, UUIDField and all *PointField fields; depending on the field type, they might require the field to be single-valued, be required or have a default value
  (check the documentation of the field type you're interested in for more information).
- omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory).
  Only full-text fields or fields that need an index-time boost need norms.
  Norms are omitted for primitive (non-analyzed) types by default.
- termVectors: [default=false] - set to true to store the term vector for a given field.
  When using MoreLikeThis, fields used for similarity should be stored for best performance.
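A few field definitions combining these attributes. The field names (title, body, price, inStock) are hypothetical; the types are the ones defined in the sample schemas:

```xml
<!-- Hypothetical fields; the types come from the sample configsets. -->

<!-- Searchable, retrievable, single-valued and mandatory -->
<field name="title" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />

<!-- Full-text field used with MoreLikeThis: also store term vectors with positions and offsets -->
<field name="body" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />

<!-- Numeric field for sorting, faceting and function queries, backed by doc values -->
<field name="price" type="pfloat" indexed="true" stored="true" docValues="true" />

<!-- Falls back to "true" when a document omits it -->
<field name="inStock" type="boolean" indexed="true" stored="true" default="true" />
```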

Field naming convention

Field names should consist of alphanumeric or underscore characters only and not start with a digit.
Names with both leading and trailing underscores (e.g. _version_) are reserved.

Special names:

id: holds the unique key of each document (see Unique Key).

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

_version_: internal field used by partial (atomic) updates, the update log and SolrCloud.

<field name="_version_" type="plong" indexed="false" stored="false" />

_root_: used for nested (child) documents; holds the id of the top-level parent document.

<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

_text_: catch-all field for full-text search, typically populated via copyField.

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true" />
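copyField, listed in the structure at the top, is how a catch-all field like _text_ usually gets its content. A minimal sketch, assuming you want every incoming field to be searchable through _text_ (this indexes all content a second time); the second rule and its target name summary_txt are hypothetical:

```xml
<!-- Copy the raw value of every field into _text_ at index time.
     The copy is made before analysis, so _text_ applies its own text_general analyzer. -->
<copyField source="*" dest="_text_" />

<!-- Hypothetical: copy every *_t field into summary_txt (matched by the *_txt dynamic field),
     keeping at most the first 256 characters of each value -->
<copyField source="*_t" dest="summary_txt" maxChars="256" />
```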

Dynamic fields

Dynamic field definitions allow using convention over configuration for fields via the specification of patterns to match field names.
Example: <dynamicField name="*_i" /> will match any field ending in _i (like myid_i, z_i).
Restriction: the glob-like pattern in the name attribute must have a "*" only at the start or the end.

<dynamicField name="*_i" type="pint" indexed="true" stored="true" />
<dynamicField name="*_is" type="pints" indexed="true" stored="true" />

<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<dynamicField name="*_ss" type="strings" indexed="true" stored="true" />

<dynamicField name="*_l" type="plong" indexed="true" stored="true" />
<dynamicField name="*_ls" type="plongs" indexed="true" stored="true" />

<dynamicField name="*_t" type="text_general" indexed="true" stored="true" multiValued="false" />
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true" />

<dynamicField name="*_b" type="boolean" indexed="true" stored="true" />
<dynamicField name="*_bs" type="booleans" indexed="true" stored="true" />

<dynamicField name="*_f" type="pfloat" indexed="true" stored="true" />
<dynamicField name="*_fs" type="pfloats" indexed="true" stored="true" />

<dynamicField name="*_d" type="pdouble" indexed="true" stored="true" />
<dynamicField name="*_ds" type="pdoubles" indexed="true" stored="true" />

<dynamicField name="random_*" type="random" />

<dynamicField name="ignored_*" type="ignored" />

<dynamicField name="*_str" type="strings" indexed="false" stored="false" docValues="true" useDocValuesAsStored="false" />

<dynamicField name="*_dt" type="pdate" indexed="true" stored="true" />
<dynamicField name="*_dts" type="pdate" indexed="true" stored="true" multiValued="true" />

<dynamicField name="*_p" type="location" indexed="true" stored="true" />
<dynamicField name="*_srpt" type="location_rpt" indexed="true" stored="true" />

<!-- payloaded dynamic fields -->
<dynamicField name="*_dpf" type="delimited_payloads_float" indexed="true" stored="true" />
<dynamicField name="*_dpi" type="delimited_payloads_int" indexed="true" stored="true" />
<dynamicField name="*_dps" type="delimited_payloads_string" indexed="true" stored="true" />

<dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true" />

<dynamicField name="*_ws" type="text_ws" indexed="true" stored="true" />

<dynamicField name="*_t_sort" type="text_gen_sort" indexed="true" stored="true" multiValued="false" />
<dynamicField name="*_txt_sort" type="text_gen_sort" indexed="true" stored="true" />

<dynamicField name="*_txt_rev" type="text_general_rev" indexed="true" stored="true" />

<dynamicField name="*_phon_en" type="phonetic_en" indexed="true" stored="true" />

<dynamicField name="*_s_lower" type="lowercase" indexed="true" stored="true" />

<dynamicField name="*_descendent_path" type="descendent_path" indexed="true" stored="true" />
<dynamicField name="*_ancestor_path" type="ancestor_path" indexed="true" stored="true" />

<dynamicField name="*_point" type="point" indexed="true" stored="true" />

<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true" />
<dynamicField name="*_txt_en_split" type="text_en_splitting" indexed="true" stored="true" />
<dynamicField name="*_txt_en_split_tight" type="text_en_splitting_tight" indexed="true" stored="true" />

<dynamicField name="*_txt_ar" type="text_ar" indexed="true" stored="true" />
<dynamicField name="*_txt_bg" type="text_bg" indexed="true" stored="true" />
<dynamicField name="*_txt_ca" type="text_ca" indexed="true" stored="true" />
<dynamicField name="*_txt_cjk" type="text_cjk" indexed="true" stored="true" />
<dynamicField name="*_txt_cz" type="text_cz" indexed="true" stored="true" />
<dynamicField name="*_txt_da" type="text_da" indexed="true" stored="true" />
<dynamicField name="*_txt_de" type="text_de" indexed="true" stored="true" />
<dynamicField name="*_txt_el" type="text_el" indexed="true" stored="true" />
<dynamicField name="*_txt_es" type="text_es" indexed="true" stored="true" />
<dynamicField name="*_txt_eu" type="text_eu" indexed="true" stored="true" />
<dynamicField name="*_txt_fa" type="text_fa" indexed="true" stored="true" />
<dynamicField name="*_txt_fi" type="text_fi" indexed="true" stored="true" />
<dynamicField name="*_txt_fr" type="text_fr" indexed="true" stored="true" />
<dynamicField name="*_txt_ga" type="text_ga" indexed="true" stored="true" />
<dynamicField name="*_txt_gl" type="text_gl" indexed="true" stored="true" />
<dynamicField name="*_txt_hi" type="text_hi" indexed="true" stored="true" />
<dynamicField name="*_txt_hu" type="text_hu" indexed="true" stored="true" />
<dynamicField name="*_txt_hy" type="text_hy" indexed="true" stored="true" />
<dynamicField name="*_txt_id" type="text_id" indexed="true" stored="true" />
<dynamicField name="*_txt_it" type="text_it" indexed="true" stored="true" />
<dynamicField name="*_txt_ja" type="text_ja" indexed="true" stored="true" />
<dynamicField name="*_txt_ko" type="text_ko" indexed="true" stored="true" />
<dynamicField name="*_txt_lv" type="text_lv" indexed="true" stored="true" />
<dynamicField name="*_txt_nl" type="text_nl" indexed="true" stored="true" />
<dynamicField name="*_txt_no" type="text_no" indexed="true" stored="true" />
<dynamicField name="*_txt_pt" type="text_pt" indexed="true" stored="true" />
<dynamicField name="*_txt_ro" type="text_ro" indexed="true" stored="true" />
<dynamicField name="*_txt_ru" type="text_ru" indexed="true" stored="true" />
<dynamicField name="*_txt_sv" type="text_sv" indexed="true" stored="true" />
<dynamicField name="*_txt_th" type="text_th" indexed="true" stored="true" />
<dynamicField name="*_txt_tr" type="text_tr" indexed="true" stored="true" />
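To see how these patterns resolve, here is a hypothetical document in Solr's XML update format; all field names and values are made up. Explicitly declared fields take precedence over dynamic fields, and when several patterns match, the longer pattern wins:

```xml
<add>
  <doc>
    <!-- Explicit field: id -->
    <field name="id">book-1</field>
    <!-- *_txt_en: English text analysis (text_en) -->
    <field name="title_txt_en">The Running Foxes</field>
    <!-- *_i: single-valued pint -->
    <field name="pages_i">320</field>
    <!-- *_ss: multi-valued strings, one <field> element per value -->
    <field name="tags_ss">fiction</field>
    <field name="tags_ss">animals</field>
    <!-- *_dt: pdate, written in ISO-8601 UTC -->
    <field name="published_dt">2020-05-01T00:00:00Z</field>
    <!-- Matches both attr_* and *_s; attr_* is the longer pattern, so text_general is used -->
    <field name="attr_color_s">red</field>
  </doc>
</add>
```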

Field types

<fieldType ... />

String field types:

string [class: solr.StrField]
strings [class: solr.StrField]

Boolean field types:

boolean [class: solr.BoolField]
booleans [class: solr.BoolField]
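For reference, a sketch of how these four types can be declared, modeled on the _default configset (attributes may differ slightly between versions). The plural names are the multiValued variants of the same classes:

```xml
<!-- sortMissingLast="true": documents without a value sort after those that have one -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
<!-- Same class; the plural name marks the multi-valued variant -->
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />

<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" />
<fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true" />
```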

Numeric field types (point-based; precisionStep was a setting of the legacy Trie* types and does not apply here):

pint [class: solr.IntPointField]
pints [class: solr.IntPointField]

plong [class: solr.LongPointField]
plongs [class: solr.LongPointField]

pfloat [class: solr.FloatPointField]
pfloats [class: solr.FloatPointField]

pdouble [class: solr.DoublePointField]
pdoubles [class: solr.DoublePointField]

Date field types (point-based; precisionStep does not apply):

pdate [class: solr.DatePointField]
pdates [class: solr.DatePointField]
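Likewise for the point-based numeric and date types, modeled on the _default configset (only a subset is shown):

```xml
<!-- docValues="true" enables sorting, faceting and function queries on point fields -->
<fieldType name="pint" class="solr.IntPointField" docValues="true" />
<fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true" />
<fieldType name="plong" class="solr.LongPointField" docValues="true" />
<fieldType name="pfloat" class="solr.FloatPointField" docValues="true" />
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true" />

<!-- Dates are UTC instants written in ISO-8601, e.g. 2020-05-01T00:00:00Z -->
<fieldType name="pdate" class="solr.DatePointField" docValues="true" />
<fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true" />
```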

Binary field types:
```
binary [class: solr.BinaryField]
```
Random field types:
```
random [class: solr.RandomSortField]
```
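A sketch of both declarations, modeled on the sample configsets:

```xml
<!-- Binary data is sent and returned as Base64-encoded strings -->
<fieldType name="binary" class="solr.BinaryField" />

<!-- Holds no data: the name of a field of this type acts as a seed for pseudo-random sorting -->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
```

With the random_* dynamic field defined earlier, a request such as sort=random_1234 desc returns documents in a pseudo-random order; changing the number in the field name gives a different order.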

Generic field types:

text_general [class: solr.TextField]
text_ws [class: solr.TextField]
text_general_rev [class: solr.TextField]

text_en [class: solr.TextField]
text_en_splitting [class: solr.TextField]
text_en_splitting_tight [class: solr.TextField]

text_[ar|bg|ca|cjk|cz|da|de|el|es|eu|fa|fi|fr|ga|gl|hi|hu|hy|id|it|ja|ko|lv|nl|no|pt|ro|ru|sv|th|tr] [class: solr.TextField]

text_gen_sort [class: solr.SortableTextField]

point [class: solr.PointType]

location [class: solr.LatLonPointSpatialField]
location_rpt [class: solr.SpatialRecursivePrefixTreeFieldType]

delimited_payloads_float [class: solr.TextField]
delimited_payloads_int [class: solr.TextField]
delimited_payloads_string [class: solr.TextField]

phonetic_en [class: solr.TextField]

lowercase [class: solr.TextField]

descendent_path [class: solr.TextField]
ancestor_path [class: solr.TextField]
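A sketch of the point and spatial types, modeled on the sample configsets (point relies on the *_d dynamic field for its sub-fields):

```xml
<!-- 2-D point indexed as two sub-fields suffixed with _d (x and y) -->
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d" />

<!-- Latitude/longitude, indexed from "lat,lon" strings such as 45.17614,-93.87341 -->
<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true" />

<!-- Prefix-tree spatial type; can also index shapes, not only points -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />
```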

Tokenizers

Visit the Solr wiki page for more information: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenizerFactories

solr.KeywordTokenizerFactory
solr.LetterTokenizerFactory
solr.WhitespaceTokenizerFactory
solr.LowerCaseTokenizerFactory
solr.StandardTokenizerFactory
solr.ClassicTokenizerFactory
solr.UAX29URLEmailTokenizerFactory
solr.PatternTokenizerFactory
solr.PathHierarchyTokenizerFactory
solr.ICUTokenizerFactory
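Tokenizers are what distinguish the descendent_path and ancestor_path types listed under field types. A sketch modeled on the sample configsets, where the index-time and query-time tokenizers are swapped between the two types:

```xml
<!-- descendent_path: indexing "/a/b/c" produces the tokens /a, /a/b and /a/b/c,
     so a query for "/a/b" finds every document at or below /a/b -->
<fieldType name="descendent_path" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
    </analyzer>
</fieldType>

<!-- ancestor_path: chains swapped, so a query for "/a/b/c" finds documents at /a, /a/b or /a/b/c -->
<fieldType name="ancestor_path" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
    </analyzer>
</fieldType>
```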

Filters

Visit the Solr wiki page for more information: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories

solr.ClassicFilterFactory
solr.ApostropheFilterFactory
solr.LowerCaseFilterFactory
solr.TypeTokenFilterFactory
solr.TrimFilterFactory
solr.TruncateTokenFilterFactory
solr.PatternCaptureGroupFilterFactory
solr.PatternReplaceFilterFactory
solr.StopFilterFactory
solr.CommonGramsFilterFactory
solr.EdgeNGramFilterFactory
solr.KeepWordFilterFactory
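solr.WordDelimiterFilterFactory (deprecated: use solr.WordDelimiterGraphFilterFactory)
solr.SynonymFilterFactory (deprecated: use solr.SynonymGraphFilterFactory)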
solr.RemoveDuplicatesTokenFilterFactory
solr.ISOLatin1AccentFilterFactory (removed from current versions: use solr.ASCIIFoldingFilterFactory)
solr.ASCIIFoldingFilterFactory
solr.PhoneticFilterFactory
solr.DoubleMetaphoneFilterFactory
solr.BeiderMorseFilterFactory
solr.ShingleFilterFactory
solr.PositionFilterFactory (removed from current versions)
solr.ReversedWildcardFilterFactory
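solr.CollationKeyFilterFactory (removed from current versions: use the solr.CollationField field type)
solr.ICUCollationKeyFilterFactory (removed from current versions: use the solr.ICUCollationField field type)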
solr.ICUNormalizer2FilterFactory
solr.ICUFoldingFilterFactory
solr.ICUTransformFilterFactory
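Tokenizers and filters come together in an analyzer chain: one tokenizer followed by filters applied in the order they are listed. A sketch modeled on text_general from the _default configset; it uses solr.SynonymGraphFilterFactory, the graph-aware successor of solr.SynonymFilterFactory, and stopwords.txt and synonyms.txt are files expected in the configset's conf/ directory:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <!-- Index time: split on Unicode word boundaries, drop stop words, lowercase -->
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <!-- Query time: the same chain plus synonym expansion -->
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```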