Falcon - Elastic Search - Effective usage [DRAFT]

Dictionary

  • Token (or term) - a single entry in the index, produced by tokenizing and analyzing a string.
    • For text fields (e.g. title or other free-text sentences), each word or word stem is a token.
    • For keyword fields (e.g. barcode or other identifiers), the whole value is a single token.
  • Field - a JSON field in an instance (or holding or item) that is indexed as a field in the Elastic document. A field can have no value, a single value, or an array of values.
  • Elastic document - a JSON string with fields, stored in Elastic. Each field in Elastic has a mapping.
  • Mapping - metadata describing how to index the value of a field. It contains information about the tokenizer, analyzer, etc.
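
The relationship between fields, mappings, and tokens can be sketched as follows. This is a toy illustration in Python, not Elasticsearch's real analysis chain: the `analyze` helper and its lowercase word splitting are assumptions made for the example. Only the `text` and `keyword` field types and the field names `title` and `barcode` come from this page.

```python
import re

# Mapping in the shape Elasticsearch expects under "mappings".
# Field names are taken from the examples on this page.
MAPPING = {
    "properties": {
        "title": {"type": "text"},       # analyzed: split into word tokens
        "barcode": {"type": "keyword"},  # not analyzed: whole value is one token
    }
}


def analyze(field, value, mapping=MAPPING):
    """Toy stand-in for analysis (hypothetical helper, not an Elasticsearch API)."""
    # A field may hold no value, a single value, or an array of values.
    values = value if isinstance(value, list) else [value]
    field_type = mapping["properties"][field]["type"]
    tokens = []
    for v in values:
        if v is None:
            continue  # no value: nothing is indexed
        if field_type == "keyword":
            tokens.append(v)  # the whole value is a single token
        else:
            tokens.extend(re.findall(r"\w+", v.lower()))  # one token per word
    return tokens
```

A real analyzer may additionally stem words and drop stop words, so the tokens stored for "The Lord of the Rings" would likely differ from the plain lowercase words this sketch returns.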

Efficiency level

Rank | Search complexity | Description
1 | O(1) in memory | Extremely fast (typically < 100 ms)
2 | O(1) | Fast (typically < 500 ms)
3 | Proportional to prefix_length | Fast (typically < 1000 ms)
4 | xtables counting | Fast enough (typically < 1000 ms)
5 | O(n) | Slow (up to minutes on big datasets)
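
The gap between rank 3 and rank 5 can be illustrated with a toy model, assuming the index keeps its tokens in a sorted term dictionary. The sample terms and both helper functions are hypothetical and only sketch the idea; they are not how Elasticsearch is implemented internally.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Toy term dictionary: a sorted list of indexed tokens (assumed sample data).
TERMS = sorted(["12340001", "12345678", "12345999", "22345678", "99912345"])


def prefix_lookup(prefix, terms=TERMS):
    """Binary-search to the first candidate, then read matches in sorted order."""
    i = bisect_left(terms, prefix)
    matches = []
    while i < len(terms) and terms[i].startswith(prefix):
        matches.append(terms[i])
        i += 1
    return matches  # stops at the first non-matching term


def wildcard_scan(pattern, terms=TERMS):
    """A wildcard inside or in front of the pattern forces a test of every term: O(n)."""
    return [term for term in terms if fnmatchcase(term, pattern)]
```

In this model `prefix_lookup("12345")` touches only the two matching terms, while `prefix_lookup("1")` has to walk three, which is why a short prefix costs more. `wildcard_scan("*345*")` always inspects all five terms regardless of how many match.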

Use cases that will not be effective in Elastic

Function | Efficiency level | Indexed instance data | Search input text | Documentation
Full-text search for terms, with stemming and stop-word filtering | 2 | { "title": "The Lord of the Rings", ... } | title = Lords of the Ring |
Keyword search (aka exact match) | 2 | { "barcode": "12345678", ... } | barcode = 12345678 |
Full-text search for terms across all text fields, with exact match on all keyword fields | 2 | { "title": "The Lord of the Rings", "publicNote": "silver covering", "barcode": "12345678", ... } | Lord of the Ring silver covers, or Lord of the Ring 12345678 | Note: stemming and analysis are supported for various languages, but they should be enabled only for a predefined list of required languages.
Range filter | 2 | { "createdDate": "12-12-2020", ... } | createdDate > 10-12-2020 |
Autocomplete | 1 | { "The Lord of the Rings": "12-12-2020", ... } | input: Lord of; output: The Lord of the Rings | From
Facets | 4 | { "effectiveLocation": "some_uuid" } | Output facets on the right-hand side |
Wildcard search with * on the right (prefix match) | 3 in most cases; can be 2 if index_prefixes is specified | { "hrid": "12345678", ... } | hrid = 12345* | https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html (the query can be slow when only a few characters are given, because every match must be enumerated to return accurate counts)
Wildcard search with * on the left (suffix match) | 3 | { "hrid": "12345678", ... } | hrid = *2345678 | https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html
Wildcard search with * on both sides or in the middle | 5, but optimizations can bring it to 2 in certain cases | { "hrid": "12345678", ... } | hrid = *34567* or hrid = 1234*78 | https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html or https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
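
Most of the functions listed above map onto standard Elasticsearch query DSL clauses. The sketch below builds the request bodies as plain Python dicts. The helper names and the aggregation name `by_field` are hypothetical, and the choice of `multi_match` for the search across all fields is an assumption rather than the confirmed Falcon design. Autocomplete is left out because this page does not say which mechanism backs it.

```python
def full_text(field, text):
    """Analyzed search on a text field (match query)."""
    return {"query": {"match": {field: text}}}


def exact(field, value):
    """Exact match on a keyword field (term query)."""
    return {"query": {"term": {field: value}}}


def all_fields(text, fields):
    """One input string searched across several fields (multi_match query)."""
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


def greater_than(field, value):
    """Range filter; the value must be in the format the field mapping expects."""
    return {"query": {"range": {field: {"gt": value}}}}


def facets(field):
    """Terms aggregation that counts documents per field value, without hits."""
    return {"size": 0, "aggs": {"by_field": {"terms": {"field": field}}}}


def wildcard_or_prefix(field, pattern):
    """Use the cheaper prefix query when the only wildcard is a trailing *."""
    if pattern.endswith("*") and "*" not in pattern[:-1] and "?" not in pattern:
        return {"query": {"prefix": {field: {"value": pattern[:-1]}}}}
    return {"query": {"wildcard": {field: {"value": pattern}}}}
```

Each dict can be serialized with `json.dumps` and sent as the body of a request to an index's `_search` endpoint. For example, `wildcard_or_prefix("hrid", "12345*")` yields a prefix query, while `wildcard_or_prefix("hrid", "*34567*")` falls back to the slower wildcard query.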