The completion suggester provides auto-complete/search-as-you-type functionality. This is a navigational feature to guide users to relevant results as they are typing, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters. 

MSEARCH-13

For the MSEARCH-13 spike, a suggestion endpoint was implemented in the feature/msearch-13 branch. The controller sends a suggest request to Elasticsearch and accepts two query parameters: query (the suggestion prefix to analyze) and limit (the number of suggestions to return; the default value is 5).

-XGET .../search/instances/suggestions?query=book&limit=5 
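As a rough illustration of how a client might assemble this request, the sketch below builds the path and query string from the two parameters. The class and method names are hypothetical and not taken from the feature/msearch-13 branch. The host and base path are left to the caller because the example above elides them, and the fallback of 5 mirrors the default described above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the spike branch.
final class SuggestionRequestPath {
  static final int DEFAULT_LIMIT = 5; // default value stated in the spike description

  // Returns the path and query string; the caller prepends the (elided) host and base path.
  static String build(String query, Integer limit) {
    int effectiveLimit = (limit == null) ? DEFAULT_LIMIT : limit;
    // URL-encode the prefix so spaces and special characters survive transport.
    String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
    return "/search/instances/suggestions?query=" + encoded + "&limit=" + effectiveLimit;
  }
}
```

Passing null for limit falls back to the default, so build("book", null) yields the same query string as the request shown above.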

Required Elasticsearch index mapping for the suggestion field:

{
  "suggest": {
    "type": "completion", 
    "analyzer": "simple",
    "max_input_length": "50"    # terms longer than 50 characters will be truncated to reduce memory consumption
  }
}

Other fields can be copied to this field using copy_to functionality in resource metadata description:

{
  ...
  "title": {
    "searchTypes": "sort",
    "inventorySearchTypes": [ "title", "keyword" ],
    "index": "multilang",
    "showInResponse": true,
    "mappings": {
      "copy_to": [ "sort_title", "suggest" ]
    }
  },
  ...
}

Elasticsearch suggest query:

{
  "from": 0,
  "size": 0,
  "_source": false,
  "suggest": {
    "completion": {
      "prefix": "book",           # suggestion query prefix
      "completion": {             # type of the suggestion
        "field": "suggest",       # field, that will be used as source of suggestions (required) 
        "size": 5,                # number of suggest terms to return
        "skip_duplicates": true   # removes duplicates from result
      }
    }
  }
}
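The request body above could be rendered from the endpoint parameters roughly as follows. This is a minimal sketch under stated assumptions: the class name and the hand-rolled escaping are hypothetical, `_source` is sent as the boolean false, and a real service would more likely use a JSON library or the Elasticsearch client's own builders.

```java
// Hypothetical helper that renders the completion suggest request body as a JSON string.
final class CompletionSuggestQuery {

  // Minimal JSON string escaping: quotes, backslashes, and control characters.
  static String escape(String value) {
    StringBuilder sb = new StringBuilder();
    for (char c : value.toCharArray()) {
      if (c == '"' || c == '\\') {
        sb.append('\\').append(c);
      } else if (c < 0x20) {
        sb.append(String.format("\\u%04x", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  // prefix: text typed by the user; size: number of suggestions to return.
  static String build(String prefix, int size) {
    return "{\"from\":0,\"size\":0,\"_source\":false,"
        + "\"suggest\":{\"completion\":{"
        + "\"prefix\":\"" + escape(prefix) + "\","
        + "\"completion\":{\"field\":\"suggest\",\"size\":" + size + ",\"skip_duplicates\":true}"
        + "}}}";
  }
}
```

Escaping the prefix matters because it comes straight from user input; an unescaped quote would otherwise produce an invalid request body.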

Performance results of completion query:

  • 2.5 million instances indexed
  • Elasticsearch requires 2500 MB of Java heap to store the completion data
  • Response time: ~25-30 ms

MSEARCH-119

The MSEARCH-119 spike assumes that there is a way to return suggestions using a wildcard or prefix query.

Elasticsearch field mapping

"keyword_suggest": {
  "type": "keyword",
  "normalizer": "keyword_lowercase",
  "store": true
}

Elasticsearch query

{
  "from": 0,
  "size": 0,
  "query": {
    "prefix": {
      "keyword_suggest": {
        "value": "wit"
      }
    }
  },
  "_source": false,
  "stored_fields": [ "keyword_suggest" ]
}

It returns a response like the following:

Search response
{
  "took": 89,
  "timed_out": false,
  "_shards": {
    "total": 4,
    "successful": 4,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1127,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "4c9664c8-565b-4245-984f-3dfa769abe8d",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "politieke machtsstrijd in en om de voornaamste belgische steden.1830-1848.",
            "politieke machtsstrijd in en om de voornaamste belgische steden.1830-1848.",
            "pro civitate. historische uitgaven. reeks in -8⁰, nr. 37",
            "collection histoire pro civitate.série in-8ono 37.",
            "witte, els",
            "belgium--politics and government--1830-1914",
            "politics and government",
            "belgium.",
            "1830-1914"
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "40cc07e3-a4b4-4386-8c4b-af2d424764db",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "organisation für innovationsentscheidungen;das promotoren-modell.",
            "organisation für innovationsentscheidungen;das promotoren-modell.",
            "schriften der kommission für wirtschaftlichen und sozialen wandel, bd. 2",
            "kommission für wirtschaftlichen und sozialen wandel.schriften,bd. 2.",
            "witte, eberhard",
            "technological innovations",
            "decision-making",
            "decision making.",
            "technological innovations."
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "90e29404-2178-4503-8202-1e56abf92a4d",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "brecht, as they knew him.hubert witt, editor. john peet, translator.",
            "erinnerungen an brecht.english.peet",
            "brecht, as they knew him.hubert witt, editor. john peet, translator.",
            "new world paperbacks",
            "witt, hubert",
            "brecht, bertolt,--1898-1956"
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "b755f723-f279-43eb-8e53-938930efcc99",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "political maxims of the state of hollandcomprehending a general view of the civil government of that republic, and the principles on which it is founded : the nature, rise, and progress of the commerce of its subjects, and of their true interests with respect to all their neighbours /by john de witt ; translated from the dutch original, which contains many curious passages not to be found in any of the french versions ; to which is prefixed, historical memoirs of the two illustrious brothers cornelius and john de witt.",
            "aanwysing der heilsame politike gronden en maximen van de republike van holland en west-vriesland.english",
            "political maxims of the state of hollandcomprehending a general view of the civil government of that republic, and the principles on which it is founded : the nature, rise, and progress of the commerce of its subjects, and of their true interests with respect to all their neighbours /by john de witt ; translated from the dutch original, which contains many curious passages not to be found in any of the french versions ; to which is prefixed, historical memoirs of the two illustrious brothers cornelius and john de witt.",
            "goldsmiths'-kress library of economic literature ;no. 8031.2.",
            "court, pieter de la, approximately 1618-1685.",
            "witt, johan de, 1625-1672.",
            "witt, cornelis de, 1623-1672.",
            "witt, johan de,--1625-1672.",
            "witt, cornelis de,--1623-1672.",
            "political science--netherlands.",
            "netherlands--commercial policy.",
            "industries--netherlands.",
            "netherlands--foreign relations--1648-1714.",
            "commercial policy.",
            "diplomatic relations.",
            "industries.",
            "political science.",
            "netherlands.",
            "1648-1714"
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "0fa3fb53-e913-4694-bcef-9e4a820e55d6",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "the psychology of the salem witchcraft excitement of 1692and its practical application to our own time.",
            "psychology of the salem witchcraft excitement of 1692and its practical application to our own time.",
            "library of american civilization ;lac 14279.",
            "beard, george m, (george miller), 1839-1883.",
            "guiteau, charles j,--(charles julius),--1841-1882.",
            "witchcraft--massachusetts--salem.",
            "forensic psychology.",
            "guiteau, charles j.--(charles julius),--1841-1882.",
            "witchcraft.",
            "massachusetts--salem."
          ]
        }
      }
    ]
  }
}

The values of the keyword_suggest field are then filtered in Java code, using the startsWith() method, to retrieve the relevant suggestion terms.
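The Java post-processing step described above might look like the following sketch. The class name, the method signature, and the representation of hits as lists of strings are hypothetical. Deduplication and the limit are assumptions added here because the sample response contains repeated values; the source only states that prefix matching is done in Java.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical post-processing step for the prefix-query approach.
final class KeywordSuggestFilter {

  // hits: for each search hit, the values of its keyword_suggest stored field.
  static List<String> suggestions(List<List<String>> hits, String prefix, int limit) {
    if (limit <= 0) {
      return List.of();
    }
    // Stored values are lowercased by the normalizer, so lowercase the prefix too.
    String needle = prefix.toLowerCase(Locale.ROOT);
    // LinkedHashSet keeps first-seen order and drops duplicates (an assumption, see above).
    Set<String> unique = new LinkedHashSet<>();
    for (List<String> values : hits) {
      for (String value : values) {
        if (value.startsWith(needle)) {
          unique.add(value);
          if (unique.size() == limit) {
            return new ArrayList<>(unique);
          }
        }
      }
    }
    return new ArrayList<>(unique);
  }
}
```

Each hit carries every value copied into the field, so most values in a matching document do not start with the prefix and are discarded here. The order of the result follows the order of the hits, which carries no relevance information.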

Disadvantages of this approach:

  • It returns N arbitrary documents from the Elasticsearch index with no relevance ranking (score = 1.0 for all search hits)
  • All values copied via copy_to are returned in lowercase

Performance results:

  • 2.5 million instances indexed
  • Response time: ~20-50 ms
  • The reindexing process is slightly faster and does not require much Java heap

