DRAFT: Basic search index documentation
Work in progress. Not yet verified.
Note: Cells with a yellow background or red text indicate areas that are under investigation.
Overview/notes
*documentation on basic search options
mod-search field types:
Full-text fields (also known as multi-language fields): analyzed and preprocessed, so a search can match words within the field value.
Term fields (keyword, boolean, date fields, etc.): not analyzed, so a search matches the exact value.
Relevancy
Relevance sorting uses the Okapi BM25 algorithm, which takes into account the following factors:
Term Frequency (TF): First, it looks at how often your search words appear in each instance. If an instance contains your search words many times, it gets a higher score because it is more likely to be a good match.
Inverse Document Frequency (IDF): Then, it checks how common or rare your search words are across all the instances in the library. If your words are rare, they get a higher score. If they are common, they get a lower score. This helps give importance to unique words.
Document Length (DL): BM25 also considers how long each instance is compared to the average instance. If an instance is much longer than average, the length dilutes the importance of your search words, so it gets a lower score.
Parameter Tuning: BM25 has two main parameters that can be adjusted to fine-tune scoring: k1, which controls how quickly repeated occurrences of a term stop adding to the score, and b, which controls how strongly document length affects the score.
Calculation: Finally, BM25 combines all these factors using a mathematical formula to calculate a score for each instance. The instance with the highest score is considered the best match for your search.
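The factors above can be sketched in a few lines of Python. This is a minimal illustration of the standard BM25 formula, not the code that mod-search or Elasticsearch runs. The function name `bm25_scores`, the sample documents, and the choice of the non-negative IDF variant are assumptions made for the example; `k1=1.2` and `b=0.75` are the usual Elasticsearch defaults.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))

    scores = []
    for doc in documents:
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            # IDF: rare terms weigh more (variant that never goes negative).
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


# Hypothetical sample data: three tiny "instances" as token lists.
docs = [
    "global africa history".split(),
    "africa africa africa travel guide".split(),
    "european history of trade and commerce in the early modern period".split(),
]
scores = bm25_scores(["africa"], docs)
```

With this sample data, the second document scores highest because the term appears three times, and the third scores zero because it does not contain the term at all.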
Word-stemming & exact phrase
When the language of the record is English, word stemming is applied, even in exact phrase searches. For example, a search for “buy” will retrieve records that contain “buying”. See the Elasticsearch documentation: https://www.elastic.co/docs/manage-data/data-store/text-analysis/stemming#algorithmic-stemmers.
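To make the effect concrete, here is a toy sketch of why an exact phrase search can still match a stemmed form. The `toy_stem` and `phrase_matches` functions are hypothetical and strip only two suffixes; the real Elasticsearch stemmers apply far more elaborate rules, so this shows the idea, not the actual behavior.

```python
def toy_stem(token):
    """Strip a couple of common English suffixes; a stand-in for a real stemmer."""
    token = token.lower()
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def phrase_matches(phrase, text):
    """True if the stemmed phrase tokens occur consecutively in the stemmed text."""
    needle = [toy_stem(t) for t in phrase.split()]
    haystack = [toy_stem(t) for t in text.split()]
    return any(
        haystack[i : i + len(needle)] == needle
        for i in range(len(haystack) - len(needle) + 1)
    )
```

Under these toy rules, `phrase_matches("buy", "Buying a house")` is true because both sides are reduced to “buy” before comparison, while `phrase_matches("America", "American history")` is false because no rule reduces “American” to “America”.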
Related documentation on potential changes:
Supported operators
Search | Description | Supported operator | Example | Query search syntax | Notes |
---|---|---|---|---|---|
Exact phrase | Searches for text that contains all search terms in the order in which they are entered | Enclosing terms in wildcards (asterisks) | *Global africa* | == | Note: word stemming still applies when the language of the record is English |
Contains all | Searches for text that contains all search terms regardless of order | N/A | Global africa | Full-text: all or =; Term: all or == | Note: word stemming applies to records with a language of English, but note the following examples: “Africa” will not find “African”, and “America” will not find “American”, but “America” will find “Americas”. |
Contains any | Searches for text that contains any of the search terms regardless of order | N/A | Global africa | any | Supported in query and advanced search. Note: word stemming applies to records with a language of English, but note the following examples: “Africa” will not find “African”, and “America” will not find “American”, but “America” will find “Americas”. |
Starts with | Searches for text that starts with the characters before the wildcard | * | Scenario 1: Title> buddhi* Scenario 2: Title> *buddhi* (see note) | * | To find instances where the entire field starts with certain characters, add an asterisk to the end of the query. To find instances where the field contains a value that starts with certain characters, wrap the text in asterisks. For example, a search for “buddh*” looks for instances where the field value begins with “buddh”. This means that a search for “buddh*” will return fewer results than a search for “buddhism”. |
Masking - zero or more, leading | A zero or more character wildcard search, beginning with a wildcard | * | *chemistry | * | |
Masking - zero or more, trailing | A zero or more character wildcard search, ending with a wildcard | * | surg* | * | Performing a “Starts with” search |
Masking - zero or more, internal | A zero or more character wildcard search, containing a wildcard within the term | * | wom*n | * | |
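
The three masking rows share one rule: the asterisk stands for zero or more characters. The sketch below mimics that rule with Python's standard `fnmatch` module against a single value. It only illustrates the wildcard semantics; the helper name `wildcard_match` and the case folding are assumptions, and this is not how mod-search evaluates wildcards.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern, value):
    """Match value against pattern, where * stands for zero or more characters.

    Note: fnmatch also treats ? and [...] as special, which this page does
    not describe; only * is used in the examples here.
    """
    return fnmatchcase(value.lower(), pattern.lower())


leading = wildcard_match("*chemistry", "biochemistry")
trailing = wildcard_match("surg*", "surgery")
internal = [wildcard_match("wom*n", word) for word in ("woman", "women")]
```

All three patterns match their sample words, while `wildcard_match("surg*", "neurosurgery")` does not, because a trailing wildcard requires the value to begin with “surg”.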
Inventory
Instance search options
# | Search option - UI | Search option - BE | Type | Fields included | Default operator | Notes | Updates |
---|---|---|---|---|---|---|---|
1 | Keyword (title, contributor, identifier, HRID, UUID) | keyword | Full-text and term? | | Performs a “contains all” on full-text fields and an “exact phrase” on identifier fields | Does not search all full-text fields, only those identified in the title of the basic search option | |
2 | Contributor | contributors | Full-text | Contributors (regardless of contributor type or name type) | Contains all | | |
3 | Title (all) | title | Full-text | | Contains all | | |
4 | Identifier (all) | identifiers.value | Term | All identifiers, regardless of type | Exact phrase | | |
5 | Classification, normalized | classifications.classificationNumber | Term | Classification number | Exact phrase | | |
6 | ISBN | isbn | Term | ISBN, Invalid ISBN | Contains all | | |
7 | ISSN | issn | Term | ISSN, Invalid ISSN, Linking ISSN | Exact phrase | | |
8 | LCCN, normalized | lccn | Term | LCCN, Canceled LCCN | Exact phrase | | |
9 | OCLC number, normalized | identifiers.typeId + identifiers.value | Term | OCLC, Cancelled OCLC | Contains all | This does not appear to be working in Ramsons and Sunflower? Current RRT thread 5/9 | |
10 | Instance notes (all) | note.note | Full-text | Notes of all note types and administrative notes | Contains all | | |
11 | Instance administrative notes | administrativeNotes | Full-text | Administrative notes | Contains all | | |
12 | Place of publication | publication.place | Full-text | Place of publication | Contains all | | |
13 | Subject | subjects | Full-text | Subjects | Exact phrase | | |
14 | Instance HRID | hrid | Term | Instance HRID | Exact phrase | | |
15 | Instance UUID | id | Term | Instance UUID | Exact phrase | | |
16 | Authority UUID | authorityId | Term | Authority ID | Exact phrase | Need to understand if there is a separate field for contributors vs subjects (the mod-search readme seems to imply that there is) | |
17 | All | all | Full-text or term? | | Contains all | To search in a query: cql.all. Not sure if this truly includes “all” fields, but this particular query includes fields from instances, holdings, and items. To search just instance fields, use cql.allInstances | |
18 | Query search | N/A | | | N/A | CQL queries constructed from any indexed fields | |
19 | Advanced search | N/A | | All basic search options | Contains all (operators/modifiers can be changed in modal) | Populated with advanced search query containing human readable operator text | |
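
As a rough illustration of how the back-end index names and operators in this table combine in a query search, the sketch below builds CQL strings and URL-encodes them. The helper names, the base URL, the `/search/instances` path, and the `limit` parameter are assumptions made for the example, and the identifier value is made up; check them against the mod-search documentation before relying on them.

```python
from urllib.parse import urlencode


def cql_clause(index, operator, value):
    """Build one CQL clause, quoting the value and escaping embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {operator} "{escaped}"'


def search_url(base_url, query, limit=10):
    """Assemble a request URL; path and parameter names are assumptions."""
    return f"{base_url}/search/instances?" + urlencode({"query": query, "limit": limit})


# "Contains all" on a full-text field, "exact phrase" on a term field.
title_query = cql_clause("title", "all", "global africa")
id_query = cql_clause("identifiers.value", "==", "12345")  # made-up identifier
url = search_url("https://folio.example.org", title_query)
```

Here `title_query` is `title all "global africa"` and `id_query` is `identifiers.value == "12345"`, matching the default operators listed for those two search options.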
Holdings search options
# | Search option - UI | Search option - BE | Type | Fields included | Default operator | Notes | Updates |
---|---|---|---|---|---|---|---|
1 | Keyword (title, contributor, identifier, HRID, UUID) | keyword | Full-text and term? | | Contains all | Performs a “contains all” on full-text fields and an “exact phrase” on identifier fields | |
2 | ISBN | isbn | Term | ISBN, Invalid ISBN | Contains all | | |
3 | ISSN | issn | Term | ISSN, Invalid ISSN, Linking ISSN | Exact phrase | | |
4 | Call number, not normalized | holdingsFullCallNumbers | Term | Does this only look for matches on Prefix + Call number + Suffix? | Exact phrase | Case sensitive? Leading, internal, and trailing spaces NOT removed | |
5 | Call number, normalized | holdingsNormalizedCallNumbers | Term | Prefix + Call number + Suffix | Exact phrase | Leading, internal, and trailing spaces removed | |
6 | Holdings notes (all) | holdings.notes.note | Full-text | Holdings notes of all note types and holdings administrative notes | Contains all | | |
7 | Holdings administrative notes | holdings.administrativeNotes | Full-text | Holdings administrative notes | Contains all | | |
8 | Holdings HRID | holdings.hrid | Term | Holdings HRID | Exact phrase | | |
9 | Holdings UUID | holdings.id | Term | Holdings UUID | Exact phrase | | |
10 | All | all | Full-text or term? | | Contains all | To search in a query: cql.all. Not sure if this truly includes “all” fields, but this particular query includes fields from instances, holdings, and items. To search just holdings fields, use cql.allHoldings | |
11 | Query search | N/A | | | | CQL queries constructed from any indexed fields. Can combine record types? | |
12 | Advanced search | N/A | | All basic search options | Contains all (operators/modifiers can be changed in modal) | Populated with advanced search query containing human readable operator text | |
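
The difference between the two call number options can be sketched as follows. This covers only what the Notes column states, namely that the normalized option removes leading, internal, and trailing spaces from Prefix + Call number + Suffix. The function names and the sample call number are hypothetical, and the real normalization in mod-search may do more (for example, case folding), which this draft has not verified.

```python
def full_call_number(prefix, call_number, suffix):
    """Join the non-empty parts with single spaces, leaving inner spacing as-is."""
    return " ".join(part for part in (prefix, call_number, suffix) if part)


def normalized_call_number(prefix, call_number, suffix):
    """Remove every whitespace character from the joined call number."""
    return "".join(full_call_number(prefix, call_number, suffix).split())


# Hypothetical sample values.
full = full_call_number("Ref", "QA 76.73 .P98", "v.2")
normalized = normalized_call_number("Ref", "QA 76.73 .P98", "v.2")
```

Under this assumption, a normalized search would match regardless of how the user spaces the call number, while a non-normalized search requires the spacing to match the stored value.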
Item search options
# | Search option - UI | Search option - BE | Type | Fields included | Default operator | Notes | Updates |
---|---|---|---|---|---|---|---|
1 | Keyword (title, contributor, identifier, HRID, UUID, Barcode) | | Full-text and term? | | Contains all | Performs a “contains all” on full-text fields and an “exact phrase” on identifier fields | Sunflower: Includes “Barcode” |
2 | Barcode | item.barcode | Term | Barcode | Exact phrase | | |
3 | ISBN | isbn | Term | ISBN, Invalid ISBN | Contains all | | |
4 | ISSN | issn | Term | ISSN, Invalid ISSN, Linking ISSN | Exact phrase | | |
5 | Effective call number (item), not normalized | itemFullCallNumbers | Term | Prefix + Call number + Suffix? | Exact phrase | | |
6 | Effective call number (item), normalized | itemNormalizedCallNumbers | Term | Prefix + Call number + Suffix | Exact phrase | Currently does NOT contain all of the elements that are marked as “Effective call number” on the Item detail view. | |
7 | Item notes (all) | item.notes.note | Full-text | Notes of all note types and administrative notes | Contains all | | |
8 | Item administrative notes | item.administrativeNotes | Full-text | Administrative notes | Contains all | | |
9 | Circulation notes | item.circulationNotes.note | Full-text | Circulation notes | Contains all | | |
10 | Item HRID | item.hrid | Term | Item HRID | Exact phrase | | |
11 | Item UUID | item.id | Term | Item UUID | Exact phrase | | |
12 | All | all | Full-text or term? | | Contains all | To search in a query: cql.all. Not sure if this truly includes “all” fields, but this particular query includes fields from instances, holdings, and items. To search just item fields, use cql.allItems | |
13 | Query search | N/A | | | | CQL queries constructed from any indexed fields. Can combine record types? | |
14 | Advanced search | N/A | | All basic search options | Contains all (operators/modifiers can be changed in modal) | Populated with advanced search query containing human readable operator text | |
MARC authority
MARC authority search options
# | Search option - UI | Type | Fields included | Default operator | Notes | Updates |
---|---|---|---|---|---|---|
1 | Keyword | Full-text | | Contains all | | |
2 | Identifier (all) | Term | All identifiers regardless of type, LCCN, natural ID | Exact phrase | | |
3 | LCCN | Term | LCCN | Contains all | | |
4 | Personal name | Full-text | | Contains all | | |
5 | Corporate/Conference name | Full-text | | Contains all | | |
6 | Geographic name | Full-text | | Contains all | | |
7 | Name-title | Full-text | | Contains all | | |
8 | Uniform title | Full-text | | Contains all | | |
9 | Subject | Full-text | | Contains all | | |
10 | Children’s subject heading | Full-text | For LCCNs that start with the prefix “sj” | Contains all | | |
11 | Genre | Full-text | | Contains all | | |
12 | Advanced search | | All basic search options | Contains all (operators/modifiers can be changed in modal) | Populated with advanced search query containing human readable operator text | |