Browse Instance classification numbers - Phase 1 POC

Status

IN PROGRESS

Impact

MEDIUM

Prod Ticket

UXPROD-4120

Arch Ticket

ARCH-151


Summary

Instance-level classification search and browse is a planned FOLIO feature. It is closely related to call number browse for items, and the same or a similar approach should be reused. However, the current call number browse implementation has known limitations. One goal of this design is to overcome them and produce a solution that can also be applied to item-level call number browse. The context for the design is:

  • Effective item call numbers are the only call numbers that can be browsed
  • Cannot search or browse instance/bibliographic classification
  • Some libraries shelflist by the bibliographic call number instead of the item-level call number

In the Q release, the implementation should be treated as a POC. The goals are:

  1. Validate the approach and its reusability for item/holding-level call number browse and search functionality
  2. Assess the impact on data import and reindexing

Requirements

Functional Requirements

Call number browse requirements overview - DRAFT

Non-functional Requirements

  1. Performance:
    1. Should have limited impact on the data import procedure
    2. Should have limited impact on the reindexing procedure
  2. Maintainability:
    1. Simplify the existing solution and allow it to be reused for future features like holding classification 
    2. Remove the limitation of 10 characters

Solution Options


Option 1: Use the existing solution with item-level call-number browse and range search
  Status: Declined due to limitations of the existing algorithm
  Cons:
  • The existing solution has known limitations
  • Negative impact on the reindexing procedure

Option 2: Separate index instance_classifications with a search_after approach, similar to authors and contributors
  Status: Declined due to the complexity of the approach in ECS environments
  Pros:
  • Does not require scripted upsert for indexing in OpenSearch/Elasticsearch in a single-tenant environment
  • Simplifies maintenance and modification of the feature
  Cons:
  • Possible negative impact on the data import procedure
  • Possible negative impact on the reindexing procedure
  • Requires preprocessing for classification extraction in mod-inventory-storage
  • In an ECS environment, requires resource-consuming updates of documents in the search index, because information is collected across all tenants

Option 3: Create an instance_classifications table in the mod-search PostgreSQL database that handles updates, plus a search index instance_classifications with a search_after approach
  Status: Target solution
  Pros:
  • Simplifies maintenance and modification of the feature
  • Uses the same approach for standalone and ECS environments
  • No impact on mod-inventory-storage
  • No impact on the data import procedure
  Cons:
  • Possible negative impact on the reindexing procedure

Target Architecture

Approach:

The approach is designed for both FOLIO standalone mode and ECS mode. A document in the classification search index can relate to multiple instance records in one or several tenants, so the document must be updated whenever an instance's classification number is changed or removed. Such partial updates in OpenSearch/Elasticsearch indexes are complex and slow down the reindexing procedure. To improve reindexing performance, it is proposed to create a classification table in the mod-search PostgreSQL database, where all insert/update/delete operations will happen. This removes the need for partial updates of search index documents.

  1. Create a new table in the mod-search database:
    1. Table structure

      Field                   Type
      tenant_id               character varying
      classification_type     UUID
      classification_number   character varying
      instance_id             UUID
      shared                  boolean
  2. Create a new index instance-classifications in OpenSearch with the following fields:
    1. Classification identifier: should be calculated as a concatenation of classification type and number
    2. Classification number
    3. ClassificationTypeId
    4. Effective shelving order
    5. List of shared flags per tenant
  3. mod-search should extract the classifications list and update the records in the classifications table in the database
  4. mod-search should query the database and insert the document in the search index.
  5. Browse:
    1. Browse functionality combines two search_after queries sorted by effective shelving order around the anchor record.
    2. Users should be able to navigate from browse results to related Instances in the search
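
Steps 1, 2 and 4 above can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the mod-search PostgreSQL database. The composite primary key, the `|` separator in the classification identifier, the JSON field names, and the `build_documents` helper are all assumptions made for illustration; none of them are fixed by this design. Effective shelving order is left out because its calculation is not specified here.

```python
import sqlite3

# Hypothetical DDL mirroring the table structure in step 1.
# The composite primary key is an assumption, not part of the design.
DDL = """
CREATE TABLE instance_classifications (
    tenant_id             TEXT NOT NULL,
    classification_type   TEXT NOT NULL,     -- UUID, stored as text in SQLite
    classification_number TEXT NOT NULL,
    instance_id           TEXT NOT NULL,     -- UUID, stored as text in SQLite
    shared                INTEGER NOT NULL,  -- boolean
    PRIMARY KEY (tenant_id, classification_type, classification_number, instance_id)
)
"""


def build_documents(conn):
    """Group table rows into one search document per (type, number) pair."""
    docs = {}
    rows = conn.execute(
        "SELECT tenant_id, classification_type, classification_number, "
        "instance_id, shared FROM instance_classifications "
        "ORDER BY classification_type, classification_number, tenant_id, instance_id"
    )
    for tenant_id, type_id, number, instance_id, shared in rows:
        # Classification identifier: concatenation of type and number.
        doc_id = f"{type_id}|{number}"
        doc = docs.setdefault(
            doc_id,
            {"id": doc_id, "number": number, "typeId": type_id, "instances": []},
        )
        # One entry per tenant/instance pair, carrying the shared flag.
        doc["instances"].append(
            {"tenantId": tenant_id, "instanceId": instance_id, "shared": bool(shared)}
        )
    return list(docs.values())


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany(
    "INSERT INTO instance_classifications VALUES (?, ?, ?, ?, ?)",
    [
        ("tenant_a", "lc-type", "QA76.73", "inst-1", 1),
        ("tenant_b", "lc-type", "QA76.73", "inst-2", 0),
        ("tenant_a", "dewey-type", "005.13", "inst-1", 0),
    ],
)
documents = build_documents(conn)
```

In this sketch the two rows that share a classification type and number collapse into a single document, which is why a change to one instance only requires re-reading the affected rows from the table rather than a partial update of the search document.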

Indexing Sequence Diagram:

 Diagram source
@startuml
'https://plantuml.com/sequence-diagram

autonumber

title indexing

participant "mod-inventory" as mi
participant "mod-inventory-storage" as inv
queue kafka
participant "mod-search" as ms
database "Postgres" as db
database "OpenSearch/Elasticsearch" as os

== 1. Extracting classification from instance record ==
mi -> inv ++: save instance
inv --> kafka: instance domain event
return saved instance

== 2. Indexing classification ==
kafka --> ms++: bulk consume\ninstance resource event

loop for each instance
ms -> ms: extract event type
ms -> db ++: insert/delete classifications \ndepending on event type
return ok
ms -> db ++: query classifications by classificationNumber
return list of classifications

end loop

ms -> os++: bulk request for indexing classifications
return acknowledgement
return

@enduml
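
The indexing flow in the diagram can be sketched roughly as follows. This is an illustration under assumptions, not the mod-search implementation: the event shape (`type`, `tenant`, `old`, `new`), the event type names, the `Row` class, and the bulk action dictionaries are hypothetical, and a Python set stands in for the PostgreSQL table. As a simplification, the sketch looks up the remaining rows once per affected classification after the loop instead of once per instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    tenant_id: str
    type_id: str
    number: str
    instance_id: str


def rows_of(tenant_id, instance):
    """Extract classification rows from an instance record (or none)."""
    if not instance:
        return set()
    return {
        Row(tenant_id, c["classificationTypeId"], c["classificationNumber"], instance["id"])
        for c in instance.get("classifications", [])
    }


def apply_event(table, event):
    """Insert/delete rows depending on the event type; return affected keys."""
    kind = event["type"]
    old = rows_of(event["tenant"], event.get("old") if kind in ("UPDATE", "DELETE") else None)
    new = rows_of(event["tenant"], event.get("new") if kind in ("CREATE", "UPDATE") else None)
    table.difference_update(old - new)  # classifications that disappeared
    table.update(new - old)             # classifications that were added
    return {(r.type_id, r.number) for r in old ^ new}


def process_batch(table, events):
    """Apply a batch of events and build one bulk request for the index."""
    touched = set()
    for event in events:
        touched |= apply_event(table, event)
    bulk = []
    for type_id, number in sorted(touched):
        remaining = [r for r in table if (r.type_id, r.number) == (type_id, number)]
        doc_id = f"{type_id}|{number}"
        if remaining:
            bulk.append({"action": "index", "id": doc_id, "instanceCount": len(remaining)})
        else:
            # No instance references this classification any more.
            bulk.append({"action": "delete", "id": doc_id})
    return bulk


def instance(instance_id, number):
    return {
        "id": instance_id,
        "classifications": [{"classificationTypeId": "lc", "classificationNumber": number}],
    }


table = set()
first = process_batch(table, [
    {"type": "CREATE", "tenant": "t1", "new": instance("i1", "QA76")},
    {"type": "CREATE", "tenant": "t1", "new": instance("i2", "QA76")},
])
second = process_batch(table, [
    {"type": "UPDATE", "tenant": "t1", "old": instance("i1", "QA76"), "new": instance("i1", "QA77")},
    {"type": "DELETE", "tenant": "t1", "old": instance("i2", "QA76")},
])
```

Because the table always holds the full set of instance references, each search document is rebuilt from scratch or deleted, and no scripted partial update is needed.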


Browsing Sequence Diagram:

 Diagram source
@startuml
'https://plantuml.com/sequence-diagram

autonumber

title Browse approach

actor User as u
participant "FOLIO UI" as ui
participant "mod-search" as ms
database "OpenSearch" as os


u -> ui ++: browse classification
ui -> ms ++: browse query

ms -> os ++: anchor query
return anchor record
ms -> os ++: search_after query sort by effectiveShelvingOrder ASC
return "after" records
ms -> os ++: search_after query sort by effectiveShelvingOrder DESC
return "before" records


return result list
return result list

@enduml
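
A minimal sketch of the browse approach follows. The request bodies use the standard `size`, `sort` and `search_after` keys of the OpenSearch search API, but the field name `effectiveShelvingOrder`, the helper functions, and the in-memory `run_search_after` stand-in for the search engine are assumptions made for illustration. For simplicity the example uses the classification number itself as the shelving order; the real normalization is not defined in this document.

```python
FIELD = "effectiveShelvingOrder"  # hypothetical index field name


def search_after_body(anchor_order, direction, size):
    """Build a search_after request body sorted by shelving order."""
    return {
        "size": size,
        "sort": [{FIELD: direction}],
        "search_after": [anchor_order],
    }


def run_search_after(index, body):
    """In-memory stand-in for the search engine: emulates sort + search_after."""
    field, direction = next(iter(body["sort"][0].items()))
    after = body["search_after"][0]
    if direction == "asc":
        hits = sorted((d for d in index if d[field] > after), key=lambda d: d[field])
    else:
        hits = sorted(
            (d for d in index if d[field] < after), key=lambda d: d[field], reverse=True
        )
    return hits[: body["size"]]


def browse(index, anchor_order, size):
    """Return up to `size` records before and after the anchor, in shelf order."""
    anchor = [d for d in index if d[FIELD] == anchor_order]  # anchor query
    after = run_search_after(index, search_after_body(anchor_order, "asc", size))
    before = run_search_after(index, search_after_body(anchor_order, "desc", size))
    # "before" hits arrive in descending order, so reverse them for display.
    return list(reversed(before)) + anchor + after


index = [
    {"number": n, FIELD: n}
    for n in ["QA76.9", "QA10", "QA76.73", "QB1", "PZ3", "QA76"]
]
around_existing = [d["number"] for d in browse(index, "QA76.73", 2)]
around_missing = [d["number"] for d in browse(index, "QA50", 1)]
```

When the anchor value does not exist in the index, the anchor query returns nothing and the result contains only the neighbouring records on each side.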

Questions


  1. Question: An instance can have multiple classifications with different types. Does that mean that an instance cannot have two or more classifications with the same type?
     Answer (Christine Schultz-Richert): Technically, yes, one instance could have classifications of the same type (there shouldn't be, but the MARC standard indicates that the classification fields are repeatable).

  2. Question: Can two instances have the same classification type and number?
     Answer (Christine Schultz-Richert): It is unlikely that two instances have the same classification; however, it is definitely possible.