SPIKE: [MODINREACH-80] Record Contribution: Analyze Re-Index job implementation usage in mod-inventory-storage
Overview of re-indexing feature
Inventory storage currently provides a special interface to pull all existing instances (more precisely, instance ids) from its database: /instance-storage/reindex
The interface has three methods:
- POST /instance-storage/reindex – to start getting instance records
- the pulling is handled by a special job whose id is returned in the response
- GET /instance-storage/reindex/{id} – to obtain process job details by its id
- DELETE /instance-storage/reindex/{id} – to cancel specified job by its id
It was initially designed to support a full rebuild of the Instance index in ElasticSearch, which is why it is named "reindex". At the moment the only known client of these endpoints is mod-search, which initiates pulling of instance ids by calling the POST /instance-storage/reindex endpoint and then consumes REINDEX events from the inventory.instance Kafka topic. A detailed description of this process can be found in the "The event processing on example of mod-search" section of SPIKE: [MODINREACH-78].
Problem statement
Folio is integrating with an external system called INN-Reach, which enables participating libraries to offer materials to other libraries in a consortial borrowing system. Patrons of participating libraries can directly request and borrow materials from other participating libraries through the union catalog. Libraries participating in an INN-Reach consortium first need to contribute records (items, instances) to the central union catalog. This process is called Initial record contribution.
D2IR Record Contribution
The architectural vision of the Record contribution flow to be implemented in Folio can be found on this page: D2IR Record Contribution flow
Similarly to ElasticSearch index rebuilding, Initial contribution involves all existing instance records. It was proposed to enumerate all instances and items existing in inventory via the REINDEX functionality of mod-inventory-storage. However, some limitations prevent re-indexing from being used as is for Initial record contribution.
Re-indexing limitations
- single topic (inventory.instance) is used for different types of events
- both regular changes to instance records (CREATE/UPDATE/DELETE) and re-indexing events are posted to the same topic. This mixes concerns, which in turn requires additional filtering to separate the processing of different event types. The topic can also be overloaded with millions of events that are irrelevant to consumers who are only interested in regular instance record changes.
- the current re-indexing interface is client oriented, meaning it serves only the purpose of mod-search
- another module cannot call the same interface to initiate instance record re-iteration because that would cause unwanted index rebuilding
- the interface name (/reindex) and the event type (REINDEX) are purpose specific
- simultaneous execution of several jobs is not possible because there is no way to distinguish events produced by different jobs
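To make the first and last limitations concrete, here is a minimal sketch of the filtering a consumer of inventory.instance has to do today. The event dictionaries, the tenant value, and the helper name are illustrative assumptions; only the event type names come from the list above.

```python
# Hypothetical sketch: the event shape and helper name are assumptions.
REGULAR_TYPES = {"CREATE", "UPDATE", "DELETE"}


def regular_changes(events):
    """Keep only regular instance changes, dropping REINDEX events."""
    return [e for e in events if e.get("type") in REGULAR_TYPES]


# A mixed stream as it might appear on the shared topic.
mixed_topic = [
    {"type": "CREATE", "tenant": "tenant_a"},
    {"type": "REINDEX", "tenant": "tenant_a"},
    {"type": "REINDEX", "tenant": "tenant_a"},
    {"type": "UPDATE", "tenant": "tenant_a"},
]

kept = regular_changes(mixed_topic)
dropped = [e for e in mixed_topic if e not in kept]
```

Note that the two dropped REINDEX events are indistinguishable from each other: nothing in the event body says which job produced them, which is why concurrent jobs cannot be supported.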
Proposed solution
The proposed solution consists of mandatory changes (phase 1) and optional changes (phase 2). The latter are nice to have but can be postponed to a later date.
The majority of changes should be made in mod-inventory-storage and include the following:
- introduce new "Instance Iteration" API interface – (phase 1)
- introduce new "Iteration" domain event with flexible domain type – (phase 1)
- rename underlying business service(s), utility class(es), data structure(s) from "Reindex**" to "Iteration**" – (phase 2)
mod-search is expected to eventually use the Iteration API interface instead of the Reindex interface. To minimize the impact on mod-search and allow it to migrate gradually, it is proposed to keep the existing interface for now and make the mod-search changes in phase 2.
Changes in mod-inventory-storage
Introduce new "Instance Iteration" API interface
The interface will provide functionality similar to what the Reindex interface provides at the moment, with minor changes to naming and payloads.
Interface URL:
/instance-storage/instances/iteration
Methods:
Method | URL | Description | Schemas |
---|---|---|---|
POST | /instance-storage/instances/iteration | initiate iteration of instance records | Request schema, Response schema |
GET | /instance-storage/instances/iteration/{jobId} | get iteration job by its id | Response schema |
DELETE | | cancel iteration job with specified id | |
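As a rough illustration of how a client might address this interface, the sketch below builds the POST and GET requests without sending them. The two paths are the ones given on this page; the payload field names (`topicName`, `eventType`), the topic name, and the helper functions are assumptions made for the example, since the actual schemas are not reproduced here.

```python
import json

# Path taken from this page; payload field names below are assumptions.
ITERATION_PATH = "/instance-storage/instances/iteration"


def start_iteration_request(topic_name, event_type=None):
    """Build (method, path, body) for triggering an iteration job."""
    body = {"topicName": topic_name}      # hypothetical field name
    if event_type is not None:
        body["eventType"] = event_type    # hypothetical field name
    return "POST", ITERATION_PATH, json.dumps(body)


def get_job_request(job_id):
    """Build (method, path) for reading the state of an iteration job."""
    return "GET", f"{ITERATION_PATH}/{job_id}"


# Example: trigger an iteration that publishes to a hypothetical topic.
method, path, body = start_iteration_request(
    "inventory.instance-contribution", "ITERATE"
)
```

Leaving `event_type` out of the payload mirrors the proposal that the service falls back to a default type when the client does not supply one.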
Running multiple iterations
There is no restriction on running multiple iterations.
If a client of the interface can handle simultaneous iterations, or needs to trigger them, it is free to do so. Otherwise it is up to the client to forbid such cases: knowing the job id of a running process, the client can refuse to start another one.
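A client that wants at most one iteration at a time could keep track of the running job id itself. The following is only a sketch of that idea; the class name and the `trigger` callable (standing in for the POST call that returns a job id) are hypothetical.

```python
class SingleIterationGuard:
    """Client-side guard that allows only one running iteration job."""

    def __init__(self):
        self.running_job_id = None

    def start(self, trigger):
        """Run `trigger` (a callable returning a job id) unless a job is active."""
        if self.running_job_id is not None:
            raise RuntimeError(
                f"iteration job {self.running_job_id} is still running"
            )
        self.running_job_id = trigger()
        return self.running_job_id

    def finish(self, job_id):
        """Release the guard once the given job has completed or was cancelled."""
        if self.running_job_id == job_id:
            self.running_job_id = None


guard = SingleIterationGuard()
first = guard.start(lambda: "job-1")   # stand-in for the real POST call
try:
    guard.start(lambda: "job-2")
    second_rejected = False
except RuntimeError:
    second_rejected = True
guard.finish(first)
third = guard.start(lambda: "job-3")
```

A real client would also need to persist the job id and clear it when the GET endpoint reports the job as finished; that part is omitted here.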
Support new "Iteration" domain event
The iteration job will produce new events and publish them to Kafka with the following content:
{ "type": <event type>, "tenant": <tenant>, "jobId": <UUID of job> }
- event type – unlike the "Reindex" domain event, the "Iteration" event will have a flexible event type provided by the client of the interface when triggering the iteration (POST /instance-storage/instances/iteration method). If the event type is not provided in the POST method, the "ITERATE" type will be used as the default value.
- job id – UUID of the job which produced this event. This new attribute allows the client to verify that an event belongs to the job it has started and is interested in, so that unexpected or unwanted events can be filtered out.
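The two attributes above can be sketched as follows: one helper that fills in the default type when none is given, and one that a client could use to discard events from other jobs. The function names and sample values are illustrative assumptions; the field names and the "ITERATE" default follow the event description above.

```python
import uuid

DEFAULT_EVENT_TYPE = "ITERATE"


def build_iteration_event(tenant, job_id, event_type=None):
    """Assemble the event body, falling back to the default type."""
    return {
        "type": event_type or DEFAULT_EVENT_TYPE,
        "tenant": tenant,
        "jobId": str(job_id),
    }


def belongs_to_job(event, expected_job_id):
    """Client-side check that an event was produced by the expected job."""
    return event.get("jobId") == str(expected_job_id)


my_job = uuid.uuid4()
other_job = uuid.uuid4()

events = [
    build_iteration_event("tenant_a", my_job),
    build_iteration_event("tenant_a", other_job, "CONTRIBUTE"),  # hypothetical type
]
mine = [e for e in events if belongs_to_job(e, my_job)]
```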
Alternatively, the job id can be passed in an event header. Judging by the following code snippet, this appears to be the case already:
REINDEX_JOB_ID_HEADER = "reindex-job-id"
This needs to be double-checked.
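If the header variant turns out to be usable, the client-side lookup might look like the sketch below. It assumes headers arrive as (key, bytes) pairs, which is how some Kafka clients expose them, and that the value is the UTF-8 encoded job id; both points are assumptions to verify, and the helper name is hypothetical.

```python
REINDEX_JOB_ID_HEADER = "reindex-job-id"


def job_id_from_headers(headers):
    """Return the job id carried in the record headers, or None if absent.

    `headers` is assumed to be an iterable of (key, value_bytes) pairs.
    """
    for key, value in headers:
        if key == REINDEX_JOB_ID_HEADER and value is not None:
            return value.decode("utf-8")
    return None


with_header = [("x-okapi-tenant", b"tenant_a"), (REINDEX_JOB_ID_HEADER, b"job-1")]
without_header = [("x-okapi-tenant", b"tenant_a")]

found = job_id_from_headers(with_header)
missing = job_id_from_headers(without_header)
```

Carrying the id in a header would let consumers filter events without deserializing the body, at the cost of depending on a header whose presence is not yet confirmed.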
All events related to a particular job should be published to the topic specified in the payload of the POST /instance-storage/instances/iteration method. This is different from the Reindex job, which always publishes to the inventory.instance topic.
Rename business service(s), utility class(es), data structure(s) from "Reindex**" to "Iteration**"
These changes are not required to support the new Iteration interface and can be postponed.
The list of affected classes:
- ReindexService
- ReindexJobRepository
- ReindexJobRunner
- ReindexJob (generated from RAML)
Affected table:
reindex_job
Changes in mod-search
Migrate to "Instance iteration" API interface
These changes are not required to support the new Iteration interface and can be postponed.
Once the Iteration interface is in place, mod-search can deprecate its usage of the Reindex interface and switch to the Iteration interface. This will mostly affect the following classes:
- IndexService
- InstanceStorageClient
- KafkaMessageListener
List of related USs