This document illustrates changes to data import that facilitate a reliable and performant system for ingesting MARC records, etc., into FOLIO.
The code used to demonstrate instance creation is listed below.
Goals
Journey Simplification: The current flow involves back-and-forth communication between actors in the system, with recurring expensive operations. Having multiple actors also makes troubleshooting harder, because the journey of a record must be traced through multiple modules. Simplifying the journey will create an avenue for complex functionality to be developed within a simple paradigm.
Cross-Domain Flexibility: There are scenarios where an action in one domain, e.g. orders, depends on another domain, e.g. inventory, to perform a complete unit of work. Making that happen today is complicated, relies on “post processing”, and generally leaves room for only one dependent action to be performed; i.e. data import can create an instance for the order, but inventory cannot cleanly interact with another domain within the purview of the original create order request.
Incoming Record Journey Logging: Show the journey of an incoming record in the journal.
Job Execution Status: A job execution can fail at any actor, but the status might not be communicated appropriately system-wide. Provide structures that allow reliable notification of job execution status.
Error Codes: Introduce error codes so that user-friendly messages are shown to the user instead of raw code exceptions.
Job Profile Validation: Increase validations performed on job profiles before persistence.
Multiple Results from Match Profile: Allow multiple results to be returned by a Match Profile in a Job Profile.
...
Design
Journey Simplification
...
The image above shows a system where a “Data Import Processor” is initialized in each actor of the Data Import system with a specific configuration that allows it to receive events from specific topics. For example, the Processor in mod-inventory would listen only to DI2_INVENTORY and not to DI2_ORDERS. This approach inverts the dependency between data import and other FOLIO domains such as orders and invoices. Rather than Data Import needing intimate knowledge of Inventory, Inventory is given an API to understand details about Data Import. All that is left on the Data Import side is to create a “runway” for each new FOLIO domain to be supported. Examples of runways are DI2_INVENTORY and DI2_ORDERS; in the future, perhaps DI2_CIRCULATION.
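The topic-scoped subscription described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: `ProcessorSketch`, its methods, and the `CREATE_INSTANCE` event type are hypothetical names and are not taken from the POC code. Only the topic names DI2_INVENTORY and DI2_ORDERS come from this document. In a real deployment the filtering would presumably be done by the Kafka consumer subscription itself; here it is modelled with a plain method call so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical stand-in for a "Data Import Processor"; not the POC API.
class ProcessorSketch {
  private final Set<String> topics; // runways this processor listens to
  private final Map<String, Consumer<String>> handlers = new HashMap<>();
  final List<String> handled = new ArrayList<>(); // what the handlers processed

  ProcessorSketch(Set<String> topics) {
    this.topics = topics;
  }

  // The owning domain registers one handler per event type (names are assumptions).
  ProcessorSketch on(String eventType, Consumer<String> handler) {
    handlers.put(eventType, handler);
    return this;
  }

  // Returns true only if the event arrived on a subscribed topic and a handler exists.
  boolean receive(String topic, String eventType, String payload) {
    if (!topics.contains(topic)) {
      return false; // e.g. the inventory processor ignores DI2_ORDERS
    }
    Consumer<String> handler = handlers.get(eventType);
    if (handler == null) {
      return false;
    }
    handler.accept(payload);
    return true;
  }
}

class ProcessorSketchDemo {
  // Builds a processor as mod-inventory might: one runway, one handler.
  static ProcessorSketch inventoryProcessor() {
    ProcessorSketch p = new ProcessorSketch(Set.of("DI2_INVENTORY"));
    p.on("CREATE_INSTANCE", payload -> p.handled.add("instance:" + payload));
    return p;
  }

  public static void main(String[] args) {
    ProcessorSketch inventory = inventoryProcessor();
    System.out.println(inventory.receive("DI2_INVENTORY", "CREATE_INSTANCE", "record-1"));
    System.out.println(inventory.receive("DI2_ORDERS", "CREATE_ORDER", "record-2"));
  }
}
```

In this sketch the first call is handled and the second is ignored, which mirrors the dependency inversion: the data import side only needs to know the runway name, while the domain module owns the handlers.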
Here is an example of the flow when creating instances with the revised Data Import:
IMAGE
...
Here is another example of a flow when performing matching and sub-matching with the revised Data Import:
IMAGE
...
IMAGE
Protocol
A new event, DataImportProcessorEvent, will be used in all communication in the revised flow. It is compatible with the event used in the current flow, Event/DataImportEventPayload.
...
ProfileSnapshotWrapper is an existing class in the current flow that is reused in the revised flow, hence it is not enumerated here.
Specific enumerations for the “Data Import Processor” and other objects can be reviewed in the attached POC code.
Benefits
Better Performance: There will be fewer HTTP calls, fewer Kafka messages, and less thrashing of SRS records.
With the attached POC code deployed to a local development machine, 10,000 instances are created in 5 minutes with the current flow and in 3 minutes with the revised flow. The difference should be wider in a production-like system. The current flow persists one CREATE and one UPDATE of an instance together with one CREATE and one UPDATE of a MARC record in SRS. The revised flow persists one CREATE of an instance with one CREATE of a MARC record in SRS. The revised flow also does not produce the intermediate Kafka messages used by the current flow.
Easier Troubleshooting: Generally, interactions will be focused on certain modules rather than scattered across the system. This lessens the need for a more robust and costly observability practice.
Cleaner & Straightforward API For Data Import Implementers: The revised flow involves spinning up a Processor and assigning handlers for messages. Much of the setup work is executed by the Processor, leaving Data Import Implementers to focus on their domain.
Foundation For More Flexibility of Job Profile: As mentioned earlier, this design lays the groundwork for other functionality to be implemented more easily and with less complexity than in the current flow.
Enhanced Matching Functionality: Because a Processor will be responsible for activities dedicated to a FOLIO domain such as inventory, matches and sub-matches are localized to the Processor. The current flow involves communicating with SRS for MARC matching and with Inventory for Instance matching. This localization will allow easier implementation of Match Profiles that return multiple results, and better performance, since Kafka messages will not be sent between match and sub-match intents.
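The localized match and sub-match idea under Enhanced Matching Functionality can be illustrated with a rough sketch. It assumes records are simple field maps and that a match criterion is a predicate over the incoming record and a stored candidate; `LocalMatchSketch`, the field names `isbn` and `status`, and the sample data are all hypothetical and do not come from the POC code. The point is only that both steps run in one process and that every hit is kept, not just the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: match and sub-match evaluated inside one processor,
// with no Kafka message sent between the two steps. Not the POC API.
class LocalMatchSketch {

  // Returns every candidate that satisfies the criterion (multiple results allowed).
  static List<Map<String, String>> match(
      Map<String, String> incoming,
      List<Map<String, String>> candidates,
      BiPredicate<Map<String, String>, Map<String, String>> criterion) {
    List<Map<String, String>> results = new ArrayList<>();
    for (Map<String, String> candidate : candidates) {
      if (criterion.test(incoming, candidate)) {
        results.add(candidate);
      }
    }
    return results;
  }

  // Criterion: incoming and candidate carry the same value for a field.
  static BiPredicate<Map<String, String>, Map<String, String>> sameField(String field) {
    return (in, cand) -> in.get(field) != null && in.get(field).equals(cand.get(field));
  }

  // Illustrative stored records; the field names are assumptions.
  static List<Map<String, String>> stored() {
    return List.of(
        Map.of("id", "a", "isbn", "123", "status", "cataloged"),
        Map.of("id", "b", "isbn", "123", "status", "temporary"),
        Map.of("id", "c", "isbn", "999", "status", "cataloged"));
  }

  static Map<String, String> incoming() {
    return Map.of("isbn", "123", "status", "cataloged");
  }

  public static void main(String[] args) {
    // Match: may return several candidates.
    List<Map<String, String>> matched = match(incoming(), stored(), sameField("isbn"));
    // Sub-match: narrows the previous results in the same process.
    List<Map<String, String>> narrowed = match(incoming(), matched, sameField("status"));
    System.out.println(matched.size() + " matched, " + narrowed.size() + " after sub-match");
  }
}
```

Because the sub-match consumes the list returned by the match directly, no intermediate event needs to be published, which is the performance argument made above.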
...
Tip |
---|
Solution |
...
A special header containing a request identifier will be added to the Kafka message, and a matching property will be set in the processor context, before the request is sent to the appropriate processor Kafka topic. When a reply is obtained via DI2_REPLY, processors will scan the headers to see whether the reply is meant for a request waiting at that processor instance. This allows efficient processing of replies, because the Kafka message is not deserialized when the reply is not pertinent to the data import processor instance.
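The header check described above can be sketched as follows. This is an illustration under stated assumptions, not the POC implementation: the class `ReplyCorrelationSketch`, the header name `di2-request-id`, and the use of a `CompletableFuture` as the waiting request are all hypothetical. To keep the example self-contained, a Kafka record is modelled as a map of header bytes plus a raw value; with the Kafka Java client the header would instead be read from the consumer record's headers.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of request/reply correlation via a message header.
class ReplyCorrelationSketch {
  static final String REQUEST_ID_HEADER = "di2-request-id"; // assumed header name

  // Requests sent by this processor instance that still await a reply.
  private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
  int deserializations = 0; // counts how many reply bodies were actually decoded

  // Called before publishing a request; the returned id goes into the header.
  String register(CompletableFuture<String> waiter) {
    String id = UUID.randomUUID().toString();
    pending.put(id, waiter);
    return id;
  }

  // Called for every message seen on DI2_REPLY.
  boolean onReply(Map<String, byte[]> headers, byte[] value) {
    byte[] raw = headers.get(REQUEST_ID_HEADER);
    if (raw == null) {
      return false; // no correlation header, nothing to do
    }
    String id = new String(raw, StandardCharsets.UTF_8);
    CompletableFuture<String> waiter = pending.remove(id);
    if (waiter == null) {
      return false; // reply is for another instance; the body is never decoded
    }
    deserializations++;
    // Stand-in for real payload deserialization.
    waiter.complete(new String(value, StandardCharsets.UTF_8));
    return true;
  }

  public static void main(String[] args) {
    ReplyCorrelationSketch processor = new ReplyCorrelationSketch();
    CompletableFuture<String> waiter = new CompletableFuture<>();
    String id = processor.register(waiter);
    byte[] body = "reply-body".getBytes(StandardCharsets.UTF_8);
    byte[] foreign = "someone-else".getBytes(StandardCharsets.UTF_8);
    processor.onReply(Map.of(REQUEST_ID_HEADER, foreign), body); // skipped
    processor.onReply(Map.of(REQUEST_ID_HEADER, id.getBytes(StandardCharsets.UTF_8)), body);
    System.out.println("decoded bodies: " + processor.deserializations);
  }
}
```

Only the small header value is inspected for replies that belong to other instances, so the cost of decoding the payload is paid once per reply rather than once per processor instance.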
...