This document accompanies MODDATAIMP-412.
Stories with P2 priority are marked in red; P3 stories in green.
Motivation
The data import functionality has recently been moved to a new solution that uses Kafka as a direct transport. This solved the main problems caused by the limitations of HTTP as a data-transfer protocol and provided a message delivery queue. However, carrying large amounts of data in message payloads has significantly increased Kafka's disk usage. In addition, progress calculation and progress display on the UI currently share a single mechanism, which slows down data import.
Goal
To improve the stability and speed of data import, three main goals need to be achieved: reduce the size of the message payload; separate the mechanism for calculating progress from the mechanism for displaying it on the UI; and add Kafka error handling, with errors written to the log and included in progress counting.
Main steps
Reduce the size of the payload
As part of this improvement, all objects that do not change during data processing will be removed from the dataImportEventPayload context. The MappingRules, MappingParams, and JobProfileSnapshot objects should not be part of the Kafka message. Instead, each module will receive this data from SRM via HTTP and cache it locally under the jobExecutionId key.
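For illustration, here is a minimal sketch of such a per-job cache in Java, assuming the Caffeine cache library and a hypothetical SRM endpoint path (`/mapping-metadata/{jobExecutionId}`); the actual API and storage are what MODSOURMAN-463 will define:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

/**
 * Sketch of a per-module cache for mapping metadata, keyed by jobExecutionId.
 * The endpoint path and response shape are assumptions for illustration only.
 */
public class MappingMetadataCache {

  private final HttpClient httpClient = HttpClient.newHttpClient();
  private final String srmBaseUrl;

  // Entries expire some time after the last access so memory is reclaimed
  // once a job has finished processing.
  private final LoadingCache<String, String> cache = Caffeine.newBuilder()
      .maximumSize(1_000)
      .expireAfterAccess(30, TimeUnit.MINUTES)
      .build(this::loadFromSrm);

  public MappingMetadataCache(String srmBaseUrl) {
    this.srmBaseUrl = srmBaseUrl;
  }

  /** Returns mapping metadata JSON for the job, fetching from SRM on a cache miss. */
  public String get(String jobExecutionId) {
    return cache.get(jobExecutionId);
  }

  private String loadFromSrm(String jobExecutionId) throws Exception {
    // Hypothetical endpoint; the real API is to be defined in MODSOURMAN-463.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(srmBaseUrl + "/mapping-metadata/" + jobExecutionId))
        .GET()
        .build();
    HttpResponse<String> response =
        httpClient.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IllegalStateException("Failed to load mapping metadata: HTTP " + response.statusCode());
    }
    return response.body();
  }
}
```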
To implement this solution, SRM needs an API and storage from which this data can be retrieved. The other modules involved in data import should load the data once and take the needed values from their local caches. We also need to move the payload-zipping functionality to the Kafka side, relying on Kafka's native message compression, to reduce CPU usage in the modules (see the configuration sketch below).
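Moving compression to Kafka itself means enabling producer-side batch compression instead of zipping payloads in application code. A minimal configuration sketch, assuming a plain Kafka producer; the broker address and the gzip codec are illustrative choices (snappy or lz4 are alternatives):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressedProducerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // illustrative address
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // Batches are compressed by the producer and decompressed by consumers
    // transparently, so modules no longer need to zip payloads themselves.
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // producer.send(...) as usual; compression is applied per record batch.
    }
  }
}
```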
Story | Estimation |
---|---|
MODSOURMAN-463 Create storage and API for MappingRules and MappingParams | 3 |
MODSOURMAN-464 Store snapshots of MappingRules and MappingParams to the database | 2 |
MODSOURMAN-465 Remove MappingRules, MappingParams, and JobProfileSnapshot from the event payload | 1 |
MODSOURMAN-466 Remove zipping mechanism for data import event payloads | 2 |
MODSOURCE-286 Remove zipping mechanism for data import event payloads and use cache for params | 5 |
MODINV-405 Remove zipping mechanism for data import event payloads and use cache for params | 5 |
MODINVOICE-251 Remove zipping mechanism for data import event payloads and use cache for params | 2 |
Separate the mechanism for calculating progress from displaying it on the UI
To keep the application stable under load and with a large number of users, the mechanism for calculating job progress needs to be revised. The best solution is to add a new API to support the landing page UI and send lightweight DTOs rather than the full job execution objects involved in the data import process; a sketch of such a DTO follows. The new objects will be stored separately from job executions and updated in the background. Adding database indexes will speed up log sorting and retrieval, which will also help when multiple users open the landing page at the same time.
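For illustration, a minimal sketch of such a lightweight DTO; the field names here are assumptions, and the actual shape will be defined in MODSOURMAN-468:

```java
import java.time.Instant;

/**
 * Sketch of a lightweight landing-page DTO: only the fields the UI needs,
 * instead of the full JobExecution object. Field names are illustrative.
 */
public record JobExecutionProgressDto(
    String jobExecutionId,
    String jobProfileName,
    String status,          // e.g. IN_PROGRESS, COMPLETED, COMPLETED_WITH_ERRORS
    int totalRecords,
    int processedRecords,   // updated in the background from a plain counter table
    int errorRecords,
    Instant startedDate) {
}
```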
Story | Estimation |
---|---|
MODSOURMAN-468 Create a new API and database table that should store and represent information for the Data-Import landing page | 5 |
MODSOURMAN-469 Change data-import progress mechanism with a new plain DB table counter and background job | 8 |
UIDATIMP-918 Use new API for DataImport landing page | 3 |
Kafka error handling
To make data import operate more stably with Kafka, error handling and the application's response to errors need to be worked out in more detail. This requires changes to the shared folio-kafka-wrapper library and to the modules that use it. All errors should be logged and included in the data-import journal. The test coverage of this library also needs to be increased, with edge cases covered. This should keep jobs from getting stuck: every job should either complete, complete with errors, or fail.
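A rough sketch of what such a handler could look like, assuming a Vert.x-based Kafka consumer; the interface shape and the JournalService are assumptions based on the story titles below, not the final folio-kafka-wrapper contract:

```java
import io.vertx.kafka.client.consumer.KafkaConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Assumed shape of the handler contract; the real interface lives in folio-kafka-wrapper. */
interface ProcessRecordErrorHandler<K, V> {
  void handle(Throwable cause, KafkaConsumerRecord<K, V> record);
}

/** Hypothetical journal-writing service, shown only for illustration. */
interface JournalService {
  void saveError(String recordKey, String errorMessage);
}

/**
 * Sketch: every processing failure is logged and written to the data-import
 * journal, so a job can finish "completed with errors" instead of hanging.
 */
public class DataImportErrorHandler implements ProcessRecordErrorHandler<String, String> {

  private static final Logger LOGGER = LoggerFactory.getLogger(DataImportErrorHandler.class);

  private final JournalService journalService;

  public DataImportErrorHandler(JournalService journalService) {
    this.journalService = journalService;
  }

  @Override
  public void handle(Throwable cause, KafkaConsumerRecord<String, String> record) {
    // Log with enough context (topic, key) to trace the failed record.
    LOGGER.error("Failed to process record from topic {} with key {}",
        record.topic(), record.key(), cause);
    // Record the error in the journal so it is reflected in job progress.
    journalService.saveError(record.key(), cause.getMessage());
  }
}
```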
Story | Draft estimation |
---|---|
MODPUBSUB-167 Reconsider error handling in KafkaConsumerWrapper | 8 |
MODPUBSUB-168 Cover with tests folio-kafka-wrapper | 5 |
MODSOURMAN-474 Implement ProcessRecordErrorHandler for Kafka Consumers | 5 |
MODSOURCE-290 Implement ProcessRecordErrorHandler for Kafka Consumers | 3 |
MODINVOICE-252 Implement ProcessRecordErrorHandler for Kafka Consumers | 2 |
MODINV-408 Implement ProcessRecordErrorHandler for Kafka Consumers | 5 |
Kateryna Senchenko, Vladimir Shalaev, Ann-Marie Breaux: please review this document.