Batch Importer (Bib/Acq) (UXPROD-47)

[UXPROD-3193] NFR: R3 2021 Kiwi Data import Stability/Reliability work Created: 22/Jul/21  Updated: 04/Jan/22  Resolved: 05/Nov/21

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: Kiwi (R3 2021)
Parent: Batch Importer (Bib/Acq)

Type: New Feature Priority: P2
Reporter: Taisiya Trunova Assignee: Ann-Marie Breaux (Inactive)
Resolution: Done Votes: 0
Labels: data-import, epam-folijet, split
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Continues
is continued by UXPROD-3210 NFR: R1 2022 Lotus Data import Stabil... Closed
Defines
defines UXPROD-3135 NFR: R3 2021 Kiwi Data Import Stabili... Closed
defines UXPROD-47 Batch Importer (Bib/Acq) Analysis Complete
is defined by MODDATAIMP-440 SPIKE: Data Import job is creating du... Closed
is defined by MODDATAIMP-474 SPIKE: Review PTF job that created mo... Closed
is defined by MODDATAIMP-558 Hosted envs performance became slow a... Closed
is defined by KAFKAWRAP-10 Provide property to set compression t... Closed
is defined by MODDATAIMP-390 Spike: Memory not released after import Closed
is defined by MODDATAIMP-465 Fix memory leaks after import. Closed
is defined by MODDATAIMP-544 Test and merge PRs on reducing DI eve... Closed
is defined by MODDATAIMP-548 Provide system properties to set chun... Closed
is defined by MODDICORE-198 Fix the effect of DI_ERROR messages w... Closed
is defined by MODINV-417 SPIKE: LeaveGroup request failed with... Closed
is defined by MODINV-493 Update marc-holdings related event ha... Closed
is defined by MODINV-494 Ensure mapping parameters for holding... Closed
is defined by MODINV-553 Fix the effect of DI_ERROR messages w... Closed
is defined by MODINVOICE-314 Provide recordId header for events co... Closed
is defined by MODINVSTOR-794 Memory Leaks: io.vertx.core.impl.Dupl... Closed
is defined by MODSOURCE-339 SPIKE: Crash Postgres DB on Rancher d... Closed
is defined by MODSOURCE-390 Fix the effect of DI_ERROR messages w... Closed
is defined by MODSOURMAN-522 Fix the effect of DI_ERROR messages w... Closed
is defined by MODSOURMAN-579 Data Import failing to decode record ... Closed
is defined by KAFKAWRAP-11 Remove duplicate kafka headers Closed
is defined by MODDATAIMP-430 Data Import logs shows file delete ER... Closed
is defined by MODDICONV-200 GET data-import-profiles/actionProfil... Closed
is defined by MODINV-405 Remove zipping mechanism for data imp... Closed
is defined by MODINVOICE-251 Remove zipping mechanism for data imp... Closed
is defined by MODSOURCE-286 Remove zipping mechanism for data imp... Closed
is defined by MODSOURMAN-463 Create storage and API for MappingRul... Closed
is defined by MODSOURMAN-464 Store snapshots of MappingRules and M... Closed
is defined by MODSOURMAN-465 Remove MappingRules, MappingParams, a... Closed
is defined by MODSOURMAN-466 Remove zipping mechanism for data imp... Closed
is defined by MODSOURMAN-481 Clean-up backend log and gracefully h... Closed
is defined by MODSOURMAN-521 Remove duplicate kafka headers Closed
is defined by MODSOURMAN-575 Add mechanism for detection and loggi... Closed
Epic Link: Batch Importer (Bib/Acq)
Front-End Confidence factor: Low
Back End Estimate: Jumbo: > 45 days
Development Team: Folijet
PO Rank: 117
Rank: Cornell (Full Sum 2021): R1
Rank: U of AL (MVP Oct 2020): R1

Description

Team estimate: 90 days

UXPROD-3135 (Closed) was split into UXPROD-3193 (Closed), for stability and reliability, and UXPROD-3191 (Closed), for performance. Ann-Marie Breaux to close UXPROD-3135 once all issues have been moved from it to the new features.

Current situation or problem:
1. High CPU/memory consumption in modules
2. Duplicates created upon import
3. SRS can fail when processing a message during import
4. If there is an infrastructure issue (such as the DB being unavailable, a module being restarted, or a network failure), we send DI_ERROR instead of retrying
5. Status messages for the progress bar are not de-duplicated

Investigation required for:

6. Race condition on start (Kafka consumers start working before the DB is configured) OR periodic DB shutdown after SRS restart. Jobs get stuck if they cannot update their status in the DB (messages are ACKed even if we could not process them)
7. Kafka consumers eventually stop reading messages, breaking job progress until the module is restarted
8. mod-data-import stores the input file in memory, which limits the size of the uploaded file and can cause out-of-memory (OOM) errors
9. Consumer gets disconnected from the Kafka cluster

In scope

Out of scope

Use case(s)

Proposed solution/stories
1. Significantly decrease size of payload:

  1. Remove immutable parts. Instead fetch them on demand and cache locally for reuse.
  2. Change message handling mechanism (currently relies on pt1 - profile) (optional)
  3. Move archiving (compression) to Kafka instead of doing it at module level

2. Make consumers idempotent. Add a pass-through identifier to de-duplicate messages.
3. Generate "INSTANCE CREATED" from mod-inventory. Consume it in SRS to update the HRID in BIB, and in INVENTORY to continue processing.

4. Do not ACK messages in Kafka if the failure is an infrastructure error/exception rather than a business-logic error. Split failed processing results into 2 categories:

  1. IO errors: do not ACK; retry until fixed
  2. Business logic errors: send DI_ERROR and ACK the current message

Remove unnecessary topics (* ready for post processing and hrid set)

5. De-duplicate status messages per record while tracking progress
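
Proposals 2, 4, and 5 above could be sketched together roughly as follows. This is a minimal in-memory illustration in Python, not the Folijet implementation: `Message`, `event_id`, `InfrastructureError`, `BusinessLogicError`, `Consumer`, and `ProgressTracker` are all hypothetical names. A real module would presumably keep the seen identifiers in persistent storage and rely on the Kafka client's own offset-commit mechanism rather than a boolean return value.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


class InfrastructureError(Exception):
    """Hypothetical: DB unavailable, module restarting, network failure."""


class BusinessLogicError(Exception):
    """Hypothetical: the record itself cannot be processed."""


@dataclass
class Message:
    event_id: str   # assumed pass-through identifier used for de-duplication
    record_id: str
    payload: str


@dataclass
class Consumer:
    handler: Callable[[Message], None]
    seen_event_ids: Set[str] = field(default_factory=set)
    errors: List[Tuple[str, str, str]] = field(default_factory=list)

    def consume(self, msg: Message) -> bool:
        """Process one message; return True if it should be ACKed."""
        if msg.event_id in self.seen_event_ids:
            # Duplicate delivery: ACK without processing it again.
            return True
        try:
            self.handler(msg)
        except InfrastructureError:
            # IO-type failure: do not ACK, so the message is retried later.
            return False
        except BusinessLogicError as exc:
            # Business failure: emit DI_ERROR and still ACK the message.
            self.errors.append(("DI_ERROR", msg.record_id, str(exc)))
        self.seen_event_ids.add(msg.event_id)
        return True


@dataclass
class ProgressTracker:
    total: int
    done_record_ids: Set[str] = field(default_factory=set)

    def record_status(self, record_id: str) -> None:
        # A set counts each record once, however many status messages arrive.
        self.done_record_ids.add(record_id)

    @property
    def completed(self) -> int:
        return len(self.done_record_ids)
```

In this sketch a message that fails with an infrastructure error is never marked as seen, so a redelivery is processed normally, while a repeated delivery of an already handled message is acknowledged without side effects.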

Problems 6, 7, 8, and 9 require investigation.
Possible solution for problem 8: split the file into chunks, put them in the database, and work with the database/temp storage. Partially done (to be investigated).
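
The chunking idea for problem 8 might look something like the sketch below, which reads an upload stream in fixed-size pieces so the whole file is never held in memory at once. Everything here is an assumption for illustration: `iter_chunks` and `store_chunks` are hypothetical helpers, the `dict` stands in for the database or temp storage, and splitting by byte count ignores record boundaries, which a real importer would most likely need to respect.

```python
import io
from typing import BinaryIO, Dict, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def store_chunks(stream: BinaryIO, chunk_size: int,
                 storage: Dict[int, bytes]) -> int:
    """Write each chunk to storage under its index; return the chunk count."""
    count = 0
    for index, chunk in enumerate(iter_chunks(stream, chunk_size)):
        # Only one chunk is in memory here; storage is a stand-in for a DB.
        storage[index] = chunk
        count += 1
    return count


if __name__ == "__main__":
    store: Dict[int, bytes] = {}
    n = store_chunks(io.BytesIO(b"abcdefghij"), 4, store)
    print(n, store)
```

Downstream processing could then read chunks back by index, which also gives a natural unit for retrying after a failure.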

Links to additional info:
Data Import Stabilization plan - Vladimir Shalaev - FOLIO Wiki

Questions


Generated at Fri Feb 09 00:30:05 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.