
Data Import Stabilization plan


Steps

  1. Gather existing issues (Vladimir Shalaev, Kateryna Senchenko)
  2. Create new UXPROD features (Vladimir Shalaev, Kateryna Senchenko)
  3. Provide feature dependencies (Vladimir Shalaev, Kateryna Senchenko)
  4. Estimate (priorities + complexity) (Vladimir Shalaev, Kateryna Senchenko)
  5. Remove duplicates (grooming with Ann-Marie)
  6. Final priorities
  7. Align to the timeline, assign to the appropriate Jira feature, and review Jira issue priorities (Taisiya Trunova)

Categories

See: Assessment ratings

  1. Performance: di-performance
  2. Stability/Reliability: di-data-integrity (more tags to be added)
  3. Scalability
  4. Architecture
  5. Code quality

Priorities

High, Mid, Low

Complexity

S, M, L, XL, XXL

Table


Columns: Category, Problem definition, Business impact, Proposed solution, Priority (DEV), Priority (PO), Complexity, Existing Jira item(s), Final feature(s).
1. Category: Performance
   Problem definition: Kafka producer is closed after sending
   Business impact: Low performance of import
   Proposed solution: Create a pool of active producers. Start the pool on module launch, close it on shutdown, and reuse connections. Add max/min pool sizes. (See the sketch below.)
   Priority (DEV): High
   Complexity: L
   Final feature(s): UXPROD-3135
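
A minimal sketch of the proposed producer pool, assuming the plain Apache Kafka client API (the modules themselves use the Vert.x Kafka wrapper); the class name and pool mechanics here are illustrative, not the final design:

    import java.util.Properties;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    // Hypothetical pool: producers are created once at module start-up,
    // borrowed per send, returned afterwards, and only closed on shutdown.
    public class KafkaProducerPool implements AutoCloseable {

      private final BlockingQueue<Producer<String, String>> pool;

      public KafkaProducerPool(int size, Properties producerConfig) {
        this.pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
          pool.add(new KafkaProducer<>(producerConfig));
        }
      }

      // Borrow a producer; blocks when all producers are in use,
      // which effectively enforces the maximum pool size.
      public Producer<String, String> borrow() throws InterruptedException {
        return pool.take();
      }

      // Return a producer to the pool instead of closing it after each send.
      public void release(Producer<String, String> producer) {
        pool.offer(producer);
      }

      @Override
      public void close() {
        // Close all connections once, on module shutdown.
        pool.forEach(Producer::close);
      }
    }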

2. Problem definition: WARN message logged when no handler is found
   Business impact: None
   Proposed solution: Do not subscribe to messages the module is not going to process, or lower the log level for this type of message (see the sketch below).
   Priority (DEV): Low
   Complexity: S
   Final feature(s): UXPROD-3135
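
If the second option is taken, the change amounts to downgrading the log call; a small sketch assuming an SLF4J-style logger and a hypothetical handler-lookup method:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class UnhandledEventLogging {

      private static final Logger LOGGER = LoggerFactory.getLogger(UnhandledEventLogging.class);

      void onUnhandledEvent(String eventType) {
        // Previously logged at WARN; DEBUG keeps the detail available for troubleshooting
        // without polluting normal logs.
        LOGGER.debug("No handler found for event type {}", eventType);
      }
    }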

3. Category: Stability/Reliability
   Problem definition: Race condition on start (Kafka consumers start working before the DB is configured)
   Business impact: Imports might get stuck on module restart
   Proposed solution: Needs investigation / check (one possible fix is sketched below)
   Priority (DEV): Low
   Complexity: M
   Final feature(s): UXPROD-3135
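
If the investigation confirms the ordering problem, one possible shape of the fix is to chain start-up so that consumers are only subscribed after database/tenant initialization has finished. A minimal sketch with hypothetical initDatabase()/startKafkaConsumers() placeholders:

    import java.util.concurrent.CompletableFuture;

    public class ModuleStartup {

      public CompletableFuture<Void> start() {
        // Subscribe consumers only after the database initialization completes,
        // so a restarted module can never consume events against an unconfigured schema.
        return initDatabase().thenCompose(ignored -> startKafkaConsumers());
      }

      // Hypothetical placeholders for the real initialization steps.
      private CompletableFuture<Void> initDatabase() {
        return CompletableFuture.completedFuture(null);
      }

      private CompletableFuture<Void> startKafkaConsumers() {
        return CompletableFuture.completedFuture(null);
      }
    }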

4. Category: Performance, Stability/Reliability
   Problem definition: High CPU/memory consumption on modules
   Business impact: Low performance of import; higher hosting costs
   Proposed solution: Significantly decrease the size of the payload:
     1. Remove immutable parts. Instead, fetch them on demand and cache them locally for reuse (see the sketch below).
     2. Change the message handling mechanism (currently relies on pt. 1 - profile) (optional)
     3. Move archiving to Kafka instead of the module level
   Priority (DEV): High
   Complexity: XXL
   Existing Jira item(s): MODDATAIMP-439, MODSOURMAN-519
   Final feature(s): UXPROD-3135
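
For point 1, a minimal sketch of fetching the removed immutable parts on demand and caching them locally; the snapshot type and loader are hypothetical stand-ins for whatever is stripped from the event payload:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Instead of carrying the immutable profile snapshot in every Kafka payload,
    // events carry only its id; each module fetches it once and caches it locally.
    public class ProfileSnapshotCache {

      private final Map<String, String> cache = new ConcurrentHashMap<>();
      private final Function<String, String> loader; // e.g. an HTTP call to the owning module (hypothetical)

      public ProfileSnapshotCache(Function<String, String> loader) {
        this.loader = loader;
      }

      public String get(String profileSnapshotId) {
        // computeIfAbsent fetches on the first request and reuses the cached copy afterwards.
        return cache.computeIfAbsent(profileSnapshotId, loader);
      }
    }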

5. Category: Performance
   Problem definition: Kafka cache resource consumption
   Business impact: Low performance of import; higher hosting costs
   Proposed solution: Remove the Kafka cache. Modules that do not make persistent changes will sometimes (on duplicate reads) make unnecessary calls; this can be optimized further by adding a distributed in-memory cache (e.g. Hazelcast; see the sketch below). Blocked by item 6.
   Priority (DEV): Mid
   Complexity: M
   Existing Jira item(s): MODINV-444, MODINV-401
   Final feature(s): UXPROD-3135
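
If the follow-up optimization is pursued, a distributed map could take over duplicate-read detection from the per-module Kafka cache. A minimal sketch assuming Hazelcast 5.x; the map name and TTL are placeholders:

    import java.util.concurrent.TimeUnit;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;

    public class DuplicateReadGuard {

      private final IMap<String, Boolean> seenEvents;

      public DuplicateReadGuard() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        // Shared across module instances, so a duplicate read on any node is detected.
        this.seenEvents = hz.getMap("di-seen-events");
      }

      // Returns true the first time an event id is seen; entries expire to bound memory use.
      public boolean firstTimeSeen(String eventId) {
        return seenEvents.putIfAbsent(eventId, Boolean.TRUE, 1, TimeUnit.HOURS) == null;
      }
    }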

6. Category: Stability/Reliability
   Problem definition: Duplicates created upon import
   Business impact: Data inconsistency on import
   Proposed solution: Make consumers idempotent. Add a pass-through identifier to de-duplicate messages. (See the sketch below.)
   Priority (DEV): High
   Complexity: XL
   Existing Jira item(s): MODDATAIMP-474, MODDATAIMP-440, MODDATAIMP-491
   Final feature(s): UXPROD-3135
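
A minimal sketch of the pass-through identifier, assuming the plain Apache Kafka client; the header name and the processed-id store are hypothetical (in practice the store would be persistent, e.g. a table with a unique constraint on the id):

    import java.nio.charset.StandardCharsets;
    import java.util.Set;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.header.Header;

    public class EventDeduplication {

      // Stand-in for a persistent store of already-processed identifiers.
      private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

      // Producer side: attach the identifier once and pass it through every stage.
      public ProducerRecord<String, String> withEventId(ProducerRecord<String, String> record) {
        record.headers().add("di-event-id",
            UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
        return record;
      }

      // Consumer side: process an event only if its identifier has not been seen before.
      public boolean shouldProcess(ConsumerRecord<String, String> record) {
        Header header = record.headers().lastHeader("di-event-id");
        if (header == null) {
          return true; // no id attached; fall back to normal processing
        }
        String eventId = new String(header.value(), StandardCharsets.UTF_8);
        return processedIds.add(eventId); // add() returns false for duplicates
      }
    }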

7. Category: Stability/Reliability
   Problem definition: Kafka consumers stop reading messages eventually, breaking job progress until module restart
   Business impact: Imports eventually get stuck until module restart
   Proposed solution: Needs investigation
   Priority (DEV): High
   Complexity: ?
   Existing Jira item(s): MODINV-417
   Final feature(s): UXPROD-3135

8. Category: Stability/Reliability
   Problem definition: Test coverage is not high enough (unit tests)
   Business impact: Higher number of bugs
   Proposed solution: Write more tests
   Priority (DEV): Mid
   Complexity: S
   Existing Jira item(s): MODPUBSUB-168
   Final feature(s): UXPROD-3135

9. Category: Stability/Reliability
   Problem definition: Test coverage is not high enough (Karate)
   Business impact: Higher number of bugs
   Proposed solution: Write more tests (define test cases)
   Priority (DEV): Mid
   Complexity: L
   Existing Jira item(s): UXPROD-2697
   Final feature(s): UXPROD-2697

10. Category: Stability/Reliability
    Problem definition: mod-data-import stores the input file in memory, limiting the size of the uploaded file and possibly causing OOM errors
    Business impact: Data import file size is limited
    Proposed solution: Split the file into chunks, put them into the database, and work with database/temporary storage. Partially done (to be investigated). (See the sketch below.)
    Priority (DEV): Mid
    Complexity: L
    Existing Jira item(s): MODDATAIMP-390, MODDATAIMP-392, MODDATAIMP-465
    Final feature(s): UXPROD-3135
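
A minimal sketch of the chunked approach: stream the upload in bounded chunks instead of buffering the whole file in memory. The chunk size and the persistence call are illustrative:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;

    public class ChunkedFileStore {

      private static final int CHUNK_SIZE = 1024 * 1024; // 1 MiB per chunk (illustrative)

      // Reads the upload stream chunk by chunk, so memory use is bounded regardless of file size.
      public void store(String fileId, InputStream upload) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        int chunkNumber = 0;
        int read;
        while ((read = upload.read(buffer)) != -1) {
          saveChunk(fileId, chunkNumber++, Arrays.copyOf(buffer, read));
        }
      }

      // Hypothetical persistence of one chunk to the database or temporary storage.
      private void saveChunk(String fileId, int chunkNumber, byte[] data) {
        // e.g. INSERT INTO file_chunks (file_id, chunk_no, data) VALUES (?, ?, ?)
      }
    }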

11. Category: Performance
    Problem definition: Data import impacts other processes
    Business impact: Slower response of the system during data import
    Proposed solution: Needs investigation (possible solution: configure a rate limiter; see the sketch below)
    Final feature(s): UXPROD-3135
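
If the rate-limiter option is chosen, a minimal sketch using Guava's RateLimiter to cap how many records data import processes per second; the rate and the wrapper class are placeholders:

    import com.google.common.util.concurrent.RateLimiter;

    public class ImportThrottle {

      // Cap data-import record processing so it cannot starve interactive requests.
      // The rate would come from configuration; 200 records/second is just a placeholder.
      private final RateLimiter limiter = RateLimiter.create(200.0);

      public void processRecord(Runnable handler) {
        limiter.acquire(); // blocks briefly when the budget for this second is exhausted
        handler.run();
      }
    }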

12. Category: Performance
    Problem definition: High resource consumption to get job status/progress
    Business impact: Slow performance of import and the landing page
    Proposed solution: Add some kind of caching for progress tracking (database or in-memory; see the sketch below)
    Priority (DEV): Low
    Complexity: S
    Existing Jira item(s): MODSOURMAN-469
    Final feature(s): UXPROD-3135
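
A minimal sketch of the in-memory option using Caffeine with a short TTL, so the status/landing page hits the database at most once per interval per job; the value type, TTL, and loader are illustrative:

    import java.util.concurrent.TimeUnit;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;

    public class JobProgressCache {

      // Progress values are allowed to be a few seconds stale, which removes most of the
      // per-request load that status polling currently puts on the database.
      private final LoadingCache<String, Integer> progressByJobId = Caffeine.newBuilder()
          .expireAfterWrite(3, TimeUnit.SECONDS)
          .maximumSize(10_000)
          .build(this::loadProgressFromDatabase);

      public Integer getProgress(String jobExecutionId) {
        return progressByJobId.get(jobExecutionId);
      }

      // Hypothetical stand-in for the real query against the job execution tables.
      private Integer loadProgressFromDatabase(String jobExecutionId) {
        return 0;
      }
    }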

13. Category: Stability/Reliability
    Problem definition: SRS can fail when processing a message during import
    Business impact: Import can end up creating some instances but not creating holdings/items for some MARC records
    Proposed solution: Generate an "INSTANCE CREATED" event from mod-inventory. Consume it in SRS to update the HRID in the MARC bib record, and in mod-inventory to continue processing. Remove unnecessary topics (the "ready for post processing" and "hrid set" topics). (See the sketch below.)
    Priority (DEV): Mid
    Complexity: L
    Final feature(s): UXPROD-3135
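
A minimal sketch of emitting the proposed event from mod-inventory, assuming the plain Apache Kafka client; the topic name, key, and payload shape are assumptions for illustration only:

    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class InstanceCreatedPublisher {

      // Topic name is an assumption; the real event type/topic would follow the DI naming convention.
      private static final String TOPIC = "DI_INVENTORY_INSTANCE_CREATED";

      private final Producer<String, String> producer;

      public InstanceCreatedPublisher(Producer<String, String> producer) {
        this.producer = producer;
      }

      // SRS would consume this event to set the HRID on the MARC bib record,
      // and mod-inventory would consume it to continue holdings/item processing.
      public void publish(String jobExecutionId, String instanceId, String hrid) {
        String payload = "{\"jobExecutionId\":\"" + jobExecutionId
            + "\",\"instanceId\":\"" + instanceId
            + "\",\"hrid\":\"" + hrid + "\"}";
        producer.send(new ProducerRecord<>(TOPIC, jobExecutionId, payload));
      }
    }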

Links

Data Import Observations for Improvements
