Steps
- Gather existing issues (Vladimir Shalaev, Kateryna Senchenko)
- Create new features (Vladimir Shalaev, Kateryna Senchenko)
- Provide feature dependencies (Vladimir Shalaev, Kateryna Senchenko)
- Estimate (priorities + complexity) (Vladimir Shalaev, Kateryna Senchenko)
- Remove duplicates (grooming with Ann-Marie)
- Final priorities
- Align to the timeline, assign to the appropriate Jira Feature, and review Jira issue priorities (Taisiya Trunova)
Categories
See: Assessment ratings
- Performance: di-performance
- Stability/Reliability: di-data-integrity (more tags to be added)
- Scalability
- Architecture
- Code quality
Priorities
High, Mid, Low
Complexity
S, M, L, XL, XXL
Table
| # | Category | Problem definition | Business impact | Proposed solution | Priority DEV | Priority PO | Complexity | Existing Jira item(s) | Final feature(s) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Performance | Kafka producer closed after sending | Low performance of import | Create a pool of active producers: start the pool on module launch, close it on shutdown, reuse connections, add max/min pool sizes (see producer pool sketch below the table) | High | | L | | |
| 2 | | WARN message when no handler found | None | Do not subscribe to messages you are not going to process, OR lower the log level for this type of message | Low | | S | | |
| 3 | Stability/Reliability | Race condition on start (Kafka consumers start working before the DB is configured), OR periodic DB shutdown after SRS restart; jobs get stuck if they cannot update status in the DB (messages are ACKed even if we could not process them) | Imports might get stuck on module restart | Needs investigation; investigate the issue with the DB (possible OOM on the PG server) | Mid | | | | |
| 4 | Performance, Stability/Reliability | High CPU/memory consumption on modules | Low performance of import; higher hosting costs | Significantly decrease the size of the payload | High | | XXL | | |
| 5 | Performance | Kafka cache resource consumption | Low performance of import; higher hosting costs | Remove the Kafka cache. Modules that do not make persistent changes will sometimes (on duplicate reads) make unnecessary calls; can be optimized further by adding a distributed in-memory cache (e.g. Hazelcast). Blocked by #6 | Mid | | M | | |
| 6 | Stability/Reliability | Duplicates created upon import | Data inconsistency on import | Make consumers idempotent; add a pass-through identifier to de-duplicate messages (see idempotent handler sketch below the table) | High | | XL | | |
| 7 | Stability/Reliability | Kafka consumers eventually stop reading messages, breaking job progress until module restart | Imports eventually get stuck until module restart | Needs investigation | High | | ? | | |
| 8 | Code quality | Test coverage is not high enough (unit) | Higher number of bugs | Write more tests | Mid | | S | | |
| 9 | Code quality | Test coverage is not high enough (Karate) | Higher number of bugs | Write more tests (define test cases) | Mid | | L | | |
| 10 | Stability/Reliability | mod-data-import stores the input file in memory, limiting the size of uploaded files and possibly causing OOM | Data import file size is limited | Split into chunks, store in the database, and work from database/temp storage. Partially done (to be investigated) | Mid | | L | | |
| 11 | Performance | Data import impacts other processes | Slower system response during data import | Needs investigation (possible solution: configure a rate limiter). Relates to #4 | | | | no ticket | |
| 12 | Performance | High resource consumption to get job(s) status/progress | Slow performance of import and the landing page | Add caching for progress tracking (database or in-memory) | Low | | S | | |
| 13 | Stability/Reliability | SRS can fail when processing a message during import | Import can end up creating some instances but not creating holdings/items for some MARC records | Generate "INSTANCE CREATED" from mod-inventory; consume it in SRS to update the HRID in the BIB record and in INVENTORY to continue processing. Remove unnecessary topics (the "ready for post processing" and "hrid set" topics) | Mid | | L | | |
| 14 | Stability/Reliability | On an infrastructure issue (DB not available, module being restarted, network failure) we send DI_ERROR instead of retrying | Records that could be processed during import are not processed when there is a temporary infrastructure issue (DB down, loss of network connectivity, etc.) | Do not ACK messages in Kafka when the failure is an infrastructure error/exception rather than a logic error; split failed processing results into two categories (see failure classification sketch below the table) | Mid | | | | |
| 15 | | Consumer gets disconnected from the Kafka cluster | Jobs get stuck until module restart | Needs investigation | Mid | | | | |
| 16 | | De-duplication of status messages for the progress bar | Progress bar might display incorrect progress | De-duplicate status messages per record while tracking progress | Mid | | L (depends on #12) | | |
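Item 1 proposes keeping Kafka producers alive for the lifetime of the module instead of closing the producer after every send. Below is a minimal sketch in Java using the standard Apache Kafka client; the pool class and its methods are illustrative only, not the modules' actual API.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical fixed-size producer pool: producers are created once at module
// start, handed out round-robin, and closed only at shutdown.
public class KafkaProducerPool implements AutoCloseable {

  private final List<KafkaProducer<String, String>> producers;
  private final AtomicInteger next = new AtomicInteger();

  public KafkaProducerPool(Properties config, int poolSize) {
    this.producers = IntStream.range(0, poolSize)
        .mapToObj(i -> new KafkaProducer<String, String>(config))
        .collect(Collectors.toList());
  }

  // KafkaProducer is thread-safe, so a producer can be shared without
  // being removed from the pool.
  public KafkaProducer<String, String> next() {
    int i = Math.floorMod(next.getAndIncrement(), producers.size());
    return producers.get(i);
  }

  public void send(String topic, String key, String value) {
    next().send(new ProducerRecord<>(topic, key, value));
  }

  @Override
  public void close() {
    producers.forEach(p -> p.close(Duration.ofSeconds(10)));
  }
}
```

Because producers are thread-safe, even a small pool (or a single shared instance) avoids the per-send connection setup and teardown cost that the current close-after-send pattern incurs.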
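Items 6 and 16 both come down to de-duplicating deliveries by a pass-through identifier. The following is a hedged sketch of the idea, with an in-memory set standing in for whatever shared store (DB unique constraint, distributed cache) would actually back it; all names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch: a handler becomes idempotent by checking the
// pass-through identifier carried with the event before doing any
// persistent work.
public class IdempotentHandler {

  // In production this would be a shared, persistent store; an in-memory
  // concurrent set stands in here for illustration.
  private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

  private final Consumer<String> businessLogic;

  public IdempotentHandler(Consumer<String> businessLogic) {
    this.businessLogic = businessLogic;
  }

  // Returns true if the record was processed, false if it was a duplicate
  // delivery and was skipped.
  public boolean handle(String passThroughId, String payload) {
    // add() is atomic: only the first delivery of a given id wins.
    if (!processedIds.add(passThroughId)) {
      return false; // duplicate, safe to acknowledge and drop
    }
    businessLogic.accept(payload);
    return true;
  }
}
```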
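Item 14 proposes splitting failures into logic errors (report DI_ERROR and acknowledge) and infrastructure errors (do not acknowledge, so Kafka redelivers once the dependency recovers). A rough sketch of such a classifier follows; the exception types chosen here merely illustrate the two categories and are not a definitive list.

```java
import java.net.ConnectException;
import java.sql.SQLTransientException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of splitting failed processing results into two
// categories: LOGIC (permanent, report DI_ERROR and ACK) versus
// INFRASTRUCTURE (transient, leave un-ACKed so the message is redelivered).
public class FailureClassifier {

  public enum FailureKind { LOGIC, INFRASTRUCTURE }

  public FailureKind classify(Throwable t) {
    // Connectivity and transient-resource exceptions are treated as
    // infrastructure problems; everything else is a logic error.
    if (t instanceof SQLTransientException
        || t instanceof ConnectException
        || t instanceof TimeoutException) {
      return FailureKind.INFRASTRUCTURE;
    }
    return FailureKind.LOGIC;
  }
}
```

A consumer wrapper would then commit the offset and emit DI_ERROR only for LOGIC failures, and skip the commit (or pause the consumer) for INFRASTRUCTURE failures so the record can be retried once the DB or network is back.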
Filters
Issues to potentially remove from scope