MODAUD-155 - Performance Improvement in Searching Circulation Log by Barcode
Context
Tracking
Type | Ticket | Module |
---|---|---|
Module ticket | mod-audit | |
Arch ticket |
Technical details
The circulation log data is persisted in jsonb
format in FOLIO database and itemBarcode
field is part of the nested array items
as each circulation log record can contain multiple items. No indexes are present for the table.
Problem Statement
Circulation log search by item barcode slow when table contains millions of rows.
Architecturally Significant Requirements
# | Requirement | Description | Validation Approach | Remarks |
---|---|---|---|---|
1 | Performance | Search on table with 2M records should be done under 30 seconds |
| TBC |
Stakeholders
- Tim Auger - Product Owner
- Steve Ellis - Tech Lead
- Gurleen Kaur1 - Dev Lead
- Irina Pokhylets - Business Analyst
Solution Options
# | Title | Description | Pros | Cons |
---|---|---|---|---|
1 | Define retention policy (use archiving or table partitioning) |
|
|
|
2 | Add gin index for nested items → itemBarcode field |
|
| |
3 | Limit search date range |
|
|
|
Questions
# | Question | Answer | Status |
---|---|---|---|
1 | How many records per year clients produce? | (OLD ACCOUNT) Erin Nettifee : 2M+ records per year | ANSWERED |
2 | Do clients have any retention policy or regulations on circulation records? | (OLD ACCOUNT) Erin Nettifee : No common retention approach for different libraries | ANSWERED |
Rationale
Option 1 depends on client needs that might be hard to define for the whole community.
Option 2 should be implemented in any case but requires additional performance testing for writing into circulation log.
Option 3 requires minimal effort but has tradeoffs. Can be implemented as quick fix.
Proposed solution is option 2 due to minimal changes on backend and no changes at frontend
Update on option 2 (Apr 10th, 2023):
SELECT id, jsonb FROM %schema%_mod_audit.circulation_logs where ((circulation_logs.jsonb->>'items') @@ ('BATTLE10')) LIMIT 100 OFFSET 0;
select id, jsonb from %schema%_mod_audit.circulation_logs where %schema%_mod_audit.get_tsvector(%schema%_mod_audit.f_unaccent(jsonb ->> 'items'::text))@@ plainto_tsquery('%BATTLE10%') LIMIT 100 OFFSET 0;
|
Update on option 2 (Apr 18th, 2023):
|
Estimation
# | Title | Estimation |
---|---|---|
1 | Define retention policy (use archiving or table partitioning) |
|
2 | Add gin index for nested items → itemBarcode field |
|
3 | Limit search date range |
|
Decision
Proposed solution is option 2 - <TBC>