Page Properties |
---|
Submitted | |
---|
Approved |
|
---|
Status | |
---|
Impact |
---|
MEDIUM
|
Context
Tracking
Type | Ticket | Module |
---|
Module ticket | |
JiraJIRA | serverId | 01505d01-b853-3c2e-90f1-ee9b165564fc |
---|
key | MODAUD-155 |
---|
|
| mod-audit |
Arch ticket | |
JiraJIRA | serverId | 01505d01-b853-3c2e-90f1-ee9b165564fc |
---|
key | ARCH-51 |
---|
|
|
|
Technical details
The circulation log data is persisted in jsonb
format in FOLIO database and itemBarcode
field is part of the nested array items
as each circulation log record can contain multiple items. No indexes are present for the table.
Circulation log search by item barcode slow when table contains millions of rows.
Architecturally Significant Requirements
# | Requirement | Description | Validation Approach | Remarks |
---|
1 | Performance | Search on table with 2M records should be done under 30 seconds | - Prepopulate table with X records
- Do search by itemBarcode
- Validate timing
| TBC |
Stakeholders
- Tim Auger - Product Owner
- Steve Ellis - Tech Lead
- Gurleen Kaur1 - Dev Lead
- Irina Pokhylets - Business Analyst
Solution Options
...
# | Title | Description | Pros | Cons |
---|
1 | Define retention policy (use |
...
archiving or table partitioning) |
...
- Archive the old records (maybe not delete them) in another table or use PostgreSQL partitioning - like 1+ year old record.
- Create and add an index for createTimeStamp column. This will restrict the search to most recent actions that are taken on the item and improves the search performance.
| - Future-proof solution. Will have same response time in case of the same amount of records per time period
| - In case of unpredicted growth of transactional volume search can be slow
- Partitioning is not used in other modules, hence breaks the unity or common approach
- Search in longer date period may still be slow
- Not quite clear what is the common period for retention
|
2 | Add gin index for nested items → itemBarcode field |
...
- Create gin index for itemBarcode field (see link)
| - Improves performance
- Requires minimal backend changes
| - May negatively affect performance of writing due to calculation of index.
- Has limitations
|
3 | Limit search date range | - Make UI
from / to fields in search view mandatory - Force maximum date range to one year
- If further search is required add controls to go to next or previous period
| - No backend changes required
- Can serve as a workaround
| - Limits user experience
|
Questions
# | Question | Answer | Status |
---|
1 | How many records per year clients produce? | | Status |
---|
| |
---|
colour | RedGreen |
---|
title | OpenANswered |
---|
|
|
2 | Do clients have any retention policy or regulations on circulation records? | (OLD ACCOUNT) Erin Nettifee : No common retention approach for different libraries | Status |
---|
| |
---|
colour | RedGreen |
---|
title | OpenAnswered |
---|
|
|
Rationale
Option A 1 depends on client needs that might be hard to define for the whole community.
Option B 2 should be implemented in any case but requires additional performance testing for writing into circulation log.
Option C 3 requires minimal effort but have has tradeoffs. Can be implemented as quick fix.
Proposed solution is option 2 due to minimal changes on backend and no changes at frontend
Update on option 2 (Apr 10th, 2023): - Gin indexes are already present in the table creation script
- Queries that are generated by
PostgresClient do not hit the indexes. The library auto-generated SQL Query example is:
Code Block |
---|
| SELECT id, jsonb
FROM %schema%_mod_audit.circulation_logs
where ((circulation_logs.jsonb->>'items') @@ ('BATTLE10')) LIMIT 100 OFFSET 0; |
- To enable the index in the search the query should be rewritten in this form:
Code Block |
---|
| select id, jsonb
from %schema%_mod_audit.circulation_logs
where %schema%_mod_audit.get_tsvector(%schema%_mod_audit.f_unaccent(jsonb ->> 'items'::text))@@ plainto_tsquery('%BATTLE10%') LIMIT 100 OFFSET 0; |
- Below are screenshots of query execution plan for both queries:
- Image AddedImage Added
|
Update on option 2 (Apr 18th, 2023): - The SQL generation by PostgresClient already contains options for invoking full-text search with
gin-indexes . The description of mechanism is located in https://dev.folio.org/faqs/explain-cql/. Every approach has its tradeoffs and possible solution may be to leave the choice for users. The available options are as following:- Exact search: requires input of full barcode (e.g. type "12345" to find "12345" barcode). Fastest
- Left-anchor search: requires input starting symbols of barcode (e.g. type "123" to find "12345" barcode).
- "Contains" search: requires input of any substring in barcode (e.g. type "234" to find "12345" barcode).
|
...
LOE
...
Estimation
# | Title | Estimation |
---|
1 | Define retention policy (use archiving or table partitioning) | - POC - 1 sprint
- Implementation -2 sprints
- Performance testing - 1 sprint
|
2 | Add gin index for nested items → itemBarcode field | - Implementation - 1 sprint
- Performance testing - 1 spring
|
3 | Limit search date range | - Implementation - 1 sprint
- Performance testing - 1 spring
|
Decision
Proposed solution is option 2 - <TBC>