Overview
- This workflow checks the performance of exporting MARC bib records (with items and holdings) in the Kiwi release. Jira ticket: PERF-202
...
Backend:
- mod-data-export-4.2.1 and mod-data-export-4.2.2
- mod-source-record-storage-5.2.0
- mod-source-record-manager-3.2.2
- okapi-4.9.0
- mod-authtoken-2.9.0
...
- 7.2 million UChi SRS records
- 7.2 million inventory records (7.3 million instances, 7.8 million holdings records, 8.9 million items)
- 77 FOLIO back-end modules deployed in 151 ECS services
- 3 okapi ECS services
- 6 m5.xlarge EC2 instances
- AWS RDS: 1 writer instance and 1 reader instance, both db.r6g.xlarge
- INFO logging level
High-Level Summary
- Data Export is relatively stable for 100K instance records.
- It becomes flaky as the number of instances increases to 500K. See the Jira tickets listed in the JIRA ticket section.
Test Runs
mod-data-export v4.2.1
With Items and Holdings
Test | Total instances | 1 User - Avg Total time to Export instances |
1. | 1000 | 1 minute |
2. | 100,000 | 1 hour 8 minutes |
3. | 500,000 | 5 hours 48 minutes |
mod-data-export v4.2.2
Test | Total instances | 1 User - Avg Total time to Export instances |
1. | 1000 | 1 minute |
2. | 100,000 | 51 minutes |
3. | 200,000 | 1 hour 48 minutes |
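The durations in the two tables above are easier to compare as throughput. The following is a minimal sketch that converts the reported averages into instances per second; the `RUNS` list and the `throughput` helper are illustrative names, and only the numbers come from the tables.

```python
# Average export durations copied from the test-run tables above.
# Each entry: (module version, instances exported, duration in seconds).
RUNS = [
    ("v4.2.1", 100_000, 1 * 3600 + 8 * 60),   # 1 hour 8 minutes
    ("v4.2.1", 500_000, 5 * 3600 + 48 * 60),  # 5 hours 48 minutes
    ("v4.2.2", 100_000, 51 * 60),             # 51 minutes
    ("v4.2.2", 200_000, 1 * 3600 + 48 * 60),  # 1 hour 48 minutes
]


def throughput(instances: int, seconds: int) -> float:
    """Return the number of instances exported per second."""
    return instances / seconds


for version, instances, seconds in RUNS:
    rate = throughput(instances, seconds)
    print(f"{version} {instances:>7,} instances: {rate:.1f} instances/s")

# Relative time saved by v4.2.2 on the 100K export (4080 s vs 3060 s).
saved = (4080 - 3060) / 4080
print(f"100K export time reduced by {saved:.0%}")
```

For the 100K runs this works out to roughly 24.5 instances/s on v4.2.1 and 32.7 instances/s on v4.2.2, a 25% reduction in export time.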
CQL
Test | Total instances | CQL query | 1 User - Avg Total time to Export instances |
1. | 22025 | (instanceTypeId=="7cb86491-cc57-4c77-a0a1-24ebfe925906" and source=="MARC") sortby title | 13 minutes |
2. | 1582 | (source=="MARC" and items.effectiveLocationId=="0a8c7b4e-04cd-42ac-a887-e7f2ee2ea6ec") sortby title | 1 minute |
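CQL queries like the ones above contain quotes, spaces, and `==`, so they must be URL-encoded before being sent as a request parameter. The sketch below shows one way to do that with the Python standard library. The gateway URL and the endpoint path are assumptions for illustration, not values taken from this test environment; substitute whichever endpoint your deployment uses for instance queries.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical values: replace with the real Okapi gateway URL and path.
OKAPI_URL = "https://okapi.example.org"
INSTANCES_PATH = "/instance-storage/instances"  # assumed endpoint

# First CQL query from the table above.
CQL = (
    '(instanceTypeId=="7cb86491-cc57-4c77-a0a1-24ebfe925906" '
    'and source=="MARC") sortby title'
)


def build_query_url(cql: str) -> str:
    """Return a request URL with the CQL query safely URL-encoded."""
    return f"{OKAPI_URL}{INSTANCES_PATH}?{urlencode({'query': cql})}"


url = build_query_url(CQL)
print(url)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == CQL)
```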
CPU and Memory resources allocated
Module | CPU (units) | Hard/Soft Memory limit (MB) |
mod-data-export | 128 | 512/360 |
mod-inventory-storage | 128 | 864/536 |
mod-source-record-storage | 128 | 1440/896 |
CPU Utilization
mod-data-export v4.2.1
CPU utilization stays fairly stable at 100% for Data Exports of 100, 100K, and 500K records with the items and holdings job profile.
mod-data-export v4.2.2
Performance improvement after fixing MDEXP-441
Service Memory Utilization
Memory utilization of mod-data-export is stable at 120%, even as the number of instances increases gradually from 1000 to 500K records. For all other modules, such as mod-source-record-storage and okapi, memory utilization remains constant between 80% and 100%.
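A utilization above 100% is possible when the percentage is measured against the soft limit rather than the hard limit. The sketch below works through that arithmetic for mod-data-export. It assumes the 120% figure is relative to the soft limit, which this report does not state explicitly; the `used_mb` helper is an illustrative name.

```python
# Hard/soft memory limits in MB, copied from the resource table above.
LIMITS_MB = {
    "mod-data-export": (512, 360),
    "mod-inventory-storage": (864, 536),
    "mod-source-record-storage": (1440, 896),
}


def used_mb(utilization_pct: float, soft_limit_mb: int) -> float:
    """Convert a utilization percentage into MB, ASSUMING the percentage
    is measured against the soft limit."""
    return utilization_pct * soft_limit_mb / 100


hard, soft = LIMITS_MB["mod-data-export"]
usage = used_mb(120, soft)
print(f"{usage:.0f} MB in use, {hard - usage:.0f} MB below the hard limit")
```

Under that assumption, 120% corresponds to 432 MB, which is above the 360 MB soft limit but still 80 MB below the 512 MB hard limit.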
Issues faced during Data Export
I was able to export 100K records successfully. However, when I tried to export 500K, Data Export failed in the UI, and the Chrome console log shows:
...
```json
{
  "id" : "e74bc816-57cc-4c19-a075-ceda913a5adb",
  "hrId" : 7635,
  "exportedFiles" : [ {
    "fileId" : "dcbce2b8-7e84-4ca4-866f-07a2760a2d98",
    "fileName" : "kcp1-DE-500k-7635.mrc"
  } ],
  "jobProfileId" : "937d6256-8532-442b-9286-cbc3396fa18d",
  "jobProfileName" : "holdings and items",
  "progress" : {
    "exported" : 500000,
    "failed" : 0,
    "total" : 500000
  },
  "completedDate" : "2021-10-29T22:57:24.532+00:00",
  "lastUpdatedDate" : "2021-10-29T22:57:24.474+00:00",
  "startedDate" : "2021-10-29T16:39:46.971+00:00",
  "runBy" : {
    "firstName" : "folio",
    "lastName" : "folio"
  },
  "status" : "COMPLETED"
}
```
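The job execution record above carries enough information to work out how long this particular 500K export ran and at what rate. The following is a minimal sketch of that calculation; the timestamps and the exported count are copied from the record, and the variable names are illustrative.

```python
from datetime import datetime

# Values copied from the job execution record above.
started = datetime.fromisoformat("2021-10-29T16:39:46.971+00:00")
completed = datetime.fromisoformat("2021-10-29T22:57:24.532+00:00")
exported = 500_000

# Wall-clock duration of the job and the resulting export rate.
elapsed = completed - started
rate = exported / elapsed.total_seconds()

print(f"elapsed: {elapsed}")
print(f"rate: {rate:.1f} records/s")
```

This single run took a little under 6 hours 18 minutes, or about 22 records per second. Note that the record reports `COMPLETED` with all 500,000 records exported, even though the UI showed a failure.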
JIRA ticket
- MDEXP-471
- MDEXP-473
- MDEXP-474
- MDEXP-441
Data Export CSV files used to run test
All instance records in the files below have source=MARC.
...