...
This document contains the results of testing Data Import for MARC Bibliographic records in the Quesnelia release [non-ECS]. https://folio-org.atlassian.net/jira/people/712020:c7153665-e98d-4df6-a9f4-fe368ae2480f/boards/224?selectedIssue=PERF-836
Summary
Recommendations and Jiras
Ticket: PERF-836
Summary
All Data Import jobs finished successfully.
The data import duration for the PTF - Create 2 profile increased slightly, by 5% on average. The PTF - Updates Success - 6 profile was created for the Quesnelia release and differs from the previous PTF - Updates Success - 1 profile, so the results are not comparable with the Poppy release.
DI duration grows in proportion to the number of records imported.
The average CPU utilization of modules for all Create and Update jobs did not exceed 150%. Spikes on the mod-data-import module at the beginning of each job are expected because of the large file upload.
No memory leak is suspected for DI modules. During DI of 500K records on both profiles, mod-pubsub shows sawtooth-like memory usage within the range of 40-60%.
DB CPU usage approaches 95%, and this applies to all jobs with files of more than 10k records.
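The correlation between DI duration and record count can be illustrated with a quick sketch (the durations are the Quesnelia Create numbers from the Results table below; the computation itself is not part of the report):

```python
# Sketch (not from the report): Quesnelia Create durations from the
# Results table, converted to records/second. The roughly constant
# throughput is what makes DI duration scale with record count.
durations_sec = {       # records imported -> Quesnelia duration in seconds
    1_000: 54,
    5_000: 3 * 60 + 20,
    10_000: 6 * 60,
    25_000: 13 * 60 + 41,
    50_000: 21 * 60 + 59,
    100_000: 40 * 60 + 16,
    500_000: 3 * 3600 + 27 * 60,
}
throughput = {n: round(n / s, 1) for n, s in durations_sec.items()}
for n, tps in sorted(throughput.items()):
    print(f"{n:>7,} records: {tps:5.1f} rec/sec")
```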
Comparison with previous testing results Data Import test report (Poppy)
The duration of Data Import with PTF - Create 2 has not increased significantly. In the Quesnelia release there were changes in the update profile, so a new one, PTF - Updates Success - 6, was created; its durations will be the baseline for the next tests.
Service CPU utilization, service memory utilization, and DB CPU utilization show the same trends and values as in the Poppy release.
Recommendations and Jiras
Investigate data-import job details failed to load
MODDATAIMP-1044
Results
Test # | Data-import test | Profile | Duration Poppy with R/W split enabled | Duration Quesnelia with R/W split enabled | Difference | Results
---|---|---|---|---|---|---
1 | 1k MARC BIB Create | PTF - Create 2 | 39 sec | 54 sec | ↑ 15 sec | Completed
2 | 5k MARC BIB Create | PTF - Create 2 | 2 min 22 sec | 3 min 20 sec | ↑ 58 sec | Completed
3 | 10k MARC BIB Create | PTF - Create 2 | 4 min 29 sec | 6 min | ↑ 1 min 31 sec | Completed
4 | 25k MARC BIB Create | PTF - Create 2 | 10 min 38 sec | 13 min 41 sec | ↑ 3 min 3 sec | Completed
5 | 50k MARC BIB Create | PTF - Create 2 | 20 min 26 sec | 21 min 59 sec | ↑ 1 min 33 sec | Completed
6 | 100k MARC BIB Create | PTF - Create 2 | 2 hours 46 min (Cancelled) | 40 min 16 sec | Not applicable | Completed
7 | 500k MARC BIB Create | PTF - Create 2 | Not tested | 3 hours 27 min | Not applicable | Completed
8 | 1k MARC BIB Update | PTF - Updates Success - 6 | 34 sec (PTF - Updates Success - 1) | 1 min 59 sec | Not applicable | Completed
9 | 2k MARC BIB Update | PTF - Updates Success - 6 | 1 min 09 sec (PTF - Updates Success - 1) | 2 min 43 sec | Not applicable | Completed
10 | 5k MARC BIB Update | PTF - Updates Success - 6 | 2 min 31 sec (PTF - Updates Success - 1) | 7 min 10 sec | Not applicable | Completed
11 | 10k MARC BIB Update | PTF - Updates Success - 6 | 5 min 13 sec (PTF - Updates Success - 1) | 10 min 27 sec | Not applicable | Completed
12 | 25k MARC BIB Update | PTF - Updates Success - 6 | 12 min 27 sec (PTF - Updates Success - 1) | 23 min 16 sec | Not applicable | Completed
13 | 50k MARC BIB Update | PTF - Updates Success - 6 | Not tested | 40 min 52 sec | Not applicable | Completed
14 | 100k MARC BIB Update | PTF - Updates Success - 6 | Not tested | 1 hr 2 min | Not applicable | Completed
15 | 500k MARC BIB Update | PTF - Updates Success - 6 | Not tested | 5 hrs 31 min | Not applicable | Completed
Test Runs
MARC BIB CREATE
Tests #1-7 1k, 5k, 10k, 25k, 50k, 100k, 500k records
# | Data-import (Create) | Start time | End time
---|---|---|---
1 | 500k_bib_Create.mrc | 2024-04-01 09:56:59.095+00 | 2024-04-01 13:26:19.429+00 |
2 | 100k_bib_Create.mrc | 2024-04-01 09:03:56.04+00 | 2024-04-01 09:44:12.654+00 |
3 | 50k_bib_Create.mrc | 2024-04-01 08:18:58.078+00 | 2024-04-01 08:40:56.215+00 |
4 | 25k_bib_Create.mrc | 2024-04-01 07:58:48.679+00 | 2024-04-01 08:12:30.555+00 |
5 | 10k_bib_Create.mrc | 2024-04-01 07:47:09.388+00 | 2024-04-01 07:53:08.405+00 |
6 | 5k_bib_Create.mrc | 2024-04-01 07:40:32.282+00 | 2024-04-01 07:43:52.674+00 |
7 | 1k_bib_Create.mrc | 2024-04-01 07:38:30.511+00 | 2024-04-01 07:39:24.804+00 |
MARC BIB UPDATE
Tests #8-15 1k, 2k, 5k, 10k, 25k, 50k, 100k, 500k records
# | Data-import (Update) | Start time | End time
---|---|---|---
1 | DI_500K.mrc | 2024-04-08 10:55:52.476+00 | 2024-04-08 16:27:37.544+00 |
2 | DI_100K.mrc | 2024-04-08 09:44:16.428+00 | 2024-04-08 10:46:45.049+00 |
3 | DI_50K.mrc | 2024-04-08 08:55:22.263+00 | 2024-04-08 09:36:15.233+00 |
4 | DI_25K.mrc | 2024-04-08 08:24:13.083+00 | 2024-04-08 08:47:29.199+00 |
5 | DI_10K.mrc | 2024-04-08 08:10:05.748+00 | 2024-04-08 08:20:33.03+00 |
6 | DI_5K.mrc | 2024-04-08 08:01:35.162+00 | 2024-04-08 08:08:45.756+00 |
7 | DI_2K.mrc | 2024-04-08 07:56:12.21+00 | 2024-04-08 07:58:56.049+00 |
8 | DI_1K.mrc | 2024-04-08 07:47:59.217+00 | 2024-04-08 07:49:58.516+00 |
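The reported durations can be recomputed from the exported start/end timestamps; a small sketch (not part of the report) that handles the truncated "+00" UTC offset in the export:

```python
# Sketch: recomputing the durations above from the exported timestamps.
from datetime import datetime

def di_duration(start: str, end: str) -> str:
    """Return elapsed time between two '+00'-suffixed timestamps as H:MM:SS."""
    fmt = "%Y-%m-%d %H:%M:%S.%f%z"
    # The export truncates the offset to '+00'; %z needs '+0000', so pad it.
    delta = datetime.strptime(end + "00", fmt) - datetime.strptime(start + "00", fmt)
    total = int(delta.total_seconds())
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"

# 500K MARC BIB Update (test #15): matches the reported 5 hrs 31 min
print(di_duration("2024-04-08 10:55:52.476+00", "2024-04-08 16:27:37.544+00"))
```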
Service CPU Utilization
MARC BIB CREATE
...
CPU utilization for all modules returned to its default level after all tests. Averages: mod-inventory-b - 130%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 40%, mod-source-record-manager-b - 35%, mod-di-converter-storage-b - 70%, mod-data-import - 350% spike for the 500k job (same behaviour as in the Poppy version).
MARC BIB UPDATE
Tests #8-15
1k, 2k, 5k, 10k, 25k, 50k, 100k, 500k records. Averages: mod-inventory-b - 220%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 50%, mod-source-record-manager-b - 45%, mod-di-converter-storage-b - 90%, mod-data-import - 96% spike for the 25k job.
...
Memory Utilization
No memory leak is suspected for DI modules.
...
1k, 5k, 10k, 25k, 50k, 100k, 500k records
...
MARC BIB UPDATE
Tests #8-15
1k, 2k, 5k, 10k, 25k, 50k, 100k, 500k records
...
RDS CPU Utilization
MARC BIB CREATE
Average DB CPU usage was about 95% for DI jobs with more than 10k records, for both Create and Update profiles.
MARC BIB UPDATE
...
RDS Database Connections
MARC BIB CREATE
DI Create jobs reached a maximum of 275 DB connections, and Update jobs 260.
While the system under test runs a Data Import job, the Create profile typically uses an average of 675 connections, whereas the Update profile uses around 640, against roughly 500 DB connections otherwise.
...
MARC BIB Update
...
Average active sessions (AAS)
MARC BIB CREATE
...
Top SQL
...
MARC BIB UPDATE
...
Top SQL
INSERT INTO fs09000000_mod_source_record_manager.events_processed
...
MSK CPU utilization (Percent)
...
CPU (User) usage by broker
MARC BIB Create
...
MARC BIB Update
Resource usage figures for the Kafka cluster are not representative, since testing was carried out on a different environment that was also connected to the tenant cluster.
Errors
Appendix
Infrastructure
PTF environment pcp1
...
Code Block |
---|
fields @timestamp, @message, @logStream, @log
############ DISABLED MODULES
| filter @logStream not like "mod-remote-storage" and @logStream not like "edge-caiasoft" and @logStream not like "mod-graphql" and @logStream not like "edge-ncip" and @logStream not like "mod-rtac"
| filter @logStream not like "mod-finance" and @logStream not like "mod-inn-reach"
############ DISABLED MODULES
| filter @message like "Error" or @message like "error" or @message like "ERROR"
#| filter @message like "Exception" or @message like "exception" or @message like "WARN"
############ mod-authtoken/
| filter @message not like "Invalid token" and @message not like "Token validation failure: Access token has expired" and @message not like "Access token has expired"
############ okapi-b , nginx-okapi
| filter @message not like "statusAny=ERROR" and @message not like "No suitable module found for path and tenant" and @message not like "Removing connection to endpoint" and @message not like "ErrorTypeException: 503:"
| filter @message not like "error_page"
############ mod-search
| filter @message not like "invokeBatchErrorHandler" and @message not like "ErrorHandlingUtils" and @message not like "FallbackBatchErrorHandler" and @message not like "DefaultErrorHandler" and @message not like "errorFromXContent"
############ mod-data-export-spring
| filter @message not like "java.lang.NoClassDefFoundError: io/vertx/core/Vertx" and @message not like "ERROR StatusConsoleListener Resolver failed to lookup FolioLoggingContext"
############ mod-inventory ---- Instances contains error in title
| filter @message not like "INFO cceedingTitlesHelper createPrecedingSucceedingTitles"
########### mod-pub-sub
| filter @message not like "TopicExistsException" and @message not like "Some of the topics"
########### OTHER ERRORS
| filter @message not like "Failed to index resource event"
| filter @message not like "FeignException.errorStatus" and @message not like "FeignException.clientErrorStatus" and @message not like "/var/log/nginx/error"
| filter @message not like "ErrorDecoder$Default.decode" and @message not like "InvocationContext.decodeError" and @message not like "Records indexed to elasticsearch"
| filter @message not like "HeapDumpOnOutOfMemoryError" and @message not like "while connecting to upstream" and @message not like "TopicExistsException" and @message not like "main ERROR Unrecognized"
| sort @timestamp desc
| limit 5000 |
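The query above is a CloudWatch Logs Insights query; a hypothetical sketch of running such a query programmatically with boto3 (the log group name and time range are placeholders, not from the report, and the full filter chain from the code block would be substituted for the abbreviated query string):

```python
# Hypothetical sketch: running a Logs Insights query like the one above
# with a boto3 CloudWatch Logs client passed in as `logs_client`.
import time

ERROR_QUERY = (
    'fields @timestamp, @message, @logStream, @log '
    '| filter @message like "Error" or @message like "error" or @message like "ERROR" '
    '| sort @timestamp desc | limit 5000'
)

def collect_di_errors(logs_client, log_group="/ecs/qcp1", hours=6):
    """Start the Insights query over the last `hours` hours and poll for results."""
    end = int(time.time())
    query_id = logs_client.start_query(
        logGroupName=log_group,        # placeholder log group
        startTime=end - hours * 3600,
        endTime=end,
        queryString=ERROR_QUERY,
    )["queryId"]
    while True:
        resp = logs_client.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled"):
            return resp["results"]
        time.sleep(2)
```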
11 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
2 database instances, writer/reader
Name | Memory GiB | vCPUs | max_connections
---|---|---|---
db.r6g.xlarge | 32 | 4 | 2731
MSK tenant
4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize
---|---|---|---|---|---|---|---|---|---
qcp1-pvt | | | | | | | | |
mod-data-import | 5 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | |
mod-search | 2 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | |
mod-configuration | 2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | |
mod-permissions | 4 | 2 | 1684 | 1544 | 512 | 1024 | 384 | 512 | |
mod-inventory-storage | 2 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | |
mod-source-record-manager | 2 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 |
okapi-b | 2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 |
Methodology
Pregenerated files were used for the DI Create job profile:
1K, 2K, 5K, 10K, 25K, 50K, 100K and 500K files.
Run DI Create on a single tenant, one file at a time with a delay between runs, using the PTF - Create 2 profile.
Prepare files for DI Update with the Data Export app, using previously imported records.
Run DI Update on a single tenant, one file at a time with a delay between runs, using the PTF - Updates Success - 6 profile, with files:
1K, 2K, 5K, 10K, 25K, 50K, 100K and 500K files.
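How the pregenerated files were produced is not described in the report; one plausible approach (a hypothetical helper, not the PTF tooling) is slicing a large binary MARC file on the record terminator byte 0x1D:

```python
# Hypothetical sketch: slicing a large binary MARC file into test files
# with the first N records. MARC records end with the 0x1D terminator.
def split_marc(data: bytes, counts=(1000, 5000, 10000)):
    """Return {record_count: bytes} slices containing the first N records."""
    records = [r + b"\x1d" for r in data.split(b"\x1d") if r]
    return {n: b"".join(records[:n]) for n in counts if n <= len(records)}
```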
Data Import durations were obtained from the DB using the following SQL query:
Code Block |
---|
SELECT (completed_date - started_date) AS duration, *
FROM fs09000000_mod_source_record_manager.job_execution
WHERE subordination_type = 'COMPOSITE_PARENT'
ORDER BY started_date DESC
LIMIT 10 |
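A hypothetical wrapper around the query above, for pulling the same numbers programmatically (the DB-API connection, e.g. psycopg2, is an assumption and is passed in by the caller; nothing here is from the report):

```python
# Hypothetical helper: fetch the ten most recent parent DI job executions
# with their durations, given an open DB-API connection to the FOLIO DB.
DURATION_QUERY = """
SELECT (completed_date - started_date) AS duration, *
FROM fs09000000_mod_source_record_manager.job_execution
WHERE subordination_type = 'COMPOSITE_PARENT'
ORDER BY started_date DESC
LIMIT 10
"""

def fetch_recent_di_durations(conn):
    """Run the duration query and return the resulting rows."""
    with conn.cursor() as cur:
        cur.execute(DURATION_QUERY)
        return cur.fetchall()
```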