Overview

This document contains the results of testing Data Import Create MARC holdings records in Orchid to detect performance trends.

Ticket:

Error rendering macro 'jira' : Unable to locate Jira server for this macro. It may be due to Application Link configuration.

Previous test report: Data Import Create MARC holdings records [Nolana]

Summary

Data import jobs duration significantly increased compared to the Nolana results, see TestResults. It can be caused by added index on the ncp5 environment on Orchid release to make it production like.
Maximum CPU utilization was observed for mod-inventory-storage module during 80k DI job - 88%, see ServiceCPUUtilization.
There is a significant memory utilization increase for some modules during 10k and 80k DI jobs, see MemoryUtilization.
Maximum service memory utilization reached 97% for mod-inventory-storage during 80k DI job, see MemoryUtilization.
During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance, see DBCPUUtilization.

Test Results

Test	File	Duration: Morning_Glory	Duration: Nolana	Duration: Orchid	Diff_absolute	Diff_percentage
1	1k	28s	32s	45s	13s	+40%
2	5k	1m 48s	4m 20s	7m 47s	3m 27s	+79%
3	10k	4m 4s	3m 24s	19m 46s	16m 22s	+481%
4	80k	29m 6s	21m 22s	20m (error*)	-	-

*65K items were discarded, see AdditionalInformation

Instance CPU Utilization

Service CPU Utilization

Top services by CPU utilization (during 80k DI):

mod-inventory-storage - 88%
mod-inn-reach - 42%
mod-inventory - 15%

Memory Utilization

Top services by used memory increase (during 10k and 80k DI):

mod-inventory-storage - from 11% to 79%
mod-source-record-manager - from 33% to 61%
mod-source-record-storage - from 37% to 63%
mod-data-import - from 22% to 35%

mod-inventory-storage memory utilizatoin (maximum) during 80k DI job:

DB CPU Utilization

During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance.

DB Connections

DB Load

SQL queries

Top-SQL statement:

SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance_holdings_item_view WHERE id='947de548-2c4b-443a-8b2d-0bfac571430b' LIMIT 1 OFFSET 0

1k and 5k:

10k and 80k:

Additional Information

Log messages during 80k DI job failure:

09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR dpointFailureHandler An error occurred io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432

09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] INFO LogUtil 127.0.0.1:50354 PUT /holdings-storage/holdings/77a4fa76-091d-4429-a8a0-803f75276bf7 null HTTP_1_1 500 60 30000 tid=fs09000000 Internal Server Error

09:48:56 [576440/holdings-sources] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR PgUtil Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432

09:48:56 [295007/inventory-view] [fs09000000] [d28a1154-3939-46bc-a114-e03c62aeafe5] [mod_inventory_storage] ERROR PostgresClient Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432

Error messages retrieved from the database (during 80k DI job):

Connection is closed
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: db.ncp5.folio-eis.us-east-1/10.23.34.172:5432
io.netty.channel.StacklessClosedChannelException
io.vertx.core.impl.NoStackTraceThrowable: Couldn't update JobExecution status, JobExecution already marked as ERROR
io.vertx.core.impl.NoStackTraceThrowable: Fail to read any response from the server, the underlying connection might get lost unexpectedly.
io.vertx.core.impl.NoStackTraceThrowable: Timeout
io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
io.vertx.pgclient.PgException: FATAL: the database system is in recovery mode (57P03)
java.io.IOException: Connection reset by peer
org.folio.processing.exceptions.EventProcessingException: 504 Gateway Time-out
org.folio.processing.exceptions.EventProcessingException: Invalid token: Unexpected user response code 500 for user id 9eb67301-6f6e-468f-9b1a-6134dc39a684
org.folio.processing.exceptions.EventProcessingException: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
org.folio.processing.exceptions.EventProcessingException: Unable to retrieve permissions for user with id '9eb67301-6f6e-468f-9b1a-6134dc39a684': Unable to retrieve permissions (code 400): Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
This connection has been closed.
Timeout

Infrastructure

PTF -environment ncp5

11 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, one reader, and one writer
Name API Name Memory GIB vCPUs max_connections
R6G Extra Large db.r6g.xlarge 32 GiB 4 vCPUs 2731
MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Kafka topics partitioning: - 2 partitions for DI topics

Modules memory and CPU parameters

Modules	Version	Task Definition	Running Tasks	CPU	Memory	MemoryReservation	MaxMetaspaceSize	Xmx
mod-inventory-storage	26.0.0	12	2	2048	4096	3690	512	3076
mod-inventory	20.0.6	12	2	1024	2880	2592	512	1814
mod-source-record-storage	5.6.7	27	2	2048	5600	5000	512	3500
mod-quick-marc	3.0.0	6	1	128	2288	2176	512	1664
mod-source-record-manager	3.6.4	18	2	2048	5600	5000	512	3500
mod-di-converter-storage	2.0.5	6	2	128	1024	896	128	768
mod-data-import	2.7.1	10	1	256	2048	1844	512	1292
okapi	5.0.1	8	3	1024	1684	1440	512	922
nginx-okapi	2022.06.14	7	2	128	1024	896	-	-
pub-okapi	2022.06.14	7	2	128	1024	896	-	768

Name	API Name	Memory GIB	vCPUs	max_connections
R6G Extra Large	db.r6g.xlarge	32 GiB	4 vCPUs	2731

Data Import Create MARC holdings records [Orchid]