Overview
This document contains the results of testing Data Import Create MARC holdings records in Orchid to detect performance trends.
Ticket:
Previous test report: Data Import Create MARC holdings records [Nolana]
Summary
- Data import jobs duration significantly increased compared to the Nolana results, see TestResults. It can be caused by added index on the ncp5 environment on Orchid release to make it production like.
- Maximum CPU utilization was observed for mod-inventory-storage module during 80k DI job - 88%, see ServiceCPUUtilization.
- There is a significant memory utilization increase for some modules during 10k and 80k DI jobs, see MemoryUtilization.
- Maximum service memory utilization reached 97% for mod-inventory-storage during 80k DI job, see MemoryUtilization.
- During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance, see DBCPUUtilization.
Test Results
Test | File | Duration: Morning_Glory | Duration: Nolana | Duration: Orchid | Diff_absolute | Diff_percentage |
---|---|---|---|---|---|---|
1 | 1k | 28s | 32s | 45s | 13s | +40% |
2 | 5k | 1m 48s | 4m 20s | 7m 47s | 3m 27s | +79% |
3 | 10k | 4m 4s | 3m 24s | 19m 46s | 16m 22s | +481% |
4 | 80k | 29m 6s | 21m 22s | 20m (error*) | - | - |
*65K items were discarded, see AdditionalInformation
Instance CPU Utilization
Service CPU Utilization
Top services by CPU utilization (during 80k DI):
- mod-inventory-storage - 88%
- mod-inn-reach - 42%
- mod-inventory - 15%
Memory Utilization
Top services by used memory increase (during 10k and 80k DI):
- mod-inventory-storage - from 11% to 79%
- mod-source-record-manager - from 33% to 61%
- mod-source-record-storage - from 37% to 63%
- mod-data-import - from 22% to 35%
mod-inventory-storage memory utilizatoin (maximum) during 80k DI job:
DB CPU Utilization
During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance.
DB Connections
DB Load
SQL queries
Top-SQL statement:
SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance_holdings_item_view WHERE id='947de548-2c4b-443a-8b2d-0bfac571430b' LIMIT 1 OFFSET 0
1k and 5k:
10k and 80k:
Additional Information
Log messages during 80k DI job failure:
09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR dpointFailureHandler An error occurred io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] INFO LogUtil 127.0.0.1:50354 PUT /holdings-storage/holdings/77a4fa76-091d-4429-a8a0-803f75276bf7 null HTTP_1_1 500 60 30000 tid=fs09000000 Internal Server Error
09:48:56 [576440/holdings-sources] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR PgUtil Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
09:48:56 [295007/inventory-view] [fs09000000] [d28a1154-3939-46bc-a114-e03c62aeafe5] [mod_inventory_storage] ERROR PostgresClient Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
Error messages retrieved from the database (during 80k DI job):
- Connection is closed
- io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: db.ncp5.folio-eis.us-east-1/10.23.34.172:5432
- io.netty.channel.StacklessClosedChannelException
- io.vertx.core.impl.NoStackTraceThrowable: Couldn't update JobExecution status, JobExecution already marked as ERROR
- io.vertx.core.impl.NoStackTraceThrowable: Fail to read any response from the server, the underlying connection might get lost unexpectedly.
- io.vertx.core.impl.NoStackTraceThrowable: Timeout
- io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
- io.vertx.pgclient.PgException: FATAL: the database system is in recovery mode (57P03)
- java.io.IOException: Connection reset by peer
- org.folio.processing.exceptions.EventProcessingException: 504 Gateway Time-out
- org.folio.processing.exceptions.EventProcessingException: Invalid token: Unexpected user response code 500 for user id 9eb67301-6f6e-468f-9b1a-6134dc39a684
- org.folio.processing.exceptions.EventProcessingException: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
- org.folio.processing.exceptions.EventProcessingException: Unable to retrieve permissions for user with id '9eb67301-6f6e-468f-9b1a-6134dc39a684': Unable to retrieve permissions (code 400): Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
- This connection has been closed.
- Timeout
Infrastructure
PTF -environment ncp5
- 11 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, one reader, and one writer
Name API Name Memory GIB vCPUs max_connections R6G Extra Large db.r6g.xlarge 32 GiB 4 vCPUs 2731 - MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning: - 2 partitions for DI topics
Modules memory and CPU parameters
Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|---|
mod-inventory-storage | 26.0.0 | 12 | 2 | 2048 | 4096 | 3690 | 512 | 3076 |
mod-inventory | 20.0.6 | 12 | 2 | 1024 | 2880 | 2592 | 512 | 1814 |
mod-source-record-storage | 5.6.7 | 27 | 2 | 2048 | 5600 | 5000 | 512 | 3500 |
mod-quick-marc | 3.0.0 | 6 | 1 | 128 | 2288 | 2176 | 512 | 1664 |
mod-source-record-manager | 3.6.4 | 18 | 2 | 2048 | 5600 | 5000 | 512 | 3500 |
mod-di-converter-storage | 2.0.5 | 6 | 2 | 128 | 1024 | 896 | 128 | 768 |
mod-data-import | 2.7.1 | 10 | 1 | 256 | 2048 | 1844 | 512 | 1292 |
okapi | 5.0.1 | 8 | 3 | 1024 | 1684 | 1440 | 512 | 922 |
nginx-okapi | 2022.06.14 | 7 | 2 | 128 | 1024 | 896 | - | - |
pub-okapi | 2022.06.14 | 7 | 2 | 128 | 1024 | 896 | - | 768 |