Data Import Create MARC holdings records [Orchid]
Overview
This document contains the results of testing Data Import Create MARC holdings records in Orchid to detect performance trends.
Ticket: - PERF-474Getting issue details... STATUS
Previous test report: Data Import Create MARC holdings records [Nolana]
Summary
- Data import jobs duration significantly increased compared to the Nolana results, see TestResults.
- Maximum CPU utilization was observed for mod-inventory-storage module during 80k DI job - 88%, see ServiceCPUUtilization.
- There is a significant memory utilization increase for some modules during 10k and 80k DI jobs, see MemoryUtilization.
- Maximum service memory utilization reached 97% for mod-inventory-storage during 80k DI job, see MemoryUtilization.
- During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance, see DBCPUUtilization.
It should be noted that trigger process_marc_records_lb_insert_update_trigger was enabled for mod_source_record_storage.marc_records_lb table to make the environment production-like. It could influence the results in current release. At the same time tests showed that DI jobs duration with and without the trigger stays the same.
Recommendations & Jiras
Issue with failing 80K DI jobs was reproduced in 100% (tested several times). This issues requires further investigation.
Ticket: - MODSOURCE-687Getting issue details... STATUS
Test Results
Profile used for testing - Default - Create Holdings and SRS MARC Holdings
Test | File | Duration: Morning_Glory | Duration: Nolana | Duration: Orchid | Diff_absolute | Diff_percentage |
---|---|---|---|---|---|---|
1 | 1k | 28s | 32s | 45s | 13s | +40% |
2 | 5k | 1m 48s | 4m 20s | 7m 47s | 3m 27s | +79% |
3 | 10k | 4m 4s | 3m 24s | 19m 46s | 16m 22s | +481% |
4 | 80k | 29m 6s | 21m 22s | 20m (error*) | - | - |
*65K items were discarded, see AdditionalInformation
Instance CPU Utilization
Service CPU Utilization
Top services by CPU utilization (during 80k DI):
- mod-inventory-storage - 88%
- mod-inn-reach - 42%
- mod-inventory - 15%
Memory Utilization
Top services by used memory increase (during 10k and 80k DI):
- mod-inventory-storage - from 11% to 79%
- mod-source-record-manager - from 33% to 61%
- mod-source-record-storage - from 37% to 63%
- mod-data-import - from 22% to 35%
mod-inventory-storage memory utilizatoin (maximum) during 80k DI job:
DB CPU Utilization
During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance.
DB Connections
DB Load
SQL queries
Top-SQL statement:
SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance_holdings_item_view WHERE id='947de548-2c4b-443a-8b2d-0bfac571430b' LIMIT 1 OFFSET 0
1k and 5k:
10k and 80k:
Additional Information
Log messages during 80k DI job failure:
09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR dpointFailureHandler An error occurred io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] INFO LogUtil 127.0.0.1:50354 PUT /holdings-storage/holdings/77a4fa76-091d-4429-a8a0-803f75276bf7 null HTTP_1_1 500 60 30000 tid=fs09000000 Internal Server Error
09:48:56 [576440/holdings-sources] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR PgUtil Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
09:48:56 [295007/inventory-view] [fs09000000] [d28a1154-3939-46bc-a114-e03c62aeafe5] [mod_inventory_storage] ERROR PostgresClient Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
Error messages retrieved from the database (during 80k DI job):
- Connection is closed
- io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: db.ncp5.folio-eis.us-east-1/10.23.34.172:5432
- io.netty.channel.StacklessClosedChannelException
- io.vertx.core.impl.NoStackTraceThrowable: Couldn't update JobExecution status, JobExecution already marked as ERROR
- io.vertx.core.impl.NoStackTraceThrowable: Fail to read any response from the server, the underlying connection might get lost unexpectedly.
- io.vertx.core.impl.NoStackTraceThrowable: Timeout
- io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
- io.vertx.pgclient.PgException: FATAL: the database system is in recovery mode (57P03)
- java.io.IOException: Connection reset by peer
- org.folio.processing.exceptions.EventProcessingException: 504 Gateway Time-out
- org.folio.processing.exceptions.EventProcessingException: Invalid token: Unexpected user response code 500 for user id 9eb67301-6f6e-468f-9b1a-6134dc39a684
- org.folio.processing.exceptions.EventProcessingException: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
- org.folio.processing.exceptions.EventProcessingException: Unable to retrieve permissions for user with id '9eb67301-6f6e-468f-9b1a-6134dc39a684': Unable to retrieve permissions (code 400): Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
- This connection has been closed.
- Timeout
Infrastructure
PTF -environment ncp5
- 11 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, one reader, and one writer
Name API Name Memory GIB vCPUs max_connections R6G Extra Large db.r6g.xlarge 32 GiB 4 vCPUs 2731 - number of connections for mod-source-record-manager and mod-source-record-storage - 30 connections.
- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning: - 2 partitions for DI topics
Modules memory and CPU parameters
Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|---|
mod-inventory-storage | 26.0.0 | 12 | 2 | 2048 | 4096 | 3690 | 512 | 3076 |
mod-inventory | 20.0.6 | 12 | 2 | 1024 | 2880 | 2592 | 512 | 1814 |
mod-source-record-storage | 5.6.7 | 27 | 2 | 2048 | 5600 | 5000 | 512 | 3500 |
mod-quick-marc | 3.0.0 | 6 | 1 | 128 | 2288 | 2176 | 512 | 1664 |
mod-source-record-manager | 3.6.4 | 18 | 2 | 2048 | 5600 | 5000 | 512 | 3500 |
mod-di-converter-storage | 2.0.5 | 6 | 2 | 128 | 1024 | 896 | 128 | 768 |
mod-data-import | 2.7.1 | 10 | 1 | 256 | 2048 | 1844 | 512 | 1292 |
okapi | 5.0.1 | 8 | 3 | 1024 | 1684 | 1440 | 512 | 922 |
nginx-okapi | 2022.06.14 | 7 | 2 | 128 | 1024 | 896 | - | - |
pub-okapi | 2022.06.14 | 7 | 2 | 128 | 1024 | 896 | - | 768 |