Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents
Overview

This document contains the results of testing Data Import Create MARC holdings records in Orchid to detect performance trends.

Ticket: Image RemovedPERF-474 - [Orchid] [Data Import] Marc HOLDINGS IN PROGRESS

Jira Legacy
serverSystem JIRA
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyPERF-474

Previous test report: Data Import Create MARC holdings records [Nolana]

Summary

  • Data import jobs duration significantly increased compared to the Nolana results, see TestResults
  • Maximum CPU utilization was observed for mod-inventory-storage module during 80k DI job - 88%, see ServiceCPUUtilization.
  • There is a significant memory utilization increase for some modules during 10k and 80k DI jobs, see MemoryUtilization
  • Maximum service memory utilization reached 97% for mod-inventory-storage during 80k DI job, see MemoryUtilization.
  • During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance, see DBCPUUtilization.

It should be noted that trigger process_marc_records_lb_insert_update_trigger was enabled for mod_source_record_storage.marc_records_lb table to make the environment production-like. It could influence the results in current release. At the same time tests showed that DI jobs duration with and without the trigger stays the same.

Recommendations & Jiras

Issue with failing 80K DI jobs was reproduced in 100% (tested several times). This issues requires further investigation.

Ticket:

Jira Legacy
serverSystem JIRA
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyMODSOURCE-687

Test Results

Profile used for testing - Default - Create Holdings and SRS MARC Holdings

Test

File

Duration: Morning_Glory

Duration: NolanaDuration: OrchidDiff_absoluteDiff_percentage

1

1k28s32s45s13s+28%40%
25k1m 48s4m 20s7m 47s3m 27s+79%
310k4m 4s3m 24s19m 46s16m 22s+481%
480k29m 6s21m 22s20m (error*, 65K discarded)

...

)--

*65K items were discarded, see AdditionalInformation

Instance CPU Utilization

Image AddedImage Added

Service CPU Utilization

Top services by CPU utilization (during 80k DI):

  • mod-inventory-storage - 88%
  • mod-inn-reach - 42%
  • mod-inventory - 15%

Image Added

Image Added

Image Added

Image Added

Memory Utilization

Top services by used memory increase (during 10k and 80k DI):

  • mod-inventory-storage - from 11% to 79%
  • mod-source-record-manager - from 33% to 61%
  • mod-source-record-storage - from 37% to 63%
  • mod-data-import - from 22% to 35%

Image Added

Image Added

Image AddedImage Added


mod-inventory-storage memory utilizatoin (maximum) during 80k DI job:Image Added

DB CPU Utilization

During 80k DI job there is a switch of instrances for reader and writer role. This might happen because of overload and restart of the writer instance.

Image Added

Image Added

DB Connections

Image Added

Image Added

DB Load

Image AddedImage Added

SQL queries

Top-SQL statement:

SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance_holdings_item_view WHERE id='947de548-2c4b-443a-8b2d-0bfac571430b' LIMIT 1 OFFSET 0

1k and 5k:

Image Added


10k and 80k:

Image Added

Additional Information

Log messages during 80k DI job failure:

09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR dpointFailureHandler An error occurred io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432

09:48:56 [937501/holdings-storage] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] INFO  LogUtil     127.0.0.1:50354 PUT /holdings-storage/holdings/77a4fa76-091d-4429-a8a0-803f75276bf7 null HTTP_1_1 500 60 30000 tid=fs09000000 Internal Server Error

09:48:56 [576440/holdings-sources] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_inventory_storage] ERROR PgUtil        Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432

09:48:56 [295007/inventory-view] [fs09000000] [d28a1154-3939-46bc-a114-e03c62aeafe5] [mod_inventory_storage] ERROR PostgresClient   Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432

Error messages retrieved from the database (during 80k DI job):

  • Connection is closed
  • io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: db.ncp5.folio-eis.us-east-1/10.23.34.172:5432
  • io.netty.channel.StacklessClosedChannelException
  • io.vertx.core.impl.NoStackTraceThrowable: Couldn't update JobExecution status, JobExecution already marked as ERROR
  • io.vertx.core.impl.NoStackTraceThrowable: Fail to read any response from the server, the underlying connection might get lost unexpectedly.
  • io.vertx.core.impl.NoStackTraceThrowable: Timeout
  • io.vertx.core.impl.NoStackTraceThrowable: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
  • io.vertx.pgclient.PgException: FATAL: the database system is in recovery mode (57P03)
  • java.io.IOException: Connection reset by peer
  • org.folio.processing.exceptions.EventProcessingException: 504 Gateway Time-out
  • org.folio.processing.exceptions.EventProcessingException: Invalid token: Unexpected user response code 500 for user id 9eb67301-6f6e-468f-9b1a-6134dc39a684
  • org.folio.processing.exceptions.EventProcessingException: Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
  • org.folio.processing.exceptions.EventProcessingException: Unable to retrieve permissions for user with id '9eb67301-6f6e-468f-9b1a-6134dc39a684': Unable to retrieve permissions (code 400): Timeout for DB_HOST:DB_PORT=db.ncp5.folio-eis.us-east-1:5432
  • This connection has been closed.
  • Timeout

Instance CPU Utilization

Image RemovedImage Removed

Service CPU Utilization

Image Removed

Image Removed

Image Removed

Image Removed

Memory Utilization

Image Removed

Image Removed

Image RemovedImage Removed

DB CPU Utilization

Image Removed

Image Removed

DB Connections

Image Removed

Image Removed

DB Load

Image RemovedImage Removed

SQL queries

Image Removed

Image Removed

nfrastructure

Infrastructure

PTF -environment ncp5 

  • 11 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1  
  • 2 database  instances, one reader, and one writer

    NameAPI NameMemory GIBvCPUsmax_connections
    R6G Extra Largedb.r6g.xlarge32 GiB4 vCPUs2731


  • number of connections for mod-source-record-manager and mod-source-record-storage - 30 connections.
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: - 2 partitions for DI topics


Modules memory and CPU parameters

Modules

Version

Task Definition

Running Tasks 

CPU

Memory

MemoryReservation

MaxMetaspaceSize

Xmx

mod-inventory-storage26.0.0122204840963690
512
3076
mod-inventory20.0.61221024288025925121814
mod-source-record-storage5.6.72722048560050005123500
mod-quick-marc3.0.061128228821765121664
mod-source-record-manager3.6.41822048560050005123500
mod-di-converter-storage2.0.5621281024896128768
mod-data-import2.7.1101256204818445121292
okapi5.0.183102416841440512922
nginx-okapi2022.06.14721281024896--
pub-okapi2022.06.14721281024896-768