OAI-PMH data harvesting (10M inventory records, marc21 metadataPrefix)

Overview

The goal of this test is to assess the performance of harvesting 10M inventory records with metadataPrefix set to marc21 and to identify possible issues and bottlenecks.

Ticket: PERF-145

Summary

  • The test lasted 4 days and 6 hours.
  • Average response time per request degraded from 1 s at the start to 20 s by the end of the test.
  • The test ended when the response time reached 20 seconds and a 500 error was received.
  • Query plan comparison showed that the query execution time depends directly on the OFFSET parameter.
  • No data consumption issues were found.

Recommendations & Jiras

Issues to be reported:

  • SLA issue: processing time exceeds the 6-hour requirement
  • Response time degradation issue
  • Internal server error received during the test

Ticket: MODOAIPMH-464

Test Runs 

Test #1

  • Test conditions:
    • 1 thread (virtual user)
    • 10 million inventory records processed
    • metadataPrefix set to marc21
    • ncp2 environment
  • Duration: 4 days 6 hours
  • Load generator size (recommended): t3.medium
  • Load generator memory (GiB) (recommended): 3
  • Notes: 2 test runs were conducted to reproduce the issues


Results

Total requests sent during the test: 35 642

Inventory records received: about 3.5M (expected: 10M)
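The figures above can be turned into rough throughput numbers. The following is a minimal sketch, assuming the 6-hour requirement applies to harvesting the full 10M records and treating "about 3.5M" as exactly 3 500 000; the function name `harvest_stats` is hypothetical. Because the test used a single thread, requests were sequential, so total duration divided by request count approximates the average time per request.

```python
# Figures taken from this report.
TEST_DURATION_S = (4 * 24 + 6) * 3600   # 4 days 6 hours
REQUESTS_SENT = 35_642
RECORDS_RECEIVED = 3_500_000            # "about 3.5M", so results are approximate
RECORDS_EXPECTED = 10_000_000
SLA_S = 6 * 3600                        # assumption: 6-hour SLA covers the full harvest


def harvest_stats():
    """Return derived throughput figures as a dict."""
    return {
        "avg_seconds_per_request": TEST_DURATION_S / REQUESTS_SENT,
        "avg_records_per_request": RECORDS_RECEIVED / REQUESTS_SENT,
        "records_per_second": RECORDS_RECEIVED / TEST_DURATION_S,
        "share_harvested": RECORDS_RECEIVED / RECORDS_EXPECTED,
        "records_per_second_needed_for_sla": RECORDS_EXPECTED / SLA_S,
    }


if __name__ == "__main__":
    for name, value in harvest_stats().items():
        print(f"{name}: {value:.2f}")
```

With these inputs the sketch gives roughly 10.3 s per request, about 98 records per request, and about 9.5 records per second, against roughly 463 records per second that would be needed to harvest 10M records in 6 hours.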


Response Times 

Grafana dashboard:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&from=1671704123876&to=1672121479913&var-percentile=95&var-test_type=longevity&var-test=oai_testm_mg_v3&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=REQUEST

Memory Utilization


CPU Utilization 

Error messages

00:31:03 [503635/oai] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_oai_pmh] ERROR ractGetRecordsHelper Exception getting ListRecords.

00:31:03 [503635/oai] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_oai_pmh] ERROR ractGetRecordsHelper ListRecords response from SRS status code: Internal Server Error: 500.

00:31:03 [] [] [] [] ERROR OaiPmhHandler        Error in the response from repository: status code - 500, response status message - Internal Server Error Internal Server Error

00:31:03 [165801/source-storage] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_source_record_storage] ERROR ExceptionHelper      null

io.netty.channel.StacklessClosedChannelException: null

RDS CPU Utilization 

Database load

Top SQL queries list

Top SQL query

with "cte" as (
  select count(*)
  from "records_lb"
  where (
    (
      "records_lb"."state" = $1::"record_state"
      or "records_lb"."state" = $2::"record_state"
      or "records_lb"."leader_record_status" = $3
      or "records_lb"."leader_record_status" = $4
      or "records_lb"."leader_record_status" = $5
    )
    and "records_lb"."updated_date" <= cast($6 as timestamp with time zone)
    and "records_lb"."record_type" = $7::"record_type"
    and "records_lb"."leader_record_status" is not null
  )
)
select
  "records_lb"."id",
  "records_lb"."snapshot_id",
  "records_lb"."matched_id",
  "records_lb"."generation",
  "records_lb"."record_type",
  "records_lb"."external_id",
  "records_lb"."state",
  "records_lb"."leader_record_status",
  "records_lb"."order",
  "records_lb"."suppress_discovery",
  "records_lb"."created_by_user_id",
  "records_lb"."created_date",
  "records_lb"."updated_by_user_id",
  "records_lb"."updated_date",
  "records_lb"."external_hrid",
  "marc_records_lb"."content",
  "count"
from "records_lb"
  left outer join "marc_records_lb"
    on "records_lb"."id" = "marc_records_lb"."id"
  right outer join (select * from "cte") as "alias_80949780"
    on 1 = 1
where (
  (
    "records_lb"."state" = $8::"record_state"
    or "records_lb"."state" = $9::"record_state"
    or "records_lb"."leader_record_status" = $10
    or "records_lb"."leader_record_status" = $11
    or "records_lb"."leader_record_status" = $12
  )
  and "records_lb"."updated_date" <= cast($13 as timestamp with time zone)
  and "records_lb"."record_type" = $14::"record_type"
  and "records_lb"."leader_record_status" is not null
)
order by "records_lb"."id" asc
limit $15
offset $16


Query plan comparison

Left side: date - 2023-01-07 11:00:01, offset – 2 220 000

Right side: date - 2023-01-09 22:41:01, offset – 3 526 100


Execution time: 13 s at offset 2 220 000, 21 s at offset 3 526 100.
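The two measurements can be checked for proportionality with a few lines of arithmetic. The sketch below is illustrative only: it fits a straight line through the two data points from this report and extrapolates it, which assumes the growth stays linear beyond the measured range. Two points cannot confirm that, and the function names are hypothetical.

```python
# The two measurements from the query plan comparison: (offset, execution time in s).
MEASUREMENTS = [(2_220_000, 13.0), (3_526_100, 21.0)]


def seconds_per_million_rows(offset, seconds):
    """Execution time normalised by the number of rows skipped via OFFSET."""
    return seconds / (offset / 1_000_000)


def fit_line(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def estimate_seconds(offset):
    """Linear estimate of execution time at a given offset (assumption, not a measurement)."""
    slope, intercept = fit_line(*MEASUREMENTS)
    return slope * offset + intercept


if __name__ == "__main__":
    for offset, seconds in MEASUREMENTS:
        print(offset, round(seconds_per_million_rows(offset, seconds), 2))
    print("estimate at offset 10 000 000:", round(estimate_seconds(10_000_000), 1))
```

Both measurements work out to about 6 seconds per million skipped rows, which is consistent with execution time growing in proportion to the offset. Under the linear assumption, a single request near offset 10 000 000 would take on the order of 60 seconds.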



Appendix

Infrastructure

PTF environment: ncp2

  • 12 m6i.2xlarge EC2 instances located in us-east
  • 2 db.r6.xlarge database instances, one reader and one writer
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3


Modules memory and CPU parameters

Module                     Version  Task Definition  Running Tasks  CPU   Memory  MemoryReservation  MaxMetaspaceSize  Xmx
edge-oai-pmh               2.5.1    1                2              1024  1512    1360               512m              1440m
mod-oai-pmh                3.10.0   1                2              1024  2248    2000               512m              1440m
mod-inventory-storage      25.0.1   1                2              1024  2208    1952               512m              1440m
mod-source-record-storage  5.5.2    1                2              1024  1536    1440               512m              908m
mod-source-record-manager  3.5.4    2                2              1024  4096    3688               512m              2048m
okapi                      4.14.7   1                3              1024  1684    1440               512m              922m

Methodology/Approach

Check that metadataPrefix is set to marc21 in the script (oai_testm_mg_v3).

Run the test from Jenkins or directly on the VM.
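For readers unfamiliar with the request pattern such a harvest follows, here is a minimal Python sketch of a sequential OAI-PMH ListRecords loop. It is not the oai_testm_mg_v3 JMeter script. The names `harvest` and `http_fetch` are hypothetical, the base URL must be supplied by the caller, and authentication (for example an API key) is not covered. It relies only on the standard OAI-PMH 2.0 behaviour: the first request carries metadataPrefix, and each follow-up request carries only the resumptionToken from the previous response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(base_url, params):
    """Send one OAI-PMH GET request and return the response body as bytes."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, fetch=http_fetch, metadata_prefix="marc21"):
    """Count harvested records, following resumption tokens until none is returned."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    total = 0
    while True:
        root = ET.fromstring(fetch(base_url, params))
        total += len(root.findall("./oai:ListRecords/oai:record", OAI_NS))
        token = root.find("./oai:ListRecords/oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken marks the end of the list.
        if token is None or not (token.text or "").strip():
            return total
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing `fetch` as a parameter keeps the loop testable without a live endpoint; a single-threaded loop like this matches the "1 thread (virtual user)" condition, where each request waits for the previous response.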