OAI-PMH data harvesting (10M inventory records, marc21 metadataPrefix)

Overview

The goal of the test is to assess the performance of harvesting 10M inventory records with metadataPrefix set to marc21, and to identify possible issues and bottlenecks.

Ticket: https://folio-org.atlassian.net/browse/PERF-145

Summary

  • Test lasted for 4 days and 6 hours.

  • Average response time per request degraded from 1 s to 20 s by the end of the test.

  • The test ended once response time reached 20 seconds and a 500 (Internal Server Error) response was received.

  • Query plans comparison showed that execution time of the query depends directly on the OFFSET parameter.

  • No data consumption issues were found.

Recommendations & Jiras

Issues to be reported:

  • SLA issue: processing time exceeds the 6-hour requirement

  • Response time degradation issue

  • Internal server error received during the test

Ticket: https://folio-org.atlassian.net/browse/MODOAIPMH-464

Test Runs 

Test #: 1

Test Conditions:

  • 1 thread (virtual user)

  • 10 million inventory records processed

  • metadataPrefix set to marc21

  • ncp2 environment

Duration: 4 days 6 hours

Load generator size (recommended): t3.medium

Load generator memory, GiB (recommended): 3

Notes: 2 test runs were conducted to reproduce the issues

 

Results

Total requests sent during the test: 35 642

Inventory records received: about 3.5M (expected: 10M)
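As a rough cross-check of these totals, the short Python sketch below derives per-request averages from the reported figures. It assumes the whole 4 days 6 hours was spent issuing requests sequentially from the single thread, and it uses the rounded 3.5M record count, so the results are approximations rather than measured values.

```python
# Sanity-check arithmetic on the figures reported in this section.
# Assumption: the full test duration was spent on sequential requests (1 thread).
TEST_DURATION_S = (4 * 24 + 6) * 3600   # 4 days 6 hours, in seconds
TOTAL_REQUESTS = 35_642
RECORDS_RECEIVED = 3_500_000            # "about 3.5M", rounded
RECORDS_EXPECTED = 10_000_000


def summarize():
    """Return derived averages for the test run."""
    return {
        "avg_seconds_per_request": TEST_DURATION_S / TOTAL_REQUESTS,
        "avg_records_per_request": RECORDS_RECEIVED / TOTAL_REQUESTS,
        "share_of_expected_records": RECORDS_RECEIVED / RECORDS_EXPECTED,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

The average works out to roughly 10 seconds and about 98 records per request, which is in line with a response time that climbs from 1 s to 20 s over the run.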

 

Response Times 

Grafana dashboard:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&from=1671704123876&to=1672121479913&var-percentile=95&var-test_type=longevity&var-test=oai_testm_mg_v3&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=REQUEST

Memory Utilization

 

CPU Utilization 

Error messages

00:31:03 [503635/oai] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_oai_pmh] ERROR ractGetRecordsHelper Exception getting ListRecords.

00:31:03 [503635/oai] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_oai_pmh] ERROR ractGetRecordsHelper ListRecords response from SRS status code: Internal Server Error: 500.

00:31:03 [] [] [] [] ERROR OaiPmhHandler        Error in the response from repository: status code - 500, response status message - Internal Server Error Internal Server Error

00:31:03 [165801/source-storage] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_source_record_storage] ERROR ExceptionHelper      null

io.netty.channel.StacklessClosedChannelException: null

RDS CPU Utilization 

Database load

Top SQL queries list

Top SQL query

with "cte" as (
  select count(*)
  from "records_lb"
  where (
    ("records_lb"."state" = $1::"record_state"
      or "records_lb"."state" = $2::"record_state"
      or "records_lb"."leader_record_status" = $3
      or "records_lb"."leader_record_status" = $4
      or "records_lb"."leader_record_status" = $5)
    and "records_lb"."updated_date" <= cast($6 as timestamp with time zone)
    and "records_lb"."record_type" = $7::"record_type"
    and "records_lb"."leader_record_status" is not null
  )
)
select
  "records_lb"."id",
  "records_lb"."snapshot_id",
  "records_lb"."matched_id",
  "records_lb"."generation",
  "records_lb"."record_type",
  "records_lb"."external_id",
  "records_lb"."state",
  "records_lb"."leader_record_status",
  "records_lb"."order",
  "records_lb"."suppress_discovery",
  "records_lb"."created_by_user_id",
  "records_lb"."created_date",
  "records_lb"."updated_by_user_id",
  "records_lb"."updated_date",
  "records_lb"."external_hrid",
  "marc_records_lb"."content",
  "count"
from "records_lb"
left outer join "marc_records_lb" on "records_lb"."id" = "marc_records_lb"."id"
right outer join (select * from "cte") as "alias_80949780" on 1 = 1
where (
  ("records_lb"."state" = $8::"record_state"
    or "records_lb"."state" = $9::"record_state"
    or "records_lb"."leader_record_status" = $10
    or "records_lb"."leader_record_status" = $11
    or "records_lb"."leader_record_status" = $12)
  and "records_lb"."updated_date" <= cast($13 as timestamp with time zone)
  and "records_lb"."record_type" = $14::"record_type"
  and "records_lb"."leader_record_status" is not null
)
order by "records_lb"."id" asc
limit $15 offset $16

 

Query plan comparison

Left side: date - 2023-01-07 11:00:01, offset – 2 220 000

Right side: date - 2023-01-09 22:41:01, offset – 3 526 100

 

Execution time: 13 s at offset 2 220 000 vs. 21 s at offset 3 526 100.
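To make the dependence on OFFSET concrete, the sketch below draws a straight line through the two measurements above and uses it to estimate query time at other offsets. The linear model is an assumption made purely for illustration: two data points cannot confirm the shape of the curve, and any extrapolated value should be read as a rough indication only.

```python
# Two (offset, execution time in seconds) points from the query plan comparison.
POINTS = [(2_220_000, 13.0), (3_526_100, 21.0)]


def fit_line(points):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x0, y0), (x1, y1) = points
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


def estimate_seconds(offset, points=POINTS):
    """Estimate execution time at a given offset, assuming linear growth."""
    slope, intercept = fit_line(points)
    return slope * offset + intercept


if __name__ == "__main__":
    slope, _ = fit_line(POINTS)
    print(f"extra seconds per 100 000 rows of offset: {slope * 100_000:.2f}")
    print(f"extrapolated seconds at offset 10 000 000: {estimate_seconds(10_000_000):.1f}")
```

Under this assumption each additional 100 000 rows of offset adds roughly 0.6 s to the query, which matches the steady degradation seen in the response times.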

 

 

Appendix

Infrastructure

PTF environment: ncp2

  • 12 m6i.2xlarge EC2 instances located in us-east

  • 2 db.r6.xlarge database instances, one reader and one writer

  • MSK ptf-kakfa-3

    • 4 m5.2xlarge brokers in 2 zones

    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

 

Modules memory and CPU parameters

Module                     Version  Task Definition  Running Tasks  CPU   Memory  MemoryReservation  MaxMetaspaceSize  Xmx
edge-oai-pmh               2.5.1    1                2              1024  1512    1360               512m              1440m
mod-oai-pmh                3.10.0   1                2              1024  2248    2000               512m              1440m
mod-inventory-storage      25.0.1   1                2              1024  2208    1952               512m              1440m
mod-source-record-storage  5.5.2    1                2              1024  1536    1440               512m              908m
mod-source-record-manager  3.5.4    2                2              1024  4096    3688               512m              2048m
okapi                      4.14.7   1                3              1024  1684    1440               512m              922m

Methodology/Approach

Check that metadataPrefix is set to marc21 in the script (oai_testm_mg_v3).

Run the test from Jenkins or directly on a VM.
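The test script itself is a JMeter scenario and is not reproduced here. For readers unfamiliar with OAI-PMH harvesting, the Python sketch below shows the general request loop such a harvest follows: an initial ListRecords request with metadataPrefix=marc21, then repeated requests carrying the resumptionToken from the previous response until none is returned. The base URL is hypothetical, authentication (for example an API key) is omitted, and the `fetch` parameter is an illustrative hook, not part of the actual script.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def extract_resumption_token(xml_text):
    """Return the resumptionToken of an OAI-PMH response, or None if the list is complete."""
    token = ET.fromstring(xml_text).find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def count_records(xml_text):
    """Count OAI-PMH <record> elements in a response."""
    return len(ET.fromstring(xml_text).findall(f".//{OAI_NS}record"))


def harvest(base_url, fetch=None, metadata_prefix="marc21"):
    """Follow resumption tokens until none is returned; return (requests_sent, records)."""
    # Default fetcher does a plain HTTP GET; pass a custom one for auth or testing.
    fetch = fetch or (lambda url: urllib.request.urlopen(url).read().decode("utf-8"))
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    requests_sent = records = 0
    while True:
        body = fetch(base_url + "?" + urllib.parse.urlencode(params))
        requests_sent += 1
        records += count_records(body)
        token = extract_resumption_token(body)
        if token is None:
            return requests_sent, records
        # In OAI-PMH, resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

A call such as `harvest("https://edge.example.org/oai")` (placeholder URL) would return the number of requests sent and records received, the same two totals reported in the Results section.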