OAI-PMH data harvesting (10M inventory records, marc21 metadataPrefix)
Overview
The goal of this test is to assess the performance of harvesting 10M inventory records with metadataPrefix set to marc21 and to identify possible issues and bottlenecks.
Ticket: PERF-145
Summary
- Test lasted for 4 days and 6 hours.
- Average response time per request degraded from 1 s at the start to 20 s by the end of the test.
- The test ended when the response time reached 20 seconds and a 500 error was received.
- Query plans comparison showed that execution time of the query depends directly on the OFFSET parameter.
- No data consumption issues were found.
Recommendations & Jiras
Issues to be reported:
- SLA issue: processing time exceeds the 6-hour requirement
- Response time degradation issue
- Internal server error received during the test
Ticket: MODOAIPMH-464
Test Runs
Test # | Test Conditions | Duration | Load generator size (recommended) | Load generator Memory (GiB) (recommended) | Notes |
---|---|---|---|---|---|
1 | | 4 days 6 hours | t3.medium | 3 | 2 test runs were conducted to reproduce the issues |
Results
Total requests sent during the test: 35 642
Inventory records received: about 3.5M (expected: 10M)
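The totals above can be turned into a few derived figures. The following is a rough back-of-the-envelope sketch in Python, not part of the test tooling. It uses only the numbers reported in this document (35 642 requests, about 3.5M records, 4 days 6 hours); the variable names are illustrative, and since the record count is approximate, so are the results. The seconds-per-request figure is wall-clock time divided by request count, so it includes any client-side overhead and is not the same as the measured response time.

```python
# Figures reported in this document.
total_requests = 35_642
records_received = 3_500_000   # "about 3.5M", so derived values are approximate
expected_records = 10_000_000
duration_hours = 4 * 24 + 6    # 4 days 6 hours

# Derived figures (illustrative names, not taken from the test scripts).
records_per_request = records_received / total_requests
seconds_per_request = duration_hours * 3600 / total_requests
completion_ratio = records_received / expected_records
records_per_hour = records_received / duration_hours

print(f"records per request:      {records_per_request:.1f}")
print(f"wall-clock s per request: {seconds_per_request:.1f}")
print(f"share of expected data:   {completion_ratio:.0%}")
print(f"records per hour:         {records_per_hour:,.0f}")
```

At roughly 98 records per request and about 34 000 records per hour on average, the run delivered about a third of the expected data set before it stopped.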
Response Times
Grafana dashboard:
Memory Utilization
CPU Utilization
Error messages
00:31:03 [503635/oai] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_oai_pmh] ERROR ractGetRecordsHelper Exception getting ListRecords.
00:31:03 [503635/oai] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_oai_pmh] ERROR ractGetRecordsHelper ListRecords response from SRS status code: Internal Server Error: 500.
00:31:03 [] [] [] [] ERROR OaiPmhHandler Error in the response from repository: status code - 500, response status message - Internal Server Error Internal Server Error
00:31:03 [165801/source-storage] [fs09000000] [2ac43d6a-3945-4d99-87ee-7a8281238572] [mod_source_record_storage] ERROR ExceptionHelper null
io.netty.channel.StacklessClosedChannelException: null
RDS CPU Utilization
Database load
Top SQL queries list
Top SQL query
with "cte" as (
  select count(*)
  from "records_lb"
  where (("records_lb"."state" = $1::"record_state"
      or "records_lb"."state" = $2::"record_state"
      or "records_lb"."leader_record_status" = $3
      or "records_lb"."leader_record_status" = $4
      or "records_lb"."leader_record_status" = $5)
    and "records_lb"."updated_date" <= cast($6 as timestamp with time zone)
    and "records_lb"."record_type" = $7::"record_type"
    and "records_lb"."leader_record_status" is not null)
)
select "records_lb"."id", "records_lb"."snapshot_id", "records_lb"."matched_id", "records_lb"."generation",
  "records_lb"."record_type", "records_lb"."external_id", "records_lb"."state", "records_lb"."leader_record_status",
  "records_lb"."order", "records_lb"."suppress_discovery", "records_lb"."created_by_user_id", "records_lb"."created_date",
  "records_lb"."updated_by_user_id", "records_lb"."updated_date", "records_lb"."external_hrid",
  "marc_records_lb"."content", "count"
from "records_lb"
left outer join "marc_records_lb" on "records_lb"."id" = "marc_records_lb"."id"
right outer join (select * from "cte") as "alias_80949780" on 1 = 1
where (("records_lb"."state" = $8::"record_state"
    or "records_lb"."state" = $9::"record_state"
    or "records_lb"."leader_record_status" = $10
    or "records_lb"."leader_record_status" = $11
    or "records_lb"."leader_record_status" = $12)
  and "records_lb"."updated_date" <= cast($13 as timestamp with time zone)
  and "records_lb"."record_type" = $14::"record_type"
  and "records_lb"."leader_record_status" is not null)
order by "records_lb"."id" asc
limit $15 offset $16
Query plan comparison
Left side: date - 2023-01-07 11:00:01, offset – 2 220 000
Right side: date - 2023-01-09 22:41:01, offset – 3 526 100
Execution time: 13 s at offset 2 220 000 and 21 s at offset 3 526 100.
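To make the dependence on OFFSET concrete, the two measurements above can be joined by a straight line. This Python sketch assumes the execution time grows linearly with the offset, which two data points cannot confirm. The function names are illustrative, and the value for offset 10 000 000 is an extrapolation under that assumption, not a measured result.

```python
# Two (offset, execution time in seconds) points from the query plan comparison.
p1 = (2_220_000, 13.0)
p2 = (3_526_100, 21.0)

def per_row_cost(a, b):
    """Slope of the line through two (offset, seconds) points."""
    return (b[1] - a[1]) / (b[0] - a[0])

def estimate_seconds(offset, a=p1, b=p2):
    """Linear estimate of execution time; an assumption, not a measurement."""
    return a[1] + per_row_cost(a, b) * (offset - a[0])

slope = per_row_cost(p1, p2)
print(f"{slope * 1e6:.2f} microseconds per additional skipped row")
print(f"linear estimate at offset 10 000 000: {estimate_seconds(10_000_000):.0f} s")
```

Under this assumption each additional skipped row costs about 6 microseconds, so a query near the end of a 10M-record harvest would take on the order of a minute.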
Appendix
Infrastructure
PTF environment: ncp2
- 12 m6i.2xlarge EC2 instances located in us-east
- 2 db.r6.xlarge database instances, one reader and one writer
- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Modules memory and CPU parameters
Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|---|
edge-oai-pmh | 2.5.1 | 1 | 2 | 1024 | 1512 | 1360 | 512m | 1440m |
mod-oai-pmh | 3.10.0 | 1 | 2 | 1024 | 2248 | 2000 | 512m | 1440m |
mod-inventory-storage | 25.0.1 | 1 | 2 | 1024 | 2208 | 1952 | 512m | 1440m |
mod-source-record-storage | 5.5.2 | 1 | 2 | 1024 | 1536 | 1440 | 512m | 908m |
mod-source-record-manager | 3.5.4 | 2 | 2 | 1024 | 4096 | 3688 | 512m | 2048m |
okapi | 4.14.7 | 1 | 3 | 1024 | 1684 | 1440 | 512m | 922m |
Methodology/Approach
Check that metadataPrefix is set to marc21 in the script (oai_testm_mg_v3).
Run the test from Jenkins or directly on a VM.
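For orientation, the request pattern of an OAI-PMH harvest can be sketched in Python. This is a minimal illustration of the standard ListRecords and resumptionToken loop, not the oai_testm_mg_v3 script itself. The base URL and the helper names (build_url, parse_page, harvest) are hypothetical, and any authentication the edge API requires (for example an API key parameter) is omitted.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def build_url(base_url, token=None, metadata_prefix="marc21"):
    """First request carries metadataPrefix; follow-ups carry only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_page(xml_text):
    """Return (number of records on the page, resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{OAI_NS}record")
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return len(records), token or None

def harvest(base_url, fetch=None):
    """Follow resumption tokens until the repository returns an empty one."""
    fetch = fetch or (lambda url: urllib.request.urlopen(url).read())
    total, token = 0, None
    while True:
        count, token = parse_page(fetch(build_url(base_url, token)))
        total += count
        if not token:
            return total

# Hypothetical usage; replace the URL with the real edge endpoint:
# print(harvest("https://edge.example.org/oai"))
```

The fetch parameter is injectable so the loop can be exercised against canned XML without a running environment.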