OAI-PMH data harvesting [KIWI]
Overview
The purpose of this set of tests is to measure the performance of the Kiwi release and to find possible issues and bottlenecks. See PERF-198.
Environment
Software versions (Test 1-2)
- mod-oai-pmh:3.7.0-SNAPSHOT.188
- edge-oai-pmh:2.4.0
- mod-source-record-manager:3.2.3
- mod-source-record-storage:5.2.1
- mod-inventory-storage:22.0.1
- okapi:4.9.0
Original PTF dataset containing 1,212,039 underlying MARC records for 8.7M instances
Software versions (Test 3 with Bugfest Dataset)
- mod-oai-pmh:3.6.1
- edge-oai-pmh:2.4.0
- mod-source-record-manager:3.2.6
- mod-source-record-storage:5.2.5
- mod-inventory-storage:22.0.3
- okapi:4.9.0
Bugfest dataset containing 8,034,444 underlying MARC records for 8.3M instances
Summary
- The Kiwi release was able to harvest 7,808,200 records in 19 hr 8 min (≈1M records per 2 hr 15 min).
- Average response time per request with resumption token: 0.874 s (see the consistency check after this list).
- No memory or CPU issues were found (after the first couple of JIRAs below had been fixed)
- KPIs:
- mod-oai-pmh CPU usage: 120% during data transfer, 100% during harvesting.
- RDS CPU usage: 80% during data transfer and ±15% during harvesting.
- Memory usage: 105-107% on mod-source-record-manager, 35% on mod-oai-pmh. No signs of memory leaks on related modules.
- A few issues were found:
- OutOfMemory exception: MODOAIPMH-374
- Thread block issue: MODOAIPMH-374
- When instances didn't have underlying MARC records, multiple repeated calls from edge-oai-pmh to mod-oai-pmh occurred, resulting in the end client receiving a timeout; see MODOAIPMH-383
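As a rough consistency check (assuming the 100-record page size described in the test flow below): 7,808,200 records ÷ 100 records per request ≈ 78,082 requests; at 0.874 s per request that is ≈ 68,250 s, or roughly 19 hours, which agrees with the reported duration.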
Test flow
The test consists of two types of calls:
Initial call, performed only once:
/oai/records?verb=ListRecords&metadataPrefix=marc21_withholdings&apikey=[APIKey]
Subsequent harvesting calls:
/oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken]
These calls were performed repeatedly, harvesting 100 records per request until there was no more data to harvest in the [tenant]_mod_oai_pmh.instances table.
The page size was set to 100 records. A [resumptionToken] is returned in the initial call response and in each subsequent harvesting call response until there are no more records to harvest. When all data has been harvested, no resumptionToken is returned with the response, as shown in the sketch below.
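The following is a minimal sketch of the harvesting loop used in this test, assuming the endpoint shape shown above; EDGE_URL and API_KEY are placeholders, the simple regex-based token extraction is illustrative only, and the 100-record page size is configured on the server side rather than by the client.

```python
# Hypothetical harvesting loop sketch: initial ListRecords call, then repeated
# calls with the resumptionToken until the server stops returning one.
import re
import requests

EDGE_URL = "https://edge-oai-pmh.example.org/oai/records"  # placeholder host
API_KEY = "REPLACE_WITH_APIKEY"                            # placeholder API key

def harvest() -> int:
    total = 0
    # Initial call, performed only once, with the metadataPrefix.
    params = {"verb": "ListRecords",
              "metadataPrefix": "marc21_withholdings",
              "apikey": API_KEY}
    while True:
        resp = requests.get(EDGE_URL, params=params, timeout=300)
        resp.raise_for_status()
        body = resp.text
        total += body.count("<record>")  # rough per-page record count
        # Each response carries a resumptionToken until the dataset is exhausted;
        # when it is absent (or empty), harvesting is complete.
        match = re.search(r"<resumptionToken[^>]*>([^<]+)</resumptionToken>", body)
        if not match:
            break
        # Subsequent harvesting calls: only verb, apikey and resumptionToken.
        params = {"verb": "ListRecords",
                  "apikey": API_KEY,
                  "resumptionToken": match.group(1)}
    return total

if __name__ == "__main__":
    print("Harvested records:", harvest())
```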
Issues detected during testing
1) OutOfMemory exception, fixed in the scope of MODOAIPMH-374
2) Thread block issue, fixed in the scope of MODOAIPMH-374
3) Client timeouts: MODOAIPMH-383. This new issue appears when the DB transfer and harvesting processes are started at the same time. This leads to a high load on the DB, which responds with a timeout:
2021-12-01T10:02:42,566 ERROR [vert.x-eventloop-thread-0] MarcWithHoldingsRequestHelper Save instance Ids failed: Timeout.
io.vertx.core.impl.NoStackTraceThrowable: Timeout
Test results
Test 1
- Total Underlying SRS records: 1,212,039
- Duration: 4 hr 57 min
- Records transferred: 4,770,043 (should be 8,415,303)
- Calls performed: 20,618
An unstable part of the test is visible here: the spikes on the chart show greatly increased response times, which lead to throughput gaps. At this point we were still not sure what was happening, so we checked the logs of:
- RDS response times: PGLogs.log
- mod-oai-pmh
- nginx-oai-pmh
- edge-oai-pmh
- okapi
At each point the response times were good, and we could not find a correlation between the logs and this chart.
Service CPU usage reached ±200% during data transfer and stayed at the 50-60% level during data harvesting. However, during the "unstable" part of the test it dropped to 20-25%.
- Service memory usage is stable; there are no signs of a memory leak.