OAI-PMH data harvesting (Orchid)
- 1 Overview
- 2 Environment
- 3 Summary
- 4 Test Results
- 5 Incremental harvesting
- 6 Appendix
Overview
The purpose of the OAI-PMH tests is to measure the performance of the Orchid release and to find possible issues, bottlenecks per PERF-410: [Orchid] [OAI-PMH] Performance testing (Jmeter)Closed
Environment
| Version | CPU | Memory/MemoryReservation | Xmx | MaxMetaSpaceSize | MetaspaceSize | Tasks Count | Task Rev Number |
|---|---|---|---|---|---|---|---|---|
mod-oai-pmh | mod-oai-pmh:3.11.3 | 2048 | 2048/1845 | 1440m | 512m | 384m | 2 | 12 |
edge-oai-pmh | edge-oai-pmh:2.6.1 | 1024 | 1512/1360 | 1440m | 512m | 384m | 2 | 5 |
mod-inventory-storage | mod-inventory-storage:26.0.0 | 1024 | 2208/1952 | 1440m | 512m | 384m | 2 | 4 |
mod-source-record-storage | mod-source-record-storage:5.6.5 | 2048 | 5600/5000
| 3600m | 512M | 384M | 2 | 4 |
okapi | okapi:5.1.0-SNAPSHOT.1352 okapi:5.0.1 | 1024 | 1684/1440 | 922m | 512m | 384m | 3 | 2 |
Summary
Tests were executed by JMeter script in the AWS Carrier-io instance.
The average response time per request with a resumption token is 604 ms ( compared to Nolana's 646 ms).
'OutOfMemory' exceptions (OOM) were detected at all times when tests were run without throughput limits (w/o 40 requests/min). Tests with 40 requests/min (which in Morning Glory proved to be successful 100% of the time) had positive results but the integrity of the harvesting was not checked. OOMs seen in Morning Glory and Nolana releases are also seen here.
'Thread blocked' exceptions were detected only during the investigation of a custom branch (https://github.com/folio-org/mod-oai-pmh/tree/vert-timeout-rollback) with vert timeout manipulation.
Recommendations & Jiras
for providing additional confirmation, execute OAI-PMH full harvesting by EBSCO Harvester from AWS ec2 instance for avoiding location factors
PERF-510: [Orchid] [OAI-PMH] Performance testing EBSCO HarvesterClosed;
PERF-449: Test OAI-PMH thread blocked issueClosed potentially, might be closed and re-open in case of similar issues will be detected by EBSCO Harvester testing;
create a ticket for the Firebird Team for resolving 'mod-oai-pmh' OOM issues and performance optimization of harvesting functionality.
Test Results
Test Number | recordsSource | maxRecordsPerResponse | Duration | Total Requests | Requests/min | Average Response time, ms | Completed |
|---|---|---|---|---|---|---|---|
1 | Source record storage | 100 | ~25 h | 92'040 | 39.5 | 604 | No issues |
2 | Source record storage and Inventory | 100 | ~9 h | 19'889 | 38.8 | 488 | No issues |
- the integrity of the harvesting was not checked
Response metadata:
Test #1
{
"requestMetadataCollection": [
{
"requestId": "5457c24a-d3c6-4807-8b9e-9a0123802820",
"lastUpdatedDate": "2023-04-03T06:12:59.807+00:00",
"streamEnded": true,
"downloadedAndSavedInstancesCounter": 10023100,
"failedToSaveInstancesCounter": 0,
"returnedInstancesCounter": 9160018,
"skippedInstancesCounter": 45682,
"failedInstancesCounter": 0,
"suppressedInstancesCounter": 0
}
],
"totalRecords": 1
}
Test #2
{
"requestMetadataCollection": [
{
"requestId": "0c2ef1f4-ab94-450a-b092-723ffc5338d6",
"lastUpdatedDate": "2023-04-03T18:39:29.738+00:00",
"streamEnded": true,
"downloadedAndSavedInstancesCounter": 10023100,
"failedToSaveInstancesCounter": 0,
"returnedInstancesCounter": 1957773,
"skippedInstancesCounter": 33027,
"failedInstancesCounter": 0,
"suppressedInstancesCounter": 0
}
],
"totalRecords": 1
}For additional investigation of the behaviour of OAI-PMH operations with the throughput limitation of 40 requests/min, a series of test executions have been run on the local machine:
Id | recordsSource | Throughput | Exec | Result | Comments |
1 | Source records storage | No limits | local | OutOfMemory |
|
2 | 40/min | local | connection lost ~12h | Not representative due to the local machine has lost connection | |
3 | Inventory | No limits | local | 504 Gateway Time-out |
|
5 | Source record storage and Inventory | No limits | local | java.net.SocketException: An established connection was aborted by the software in your host machine | Not representative due to the local machine has lost connection |
Moreover, for additional investigation of the behaviour of OAI-PMH operations, several versions of 'mod-oai-pmh' were tested:
Version of the mod-oai-pmh | Throughput | recordsSource | Result |
mod-oai-pmh:3.11.0 | No limits | SRS | OutOfMemory |
mod-oai-pmh:3.11.1 | No limits | SRS | OutOfMemory |
mod-oai-pmh:3.12.0 | No limits | SRS | OutOfMemory |
https://github.com/folio-org/mod-oai-pmh/tree/vert-timeout-rollback | No limits | SRS | Thread blocked >> OOM |
Resources Utilization: Test #1 - SRS > 92'040 requests
Service CPU Utilization
mod-oai-pmh can spike up to over 300% initially. The other modules barely incur any high CPU utilization above their baselines.
Service Memory Usage
No memory issues were detected.
Database CPU Utilization
The DB CPU utilization shows spikes in the first 30-60 minutes for transferring instances and is relatively even and smooth for the rest of the harvest.
ocp2-db-01: in the case of excepting the first 30-60 minutes, we can see a tendency of slow CPU Utilization growth.
Resources Utilization: Test #2 - SRS+Inventory > 19'889 requests
Service CPU Utilization
'mod-oai-pmh' spiked up to less than 80% initially. The other modules barely incur any high CPU utilization above their baselines.
Service Memory Usage
No memory issues were detected.
Database CPU Utilization
The DB CPU utilization shows spikes in the first 30-60 minutes for transferring instances and is relatively even and smooth for the rest of the harvest.
Incremental harvesting
Environment
Incremental harvesting testing were performed on a bit different environment (for most of a modules it's the same except mod-source-record-storage memory configuration for incremental harvesting it's less than for full harvesting env.5600/5000 vs 2048/1440 and xmx 3600 vs 1024)
For env configuration details please see : Appendix section
Summary
For all tests all data were harvested successfully;
There are a chance to get "504 Gateway Time-out" on initial call. Bug created MODINVSTOR-1052
Response times for harvesting calls is in average 650 -750ms;
No memory leak suspects were defined;
RDS CPU usage is on 12-13% level during harvesting and ±35-40% during data transferring.
Methodology
For testing incremental harvesting PTF team create JMeter script that contains initial oai-pmh call:
/oai?verb=ListRecords&metadataPrefix=marc21_withholdings&apikey=<apiKey>&from=<start-date>&until=<end-date>
and subsequent call:
oai/<APIKey>?verb=ListRecords&resumptionToken=<resumptionToken>
Time ranges and data set sizes:
set | from | to | num of records |
|---|---|---|---|
1 | 2023-04-07 | 2023-04-07 | 25K |
2 | 2023-04-01 | 2023-04-06 | 590K |
3 | 2023-03-28T18:00:00Z | 2023-04-06T18:00:00Z | 1M records |
Test Results
test # | Data set | duration | Record source | status |
|---|---|---|---|---|
1 | 25K | 3 min 50s | SRS | Success |
2 | 590K | 1 hr 14min | SRS | Success |
3 | 1M | 2 hr 1min | SRS | Success |
4 | 25K | 4 min 32 s | SRS+inventory | Success |
5 | 590K | 1 hr 7min | SRS+inventory | Success |
6 | 1M | 2 hr 21 min | SRS+inventory | Success |
Record source SRS resource usage
Record source SRS+inventory resource usage
Service CPU usage - 25K- 500K instances harvesting
Service CPU usage - 1M instances harvesting
Service Memory usage - 25K- 500K instances harvesting
Service Memory usage - 1M instances harvesting
RDS CPU usage 25K- 500K instances harvesting
RDS CPU usage 1M instances harvesting
ERRORS
Below log from mod-oai-pmh containing 504 Gateway Time-out