OAI-PMH data harvesting (Nolana)
Overview
The purpose of the OAI-PMH tests is to measure the performance of the Nolana release and to find possible issues and bottlenecks, per PERF-332.
Environment
Module | Version | CPU (units) | Memory (soft limit, MiB) | Memory (hard limit, MiB) | Xmx (MiB) | MaxMetaspaceSize (MiB) | Tasks Count | Task Rev Number
---|---|---|---|---|---|---|---|---
mod-oai-pmh | 3.10.0 | 1024 (Morning Glory: 2048) | 2000 (Morning Glory: 1845) | 2248 (Morning Glory: 2048) | 1440 | 512 | 2 | 1 (2 with R/W split)
edge-oai-pmh | 2.5.1 | 1024 | 1360 | 1512 | 1440 (Morning Glory: 952) | 512 | 2 | 1
mod-inventory-storage | 25.0.1 | 1024 | 1952 (Morning Glory: 1684) | 2208 (Morning Glory: 1872) | 1440 | 512 | 2 | 1 (2 with R/W split)
mod-source-record-storage | 5.5.2 | 1024 | 1440 | 1536 | 908 | 512 | 2 | 1 (2 with R/W split)
okapi | 4.14.7 | 1024 | 1440 | 1684 | 922 | 512 | 3 | 1
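As a quick cross-check of the table above, each container's JVM ceiling (Xmx plus MaxMetaspaceSize) can be compared against its memory limits. The sketch below only restates the table's values and is illustrative; it is not part of the test harness:

```python
# Illustrative sketch: compare each module's JVM ceiling (Xmx + MaxMetaspaceSize)
# against its container memory limits, all in MiB. Values are copied from the
# Environment table above (Nolana figures only).
modules = {
    # name: (soft_limit, hard_limit, xmx, max_metaspace)
    "mod-oai-pmh":               (2000, 2248, 1440, 512),
    "edge-oai-pmh":              (1360, 1512, 1440, 512),
    "mod-inventory-storage":     (1952, 2208, 1440, 512),
    "mod-source-record-storage": (1440, 1536,  908, 512),
    "okapi":                     (1440, 1684,  922, 512),
}

for name, (soft, hard, xmx, metaspace) in modules.items():
    ceiling = xmx + metaspace  # heap + metaspace only; ignores stacks, direct buffers, code cache
    print(f"{name:26s} ceiling={ceiling:4d}  hard limit={hard:4d}  headroom={hard - ceiling:5d}")
```

Keep in mind that the JVM's real footprint also includes thread stacks, direct buffers, and the code cache, so headroom computed this way is only a first-order check.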
Summary
- Average response time per request with resumption token was 646 ms (compared to Morning Glory's 600 ms).
- Incremental calls performed: 81852 (Bugfest data set, 1 user, and 35 DB connections)*. Each incremental call follows the resumptionToken from the previous response; see the sketch after this list.
- Thread-block errors and subsequent OOMs happened about 50% of the time, likely due to the fast rate of incremental calls made by the JMeter test script. We did not test at a 40 requests/min rate (which in Morning Glory proved successful 100% of the time). Since this version of mod-oai-pmh contains no changes other than the upgrade to RMB v35.0.0, the OOMs seen in Morning Glory are also seen here.
- Tested with DB R/W split both enabled and disabled. Harvest durations are about the same, and the failure rate (due to OOMs) is about the same, whether DB R/W split is enabled or disabled.
* Note: The Bugfest dataset was used because it has more SRS records than PTF's dataset.
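For context, each incremental call above is a single ListRecords request that carries the resumptionToken returned by the previous response. A minimal harvesting loop in that style is sketched below, assuming a hypothetical edge-oai-pmh endpoint; the base URL and metadataPrefix are placeholders, and the actual tests were driven by a JMeter script rather than this code:

```python
# Minimal OAI-PMH harvesting loop (sketch): call ListRecords, then keep
# following the resumptionToken until the repository stops returning one.
# BASE_URL and the metadataPrefix are placeholders, not the exact test values.
import xml.etree.ElementTree as ET
import requests

BASE_URL = "https://folio-edge.example.org/oai"  # hypothetical endpoint
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(metadata_prefix: str = "marc21") -> int:
    """Run a full harvest; return the number of incremental calls made."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    calls = 0
    while True:
        resp = requests.get(BASE_URL, params=params, timeout=120)
        resp.raise_for_status()
        calls += 1
        token = ET.fromstring(resp.content).find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return calls  # no resumptionToken left: harvest complete
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

if __name__ == "__main__":
    print("incremental calls:", harvest())
```

Throttling this loop to the 40 requests/min rate mentioned above would amount to sleeping about 1.5 s between calls.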
Test Results
Test Number | Duration | R/W Split Enabled | Total Requests | Requests/sec | Average Response Time (s) | Completed
---|---|---|---|---|---|---
1 | 14h 05m | N | 81351 | 1.54 | 0.621 | Y |
2 | 14h 40m | N | 81351 | 1.6 | 0.643 | Y |
3 | 15h 20m | N | 81852 | 1.504 | 0.662 | Y |
4 | 14h 17m | N | 81852 | 1.454 | 0.683 | Y |
5 | 28m | N | 2125 | 1.406 | 0.772 | N |
6 | 14h 4m | N | 81852 | 1.593 | 0.625 | Y |
7 | 29m | N | 2263 | 1.49 | 0.735 | N |
8 | 37m | Y | 2094 | 1.36 | 0.921 | N |
9 | 14h 40m | Y | 81852 | 1.611 | 0.619 | Y |
10 | 30m | Y | 2205 | 1.568 | 0.689 | N |
11 | 14h | Y | 81852 | 1.602 | 0.622 | Y |
12 | 15h | Y | 81852 | 1.469 | 0.677 | Y |
13 | 37m | Y | 2113 | 1.339 | 0.884 | N |
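As a rough cross-check, the Requests/sec column is approximately Total Requests divided by Duration; small deviations from the table are expected because JMeter derives its throughput figure from actual sample timestamps. For example:

```python
# Rough cross-check of the Requests/sec column: total requests / duration.
def throughput(total_requests: int, hours: int, minutes: int = 0) -> float:
    return total_requests / (hours * 3600 + minutes * 60)

print(round(throughput(81852, 14, 4), 3))   # test 6: ~1.616 vs 1.593 reported
print(round(throughput(81852, 15, 20), 3))  # test 3: ~1.483 vs 1.504 reported
```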
Successful Tests
Successful Harvest: Service CPU Utilization
The following is a typical set of graphs from one of the successful harvests (without enabling DB R/W split).
mod-oai-pmh can spike to over 300% CPU initially. The other modules barely rise above their CPU baselines; mod-inventory-storage spiked for a few minutes due to the initial transfer of instances.
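The service CPU graphs referenced here are assumed to be the standard AWS CloudWatch ECS service metrics. A sketch of pulling the same datapoints programmatically is below; the region, cluster, and service names are placeholders for the PTF environment:

```python
# Sketch: fetch average/maximum ECS service CPU utilization from CloudWatch
# over roughly one harvest window. Region, cluster, and service names are
# placeholders, not the actual PTF environment values.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region
end = datetime.now(timezone.utc)
start = end - timedelta(hours=15)  # roughly one full harvest

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "perf-cluster"},  # placeholder
        {"Name": "ServiceName", "Value": "mod-oai-pmh"},   # placeholder
    ],
    StartTime=start,
    EndTime=end,
    Period=300,  # 5-minute buckets
    Statistics=["Average", "Maximum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), round(point["Maximum"], 1))
```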
Successful Harvest: Service Memory Usage
No issues seen with memory on a good run.
Database CPU Utilization
DB CPU utilization spikes in the first 30-60 minutes while instances are being transferred and is relatively even and smooth for the rest of the harvest. Surprisingly, some spikes also showed up on the DB read node, even though this test did not have DB R/W split enabled.
Failed Tests
The following are graphs of a failed harvest. These failed harvests happened within the first 60 minutes.
Failed Harvest: Service CPU Utilization
Unlike in a typical harvest, there is no initial mod-oai-pmh spike right off the bat; instead, CPU drags along for about 15 minutes and then spikes hugely as memory runs low, shortly before one of the two mod-oai-pmh containers crashes.
Failed Harvest: Service Memory Usage
In a failed harvest, mod-oai-pmh's memory usage started out fine but spiked to over 100%, and the container then crashed.
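One way to confirm this crash sequence is to scan the module's container logs for blocked-thread warnings followed by OutOfMemoryError. The sketch below assumes the logs land in CloudWatch Logs under a placeholder log group, and that the thread-block errors are the usual Vert.x "has been blocked" warnings:

```python
# Sketch: count OutOfMemoryError and Vert.x blocked-thread warnings in the
# module's CloudWatch log group around a failed harvest. The log group name
# and region are placeholders; only the first page of results is inspected.
from datetime import datetime, timedelta, timezone
import boto3

logs = boto3.client("logs", region_name="us-east-1")  # assumed region
LOG_GROUP = "/ecs/mod-oai-pmh"                        # placeholder log group

end = datetime.now(timezone.utc)
start = end - timedelta(hours=2)  # failed harvests died within the first hour

for pattern in ('"has been blocked"', "OutOfMemoryError"):
    resp = logs.filter_log_events(
        logGroupName=LOG_GROUP,
        filterPattern=pattern,
        startTime=int(start.timestamp() * 1000),
        endTime=int(end.timestamp() * 1000),
    )
    print(pattern, "->", len(resp["events"]), "matching events")
```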
Failed Harvest: Database CPU Utilization
RDS CPU Utilization in a failed harvest does not show any unusual pattern compared to a successful harvest.
DB R/W Split Enabled
With DB R/W split enabled, the graphs for both successful and failed harvests look the same as when DB R/W split was not enabled.
Failed Harvest (With R/W Split enabled)