OAI-PMH data harvesting [Concurrent Incremental] (Poppy)
Overview
Summary
OAI-PMH - Incremental Harvesting:
Three tests were executed with a JMeter script to check the performance of harvesting 10K, 25K, 50K, 500K and 1 MLN records under different OAI-PMH Behavior settings:
Test 1. Record source set to Source record storage;
Test 2. Record source set to Inventory* (*the OCP3 data set limits this test to 250K records);
Test 3. Record source set to Source record storage and Inventory.
- Numbers of concurrent harvests:
- 2 harvests;
- 4 harvests;
- 6 harvests.
- CPU utilization in all tests scaled with the number of concurrent harvests:
- Test #1 mod-oai-pmh-b: 2 harvests - 5%, 4 harvests - 10%, 6 harvests - 15%
- Test #2 mod-oai-pmh-b: 2 harvests - 1%, 4 harvests - 3.7%, 6 harvests - 5.5%
- Test #3 mod-oai-pmh-b: 2 harvests - 10%, 4 harvests - 15%, 6 harvests - 25%
- Memory consumption was stable except for mod-inventory, which grew slowly, and mod-oai-pmh, which grew from 46% to 56%. By test:
- Tests #1 and #3: mod-oai-pmh-b did not exceed 40%
- Test #2: mod-oai-pmh-b reached 55%
- RDS CPU utilization:
- Average CPU usage for 2 harvests - 15%
- Average CPU usage for 4 harvests - 20%
- Average CPU usage for 6 harvests - 25%
- Harvest durations differed significantly between tests #1 and #3 (SRS) and test #2 (Inventory) because of how record creation dates are distributed within the window set by the fromDate and untilDate parameters.
- Durations did not degrade as the number of concurrent harvests increased.
- Response times for each test can be found in the "Test #. Record source" sections.
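An incremental harvest in OAI-PMH is a ListRecords request restricted by the from and until datestamp arguments, followed by requests that carry only the resumptionToken returned on each page until an empty token marks the last page. The JMeter script used for these tests is not shown on this page, so the following is only a minimal Python sketch of that request flow under stated assumptions: the base URL, the metadataPrefix value and the injected `fetch` function are hypothetical, and running several date windows on a thread pool is one way to model "N concurrent harvests", not necessarily how the JMeter plan drives them.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Standard OAI-PMH 2.0 namespace for response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def first_request_url(base_url, metadata_prefix, from_date, until_date):
    """Initial ListRecords request limited to a [from, until] datestamp window."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_date,
        "until": until_date,
    }
    return base_url + "?" + urllib.parse.urlencode(params)


def next_request_url(base_url, resumption_token):
    """Follow-up request: OAI-PMH sends only verb and resumptionToken."""
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (records on this page, resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken element means the list is complete.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return len(records), token or None


def harvest(fetch, base_url, metadata_prefix, from_date, until_date):
    """Follow resumption tokens until the last page; return the record count."""
    url = first_request_url(base_url, metadata_prefix, from_date, until_date)
    total = 0
    while True:
        count, token = parse_page(fetch(url))
        total += count
        if token is None:
            return total
        url = next_request_url(base_url, token)


def run_concurrent(fetch, base_url, metadata_prefix, windows):
    """Run one harvest per (from, until) window in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(windows)) as pool:
        futures = [
            pool.submit(harvest, fetch, base_url, metadata_prefix, f, u)
            for f, u in windows
        ]
        return [future.result() for future in futures]
```

Against a live endpoint, `fetch` could be `lambda url: urllib.request.urlopen(url).read()` (after importing `urllib.request`) plus whatever authentication the endpoint requires; injecting it keeps the paging logic testable without a network.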
Improvements noted in the Poppy release:
1) A non-ECS environment on the Poppy release can handle concurrent OAI-PMH harvests.
Recommendations & Jiras
- When preparing tests, populate the complete_updated_date column in {tenant}_mod_inventory_storage.instance with a migration. More info in the Appendix section.
- To avoid degraded OAI-PMH response times, check after a cluster restart that the DB top queries do not include DELETE and INSERT statements for marc_id values.
- To have the same starting conditions before each test with a different Record source setting, the edge-oai-pmh service was restarted to return its memory usage to the starting (post-deployment) value.
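The page does not say which tool was used to inspect the DB top queries. As one possible approach, assuming PostgreSQL with the pg_stat_statements extension enabled (PostgreSQL 13+ column names), a minimal Python sketch could pull the top statements and flag DELETE or INSERT statements that mention marc_id. The query text, the row shape and the `suspicious_statements` helper are illustrative assumptions, not part of the original test setup.

```python
import re

# Assumed query: top statements by total execution time from pg_stat_statements.
# On PostgreSQL 12 and older the column is total_time instead of total_exec_time.
TOP_QUERIES_SQL = """
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20
"""

# Statements that start with DELETE or INSERT and reference marc_id anywhere.
_WRITE_ON_MARC_ID = re.compile(
    r"^\s*(DELETE|INSERT)\b.*\bmarc_id\b", re.IGNORECASE | re.DOTALL
)


def suspicious_statements(rows):
    """Filter (query, calls, total_exec_time) rows down to marc_id writes."""
    return [row for row in rows if _WRITE_ON_MARC_ID.search(row[0])]
```

The rows can come from any PostgreSQL driver or from psql output; a non-empty result after a cluster restart is the condition the recommendation above warns about. Writes wrapped in a CTE (WITH ... DELETE) are not caught by this simple pattern.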
Test Runs & Results
Incremental harvesting
Durations (hh:mm:ss) of 2, 4 and 6 concurrent incremental OAI-PMH harvests (SRS = Source record storage):
Number of harvested records | 2 harvests: Test 1 (SRS) | 2 harvests: Test 2 (Inventory) | 2 harvests: Test 3 (SRS and Inventory) | 4 harvests: Test 1 (SRS) | 4 harvests: Test 2 (Inventory) | 4 harvests: Test 3 (SRS and Inventory) | 6 harvests: Test 1 (SRS) | 6 harvests: Test 2 (Inventory) | 6 harvests: Test 3 (SRS and Inventory) |
---|---|---|---|---|---|---|---|---|---|
10000 records (10K) | 00:02:08 | 00:08:55 | 00:01:39 | 00:01:05 | 00:01:46 | 00:01:31 | 00:01:07 | 00:01:32 | 00:01:14 |
25000 records (25K) | 00:04:09 | 00:16:25 | 00:04:27 | 00:02:38 | 00:21:00 | 00:04:34 | 00:02:52 | 00:20:32 | 00:02:57 |
50000 records (50K) | 00:07:40 | 00:33:25 | 00:08:10 | 00:05:17 | 00:32:46 | 00:07:44 | 00:05:34 | 00:32:47 | 00:13:25 |
500000 records (500K) / 250000 records (250K) in test #2 | 01:56:40 | 02:33:30 | 01:51:24 | 01:58:34 | 02:35:29 | 01:48:48 | 01:34:29 | 02:37:45 | 01:44:42 |
1000000 records (1MLN) | 02:50:17 | not enough data | 02:39:09 | 02:59:09 | not enough data | 02:50:29 | 03:04:30 | not enough data | 02:58:50 |
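Converting the wall-clock durations in the table into records per second makes rows of different sizes easier to compare. A minimal Python sketch, assuming each duration covers the full stated number of records for a single harvest (the page does not say whether the figure is per harvest or for all concurrent harvests combined); the helper names are illustrative:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' duration string from the table into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records: int, hms: str) -> float:
    """Average harvesting throughput for one table cell."""
    return records / to_seconds(hms)


if __name__ == "__main__":
    # 1MLN row, 2 concurrent harvests: Test 1 (SRS) and Test 3 (SRS and Inventory).
    for label, duration in [("Test 1", "02:50:17"), ("Test 3", "02:39:09")]:
        print(label, round(records_per_second(1_000_000, duration), 1), "records/s")
```

With this normalization the 1MLN SRS-based runs come out at roughly 98 to 105 records per second for 2 concurrent harvests.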
Incremental harvesting
Test 1. Record source = Source record storage
Service CPU Utilization
During five harvesting tests with 10K, 25K, 50K, 500K and 1