OAI-PMH data harvesting [Concurrent Incremental] (Poppy)

Overview

  • The purpose of the OAI-PMH Concurrent Incremental Harvesting tests is to measure the performance of the Poppy release and to find possible issues and bottlenecks on the OCP3 environment (PERF-786).
  • Previous results of Incremental OAI-PMH harvesting: PERF-660.

Summary

  • OAI-PMH - Incremental Harvesting:

    • Three tests were executed with a JMeter script to check the performance of harvesting 10K, 25K, 50K, 500K and 1 MLN records with different OAI-PMH behaviors:

      • Test 1. Record source set to Source record storage;

      • Test 2. Record source set to Inventory (data set limited to 250K records in OCP3);

      • Test 3. Record source set to Source record storage and inventory.

    • Number of multiple concurrent harvests:
      • 2 harvests;
      • 4 harvests;
      • 6 harvests.
  • CPU utilization during all tests increased with the number of concurrent harvests:
    • Test #1 mod-oai-pmh-b: 2 harvests - 5%, 4 harvests - 10%, 6 harvests - 15%
    • Test #2 mod-oai-pmh-b: 2 harvests - 1%, 4 harvests - 3.7%, 6 harvests - 5.5%
    • Test #3 mod-oai-pmh-b: 2 harvests - 10%, 4 harvests - 15%, 6 harvests - 25%
  • Memory consumption was stable, except for mod-inventory, which grew slowly, and mod-oai-pmh, which grew from 46% to 56%. By test:
    • Tests #1 and #3: mod-oai-pmh-b did not exceed 40%
    • Test #2: mod-oai-pmh-b reached 55%
  • RDS CPU utilization:
    • Average CPU usage for 2 harvests - 15%
    • Average CPU usage for 4 harvests - 20%
    • Average CPU usage for 6 harvests - 25%
  • Harvest durations differed significantly between tests #1 and #3 (SRS) and test #2 (Inventory) because of how record creation dates are distributed within the range set by the fromDate and untilDate parameters.
  • Durations did not degrade as the number of concurrent harvests increased.
  • Response times for each test can be found in the expandable sections titled "Test #. Record source".
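
For readers unfamiliar with how an incremental harvest maps to HTTP requests, here is a minimal Python sketch. It is not the JMeter script used in these tests. It assumes the standard OAI-PMH ListRecords verb, where the date range (called fromDate and untilDate above) is sent as the `from` and `until` arguments and paging uses `resumptionToken`. The base URL, the `marc21` metadata prefix, and the helper names `http_fetch`, `harvest` and `run_concurrent` are illustrative assumptions, and authentication is omitted.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default fetcher (hypothetical helper): HTTP GET returning the body as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, from_date, until_date, metadata_prefix="marc21", fetch=http_fetch):
    """Run one incremental harvest and return the number of records received."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # assumed prefix, adjust as needed
        "from": from_date,
        "until": until_date,
    }
    total = 0
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        total += len(root.findall(f".//{OAI_NS}record"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token marks the last page of the harvest.
        if token is None or not (token.text or "").strip():
            return total
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def run_concurrent(n, base_url, from_date, until_date, fetch=http_fetch):
    """Start n identical harvests in parallel and return their record counts."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [
            pool.submit(harvest, base_url, from_date, until_date, fetch=fetch)
            for _ in range(n)
        ]
        return [future.result() for future in futures]
```

For example, `run_concurrent(4, "https://edge.example.org/oai", "2023-01-01", "2023-12-31")` would start four parallel harvests against a hypothetical endpoint. Passing a custom `fetch` function makes it possible to exercise the paging logic without a live server.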

Improvements noted in the Poppy release:
1) A non-ECS environment running the Poppy release can handle concurrent OAI-PMH harvests.

Recommendations & Jiras

  • When preparing tests, it is useful to populate the complete_updated_date column in {tenant}_mod_inventory_storage.instance using a migration. More information is in the Appendix section.
  • To avoid degradation of OAI-PMH response times, check after a cluster restart that the top DB queries do not include DELETE and INSERT statements for marc_id values.
  • To have the same starting conditions before each test with a different Record source setting, the edge-oai-pmh service was restarted so that its memory usage returned to its initial (post-deployment) value.

Test Runs & Results

Incremental harvesting


Duration (hh:mm:ss) by number of concurrent Incremental OAI-PMH harvests. Test 1: Record source = Source record storage. Test 2: Record source = Inventory. Test 3: Record source = Source record storage and inventory.

Number of harvested records | 2 concurrent, Test 1 | 2 concurrent, Test 2 | 2 concurrent, Test 3 | 4 concurrent, Test 1 | 4 concurrent, Test 2 | 4 concurrent, Test 3 | 6 concurrent, Test 1 | 6 concurrent, Test 2 | 6 concurrent, Test 3
10000 records (10K) | 00:02:08 | 00:08:55 | 00:01:39 | 00:01:05 | 00:01:46 | 00:01:31 | 00:01:07 | 00:01:32 | 00:01:14
25000 records (25K) | 00:04:09 | 00:16:25 | 00:04:27 | 00:02:38 | 00:21:00 | 00:04:34 | 00:02:52 | 00:20:32 | 00:02:57
50000 records (50K) | 00:07:40 | 00:33:25 | 00:08:10 | 00:05:17 | 00:32:46 | 00:07:44 | 00:05:34 | 00:32:47 | 00:13:25
500000 records (500K) / 250000 records (250K) in test #2 | 01:56:40 | 02:33:30 | 01:51:24 | 01:58:34 | 02:35:29 | 01:48:48 | 01:34:29 | 02:37:45 | 01:44:42
1000000 records (1MLN) | 02:50:17 | not enough data | 02:39:09 | 02:59:09 | not enough data | 02:50:29 | 03:04:30 | not enough data | 02:58:50
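
To compare durations across concurrency levels, the hh:mm:ss values can be converted to seconds and expressed as a percentage change. The sketch below does this for the Test 1 (Source record storage) durations at 2 and 6 concurrent harvests, copied from the table above. The helper names `to_seconds` and `relative_change` are illustrative, not part of any test tooling.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def relative_change(baseline, other):
    """Percentage change of `other` relative to `baseline` (negative means faster)."""
    base = to_seconds(baseline)
    return (to_seconds(other) - base) / base * 100


# Test 1 durations from the table: (2 concurrent harvests, 6 concurrent harvests)
test1 = {
    "10K": ("00:02:08", "00:01:07"),
    "25K": ("00:04:09", "00:02:52"),
    "50K": ("00:07:40", "00:05:34"),
    "500K": ("01:56:40", "01:34:29"),
    "1MLN": ("02:50:17", "03:04:30"),
}

for size, (two, six) in test1.items():
    print(f"{size}: {relative_change(two, six):+.1f}%")
```

Negative values mean the 6-harvest run finished faster than the 2-harvest run for the same number of records.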

Incremental harvesting

Test 1.  Record source = Source record storage

Results for Test 1. Record source = Source record storage

Test Label | Number of harvested records | Average Response Times, ms | Duration
SRS 2 concurrent 10k | 10000 | 0.982 | 00:02:08
SRS 4 concurrent 10k | 10000 | 0.356 | 00:01:05
SRS 6 concurrent 10k | 10000 | 0.37 | 00:01:07
SRS 2 concurrent 25k | 25000 | 0.689 | 00:04:09
SRS 4 concurrent 25k | 25000 | 0.331 | 00:02:38
SRS 6 concurrent 25k | 25000 | 0.385 | 00:02:52
SRS 2 concurrent 50k | 50000 | 0.616 | 00:07:40
SRS 4 concurrent 50k | 50000 | 0.334 | 00:05:17
SRS 6 concurrent 50k | 50000 | 0.364 | 00:05:34
SRS 2 concurrent 500k | 500000 | 0.903 | 01:56:40
SRS 4 concurrent 500k | 500000 | 1.12 | 01:58:34
SRS 6 concurrent 500k | 500000 | 0.829 | 01:34:29
SRS 2 concurrent 1Mln | 1000000 | 0.718 | 02:50:17
SRS 4 concurrent 1Mln | 1000000 | 0.77 | 02:59:09
SRS 6 concurrent 1Mln | 1000000 | 0.802 | 03:04:30

The graph shows response times for the GET requests that retrieve data. For 4 and 6 concurrent harvests with 10k, 25k and 50k records, the response time decreases significantly, for reasons that are not yet clear, and this shortens the harvest duration.
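
The record counts and durations in the Test 1 results can also be combined into a rough throughput figure. The sketch below divides the listed number of harvested records by the listed duration. It assumes the record count applies to the duration shown in the same row; the source does not state whether the count is per harvest or a total across concurrent harvests, so the result is only an approximation. The helper names are illustrative.

```python
from datetime import timedelta


def parse_duration(hms):
    """Parse an hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def records_per_second(records, duration):
    """Approximate throughput: listed record count divided by listed duration."""
    return records / parse_duration(duration).total_seconds()


# A few rows from the Test 1 results: (test label, records, duration)
rows = [
    ("SRS 2 concurrent 10k", 10000, "00:02:08"),
    ("SRS 4 concurrent 10k", 10000, "00:01:05"),
    ("SRS 6 concurrent 10k", 10000, "00:01:07"),
    ("SRS 2 concurrent 1Mln", 1000000, "02:50:17"),
]

for label, records, duration in rows:
    print(f"{label}: {records_per_second(records, duration):.1f} records/s")
```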

Service CPU Utilization

During five harvesting tests with 10K, 25k, 50K, 500K and 1