OAI-PMH data harvesting (Orchid) by EBSCO Harvester

Overview  

Previously, a JMeter script with a 40 requests per minute throughput limit was used to obtain a successful full harvest. That throughput limit, however, is not a simulation of real user behaviour. The purpose of the current OAI-PMH tests is to measure the performance of the Orchid release with the recommended tool, EBSCO Harvester, and to find possible issues and bottlenecks (PERF-510).


Summary

  • Tests were executed with EBSCO Harvester on the AWS ptf-windows instance. There is no control over a 40 requests per minute throughput limit, as there was with the JMeter script, which is why a successful harvesting operation with EBSCO Harvester took less time (~18 hours) than with the JMeter script under the throughput limit (~25 hours). Moreover, EBSCO Harvester has retry functionality that keeps the harvest going even after requests fail with timeouts.
  • When the mod-oai-pmh.instance table accumulates instance UUIDs from previous harvests, as happened when the PTF environment's mod-oai-pmh.instance table reached 30M records, inserting new records into the table takes longer, so the overall duration of creating (downloading) new 'instanceUUIDs' records increases as well (not a critical issue; a query sketch for checking this accumulation follows this list).
  • '(504) Gateway timeout' responses were caused by mod-oai-pmh 'java.lang.OutOfMemoryError: Java heap space'. In turn, the 'java.lang.OutOfMemoryError' exceptions appeared when not all 'instanceUUIDs' records were created for the request. The reason why this happens should be investigated additionally.
  • The 'repository.fetchingChunkSize=8000' option increased the duration of the harvesting request. The default value (5000) shows optimal results.
  • All test executions show similar service memory behaviour. Memory usage is only at an optimal level right after the service is restarted. Once a harvesting operation starts, service memory usage grows to 99%, stabilizes at that level, and does not drop from the reached value (even for subsequent harvesting processes).
    On the one hand, it is not possible to see how much memory is used per harvest. On the other hand, the reason why service memory usage does not decrease when there is no activity should be investigated additionally: it could be that AWS displays the aggregated memory of the containers, or it could be a FOLIO issue.
  • In cases where all 'instanceUUIDs' records were created for the request as expected (ocp2 - 10,023,100 'instance' records), with different 'repository.fetchingChunkSize' values (5000, 8000), the harvesting operation took less than 24 hours and completed successfully. However, the instability of the harvesting operation on the Orchid release, caused by not all 'instance' records being created for the request, should be investigated and fixed under MODOAIPMH-509.
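For reference, the accumulation described in the second bullet can be checked directly in the database. A minimal sketch, assuming the instance UUID table is exposed as fs09000000_mod_oai_pmh.instances (the report refers to it as mod-oai-pmh.instance; the exact table name may differ between releases):

SELECT count(*) AS accumulated_instance_uuids,
       pg_size_pretty(pg_total_relation_size('fs09000000_mod_oai_pmh.instances')) AS table_size
FROM fs09000000_mod_oai_pmh.instances;  -- assumed table name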

Recommendations & Jiras

MODOAIPMH-509


Test Runs & Results

The table below contains information about the test executions, their results, and the 'mod-oai-pmh' service configuration:


Columns: Date | Start | Finish | Source | Records/Req | Metadata Final Response | DB Instances Created | Result | Comment | Version | CPU | memory/memoryReservation | Xmx | MaxMetaSpaceSize | MetaspaceSize | Tasks | Task Revision. Each test run is listed below as a labelled record, with CPU through Task Revision grouped into a single Configuration line.

Test 1
Date: 4/4/2023
Start: 4/4/2023 14:15
Finish: -
Source: SRS+Inv
Records/Req: 100
Metadata Final Response:
{
    "requestMetadataCollection": [
        {
            "requestId": "df1ef27f-fcc5-4e51-a7b8-0577ced08380",
            "lastUpdatedDate": "2023-04-04T14:56:26.003+00:00",
            "streamEnded": false,
            "downloadedAndSavedInstancesCounter": 0,
            "failedToSaveInstancesCounter": 0,
            "returnedInstancesCounter": 1741881,
            "skippedInstancesCounter": 33019,
            "failedInstancesCounter": 0,
            "suppressedInstancesCounter": 0
        }
    ],
    "totalRecords": 1
}
DB Instances Created: -
Result: (504) Gateway timeout was caused by mod-oai-pmh java.lang.OutOfMemoryError: Java heap space
Comment: -
Version: mod-oai-pmh:3.11.3
Configuration: CPU=2048, memory/memoryReservation=2048/1845, Xmx=1440m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=12
Test 2
Date: 4/5/2023
Start: 5/4/2023 11:13
Finish: -
Source: SRS+Inv
Records/Req: 100
Metadata Final Response: -
DB Instances Created: -
Result: Not enough space. Local problem
Comment: -
Version: mod-oai-pmh:3.11.3
Configuration: CPU=3072, memory/memoryReservation=3072/2872, Xmx=1440m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=13

Test 3
Date: 4/5/2023
Start: 5/4/2023 12:58
Finish: -
Source: SRS+Inv
Records/Req: 100
Metadata Final Response: -
DB Instances Created: -
Result: Not enough space. Local problem
Comment: -
Version: mod-oai-pmh:3.11.3
Configuration: CPU=3072, memory/memoryReservation=3072/2872, Xmx=1440m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=13
Test 4
Date: 4/6/2023
Start: 6/4/2023 20:03
Finish: 7/4/2023 14:15
Source: SRS+Inv
Records/Req: 100
Metadata Final Response:
{
    "requestMetadataCollection": [
        {
            "requestId": "d6dbf068-9a12-4d8b-915c-c340115dc2e0",
            "lastUpdatedDate": "2023-04-07T11:57:25.317+00:00",
            "streamEnded": true,
            "downloadedAndSavedInstancesCounter": 10023100,
            "failedToSaveInstancesCounter": 0,
            "returnedInstancesCounter": 9951864,
            "skippedInstancesCounter": 71236,
            "failedInstancesCounter": 0,
            "suppressedInstancesCounter": 0
        }
    ],
    "totalRecords": 1
}

DB Instances Created: -
Result: Harvest has completed
Comment: -
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 5
Date: 4/7/2023
Start: 7/4/2023 16:30
Finish: 7/4/2023 17:09
Source: SRS
Records/Req: 100
Metadata Final Response:
{
    "requestMetadataCollection": [
        {
            "requestId": "27d95a17-cdf9-4d3b-af70-a0792d839d29",
            "lastUpdatedDate": "2023-04-07T17:26:55.603+00:00",
            "streamEnded": false,
            "downloadedAndSavedInstancesCounter": 0,
            "failedToSaveInstancesCounter": 0,
            "returnedInstancesCounter": 2127445,
            "skippedInstancesCounter": 32755,
            "failedInstancesCounter": 0,
            "suppressedInstancesCounter": 0
        }
    ],
    "totalRecords": 1
}
DB Instances Created: 2,345,000
Result: (504) Gateway timeout was caused by mod-oai-pmh java.lang.OutOfMemoryError: Java heap space
Comment: -
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 6
Date: 4/11/2023
Start: 11/4/2023 11:23
Finish: 11/4/2023 12:50
Source: SRS
Records/Req: 100
Metadata Final Response:
{
    "requestMetadataCollection": [
        {
            "requestId": "0c82370f-abfb-48cf-9e97-10e9893c8f17",
            "lastUpdatedDate": "2023-04-11T10:50:12.944+00:00",
            "streamEnded": true,
            "downloadedAndSavedInstancesCounter": 10023099,
            "failedToSaveInstancesCounter": 0,
            "returnedInstancesCounter": 452921,
            "skippedInstancesCounter": 79,
            "failedInstancesCounter": 0,
            "suppressedInstancesCounter": 0
        }
    ],
    "totalRecords": 1
}
DB Instances Created: 10,023,099
Result: Assumption: Harvest will be completed. Stopped by Firebird request
Comment: repository.fetchingChunkSize=2000 (default was 5000)
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 7
Date: 4/11/2023
Start: 11/4/2023 12:55
Finish: 11/4/2023 14:31
Source: SRS
Records/Req: 100
Metadata Final Response: -
DB Instances Created: -
Result: -
Comment: repository.fetchingChunkSize=2000, to check the downloading process
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 8
Date: 4/11/2023
Start: 11/4/2023 14:30
Finish: 11/4/2023 15:47
Source: SRS
Records/Req: 100
Metadata Final Response: -
DB Instances Created: 10,023,100
Result: Assumption: Harvest will be completed. Stopped by Firebird request
Comment: repository.fetchingChunkSize=8000
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 9
Date: 4/11/2023
Start: 11/4/2023 16:20
Finish: 11/4/2023 17:36
Source: SRS
Records/Req: 100
Metadata Final Response: -
DB Instances Created: 10,023,100
Result: Assumption: Harvest will be completed. Stopped by Firebird request
Comment: repository.fetchingChunkSize=8000
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 10
Date: 4/11/2023
Start: 11/4/2023 18:03
Finish: 12/4/2023 10:43
Source: SRS
Records/Req: 100
Metadata Final Response:
{
            "requestId": "a68b7d56-d8da-472a-a8aa-0db4a0392916",
            "lastUpdatedDate": "2023-04-12T08:43:25.844+00:00",
            "streamEnded": true,
            "downloadedAndSavedInstancesCounter": 10023100,
            "failedToSaveInstancesCounter": 0,
            "returnedInstancesCounter": 9691838,
            "skippedInstancesCounter": 70367,
            "failedInstancesCounter": 0,
            "suppressedInstancesCounter": 0
        }
DB Instances Created: 10,023,100
Result: Harvest has completed
Comment: repository.fetchingChunkSize=8000
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 11
Date: 4/12/2023
Start: 12/4/2023 12:55
Finish: 12/4/2023 13:41
Source: SRS+Inv
Records/Req: 100
Metadata Final Response: -
DB Instances Created: -
Result: (504) Gateway timeout was caused by mod-oai-pmh java.lang.OutOfMemoryError: Java heap space
Comment: repository.fetchingChunkSize=8000
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=3072/2767, Xmx=2150m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=14
Test 12
Date: 4/12/2023
Start: 12/4/2023 14:15
Finish: 12/4/2023 14:58
Source: SRS+Inv
Records/Req: 100
Metadata Final Response: -
DB Instances Created: -
Result: Assumption: Harvest will be completed. Stopped by Firebird request
Comment: repository.fetchingChunkSize=8000
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=4096/3690, Xmx=3076m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=15
Test 13
Date: 4/12/2023
Start: 12/4/2023 15:33
Finish: 13/4/2023 09:14
Source: SRS+Inv
Records/Req: 100
Metadata Final Response: -
DB Instances Created: 10,023,100
Result: Harvest has completed
Comment: repository.fetchingChunkSize=5000 (default); truncate before test
Version: mod-oai-pmh:3.12.0-S.299
Configuration: CPU=2048, memory/memoryReservation=4096/3690, Xmx=3076m, MaxMetaSpaceSize=512m, MetaspaceSize=384m, Tasks=2, Task Revision=15

Comparisons

Comparison with full harvesting testing by the JMeter script, documented in OAI-PMH data harvesting (Orchid):

Source: SRS, maxRecordsPerResponse: 100
Duration - Orchid, JMeter script: ~25 h (tick) (approx. all 'instanceUUIDs' were created)
Duration - Orchid, EBSCO Harvester: ~17 h (tick) (all 'instanceUUIDs' were created)

Source: SRS+Inventory, maxRecordsPerResponse: 100
Duration - Orchid, JMeter script: ~9 h (warning) (not all 'instanceUUIDs' were created; no issues in the console)
Duration - Orchid, EBSCO Harvester: ~18 h (tick) (all 'instanceUUIDs' were created)

The integrity of the harvesting was not checked

Memory Utilization

Successful test (#13)

'mod-oai-pmh' used ~100-102% of memory during harvesting (assumption: memory had already been pushed up to 99% by the previous test).

Failed test (#11)

'mod-oai-pmh' used ~99% of memory during harvesting (assumption: memory had already been pushed up to 99% by the previous test).

CPU Utilization 

Successful test (#13)

'mod-graphql' had quite a big CPU usage spike, up to 1653 units, during harvesting (a one-time incident).

Failed test (#11)

'mod-oai-pmh' used up to ~97% of CPU during harvesting.

The spike in CPU utilization is probably due to activity toward the end of the test, when the module was running out of memory and the operating system was trying to reclaim whatever memory it could and/or swap out blocks of memory to keep the process alive, until it could not do so any longer and everything toppled over.

RDS CPU Utilization 

Successful test (#13)

The highest spike was up to 76% of RDS CPU usage for ocp2-db-01
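To see which statements are driving spikes like this one, the active queries on the RDS instance can be listed with the standard PostgreSQL pg_stat_activity view (a monitoring sketch, not part of the original test runs):

SELECT pid,
       now() - query_start AS runtime,
       state,
       left(query, 100) AS query_text
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;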

Failed test (#11)

Additional information from module and database logs 

2023-04-12T13:41:07.262+02:00 % Total % Received % Xferd Average Speed Time Time Time Current

2023-04-12T13:41:07.262+02:00 Dload Upload Total Spent Left Speed

2023-04-12T13:41:11.118+02:00 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 1 544M 0 0 1 9024k 0 77.9M 0:00:06 --:--:-- 0:00:06 77.3M 37 544M 0 0 37 203M 0 183M 0:00:02 0:00:01 0:00:01 183M 61 544M 0 0 61 336M 0 159M 0:00:03 0:00:02 0:00:01 159M 84 544M 0 0 84 461M 0 148M 0:00:03 0:00:03 --:--:-- 148M 100 544M 0 0 100 544M 0 141M 0:00:03 0:00:03 --:--:-- 141M

2023-04-12T13:41:14.898+02:00 11:41:14 [] [] [] [mod_oai_pmh] INFO RestRouting invoking getAdminHealth

2023-04-12T13:41:14.898+02:00 11:41:14 [] [] [] [mod_oai_pmh] INFO LogUtil 127.0.0.1:50906 GET /admin/health null HTTP_1_1 200 4 0 tid=null OK

2023-04-12T13:41:14.898+02:00 11:41:14 [] [] [] [mod_oai_pmh] INFO RestRouting invoking getAdminHealth

2023-04-12T13:41:14.898+02:00 11:41:14 [] [] [] [mod_oai_pmh] INFO LogUtil 127.0.0.1:47758 GET /admin/health null HTTP_1_1 200 4 0 tid=null OK

2023-04-12T13:41:16.937+02:00 11:41:14 [] [] [] [] ERROR ? Unhandled exception

2023-04-12T13:41:16.937+02:00 java.lang.OutOfMemoryError: Java heap space

2023-04-12T13:41:16.937+02:00 at java.util.Arrays.copyOf(Arrays.java:3745) ~[?:?]

2023-04-12T13:41:16.937+02:00 at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172) ~[?:?]

2023-04-12T13:41:16.937+02:00 at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:686) ~[?:?]

2023-04-12T13:41:16.937+02:00 at java.lang.StringBuffer.append(StringBuffer.java:414) ~[?:?]

2023-04-12T13:41:16.937+02:00 at java.io.StringWriter.write(StringWriter.java:99) ~[?:?]

2023-04-12T13:41:16.937+02:00 at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:120) ~[?:?]

2023-04-12T13:41:16.937+02:00 at java.io.BufferedWriter.flush(BufferedWriter.java:256) ~[?:?]

The following screenshot is an example of the memory profiling results:


Appendix

Methodology/Approach

Before running OAI-PMH for the first time, please run the following database commands to optimize the tables (from https://folio-org.atlassian.net/wiki/display/FOLIOtips/OAI-PMH+Best+Practices#OAIPMHBestPractices-SlowPerformance):

REINDEX INDEX <tenant>_mod_inventory_storage.audit_item_pmh_createddate_idx;
REINDEX INDEX <tenant>_mod_inventory_storage.audit_holdings_record_pmh_createddate_idx;
REINDEX INDEX <tenant>_mod_inventory_storage.holdings_record_pmh_metadata_updateddate_idx;
REINDEX INDEX <tenant>_mod_inventory_storage.item_pmh_metadata_updateddate_idx;
REINDEX INDEX <tenant>_mod_inventory_storage.instance_pmh_metadata_updateddate_idx;
ANALYZE VERBOSE <tenant>_mod_inventory_storage.instance;
ANALYZE VERBOSE <tenant>_mod_inventory_storage.item;
ANALYZE VERBOSE <tenant>_mod_inventory_storage.holdings_record;
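To confirm that the maintenance statements have taken effect, the last analyze/vacuum timestamps can be checked with the standard PostgreSQL pg_stat_user_tables view (a verification sketch, not part of the original instructions; replace the schema with the actual tenant schema):

SELECT relname, last_analyze, last_autoanalyze, last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'fs09000000_mod_inventory_storage'   -- tenant schema on this environment
  AND relname IN ('instance', 'item', 'holdings_record');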
1. Execute the following query in the related database to remove the existing 'instance' records created by a previous harvesting request, together with the request itself:
TRUNCATE TABLE fs09000000_mod_oai_pmh.request_metadata_lb CASCADE;
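A quick way to confirm the cleanup is to count the affected tables afterwards; a sketch, assuming the instance UUID table in the same schema is named 'instances' (the CASCADE above is expected to clear it along with the request metadata):

SELECT count(*) FROM fs09000000_mod_oai_pmh.request_metadata_lb;  -- expected: 0
SELECT count(*) FROM fs09000000_mod_oai_pmh.instances;            -- assumed table name; expected: 0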

2. Get access to the ptf-windows machine for running EBSCO Harvester (contact person: Artem Vronik). The following cmd command (cmd should be run in the same directory as EBSCO Harvester) starts EBSCO Harvester:

OAIPMHHarvester.exe -HarvestMode=full -DefinitionId=Orchid -HarvesterWebClientTimeout_Seconds=0

3. Monitor service hardware metrics (MAX aggregation is suggested for service CPU and memory usage) and logs during the test execution.
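In addition to the hardware dashboards, progress of the download (instance UUID creation) phase can be spot-checked on the database side. A sketch, again assuming the fs09000000_mod_oai_pmh.instances table name; on ocp2 the expected final value is 10,023,100:

-- Re-run periodically during harvesting to watch the counter grow
SELECT count(*) AS downloaded_instance_uuids
FROM fs09000000_mod_oai_pmh.instances;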