Overview
- The purpose of the concurrent OAI-PMH, data import (DI) and check-in/check-out (CI/CO) tests is to determine how these workflows affect each other. This report contains the results for PERF-492.
Summary
- At the start of test execution, growth in Service Memory Usage was observed for all services. It is connected to the daily cluster start.
- OAI-PMH affects CI/CO response times: it worsens them by up to 13%. DI worsens CI/CO results most with the create job profile: by up to 55% with 1,000 records and up to 71% with 100,000 records. DI with the update profile worsens CI/CO results less than the create profile: by up to 25% with 1,000 records and 36% with 100,000 records. The OAI-PMH (incremental) duration ranged from 04:20 to 05:20 (mm:ss) in all tests, both with DI of 1,000 records and without it. The OAI-PMH duration calculation is described in the Methodology/Approach section.
- For the majority of services, memory usage did not exceed 60%. The highest levels were registered for mod-source-record-manager (107%) and mod-inventory-b (98%). After the Scenario 1 tests, memory usage reached a stable level and did not change.
- Running OAI-PMH, DI and CI/CO simultaneously showed that the environment can handle such a load.
- CI/CO response times degrade during DI and OAI-PMH, and the degradation depends on the job profile used. The tests ran a number of consecutive DI operations (create and update job profiles).
- After 90 minutes of full harvest, CPU utilization of mod-oai-pmh-b grew to as much as 188%. This increased CPU utilization lasted 10 minutes, after which it returned to a steady state (5-7%).
- At the beginning of DI, service CPU was used mostly by mod-di-converter-storage-b (253%), mod-inventory-b (172%) and mod-quick-marc-b (108%); for the rest of the modules it was under 70%. At its peak it was mod-di-converter-storage-b (453%), mod-inventory-b (190%) and mod-quick-marc-b (121%).
- RDS CPU utilization during incremental harvesting did not exceed 60% for all DI job profiles (1,000 records). Data export took 40%. For full harvesting with the DI Create job profile (100,000 records), however, it instantly rose to 96% and stayed at this level for the major part of the process. DI Update used up to 90%.
- All OAI-PMH tests were executed by the EBSCO Harvester on the AWS ptf-windows instance.
- During full harvesting, a (504) Gateway Timeout occurred after all DI create and update jobs were done, so it did not affect the results. It happened in both full harvesting runs; the returned instance counts were 1764989 records in the first OAI-PMH full harvest and 1166089 in the second, out of a total of 10433728.
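To put the returned instance counts in context, the share of the total that each full harvest returned before the timeout can be worked out directly. The sketch below uses only the three counts quoted in the summary; the function and variable names are illustrative and not part of any test tooling.

```python
# Instance counts quoted in the summary.
TOTAL_INSTANCES = 10_433_728
HARVESTED = {
    "first full harvest": 1_764_989,
    "second full harvest": 1_166_089,
}


def coverage_pct(harvested: int, total: int) -> float:
    """Return the share of instances harvested, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return harvested / total * 100


for run, count in HARVESTED.items():
    print(f"{run}: {coverage_pct(count, TOTAL_INSTANCES):.1f}% of instances returned")
```

Both runs returned well under a fifth of the instances, which is worth keeping in mind when reading the full-harvest durations.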
Recommendations & Jiras
- Allocate more CPU resources to mod-di-converter-storage and mod-inventory-b
- During testing, unhealthy behaviour was observed from the mod-remote-storage-b service (reason: Health checks failed with these codes: [404]). The same unhealthy behaviour was observed from mod-licenses-b and mod-service-interaction-b (reason: Health checks failed with these codes: [502]). PERF-618
Test Runs & Results
Data import duration and CI/CO response times with DI & OAI-PMH results
Test # | CI/CO 10 users | Scenario | Job profile | OAI-PMH only / instance amount | OAI-PMH + DI + CI/CO Duration | DI Duration | CI average | CO average | Load level | Comments |
Scenario 1 OAI-PMH incremental | 40 min | DI MARC Bib Create | PTF - Create 2 | 00:04:46 8000 | 00:05:18 | 00:00:48 | 0.961 | 1.398 | For scenario 1 1K (with pause ~5 min) | All incremental harvests were stopped manually after ~ 8000 instances |
Scenario 1 OAI-PMH incremental | 40 min | DI MARC Bib Update | PTF - Updates Success - 1 | 00:05:14 8000 | | 00:00:56 | 0.706 | 1.125 | | |
Scenario 1 OAI-PMH incremental | 40 min | DI MARC Bib Create | PTF - Create 2 | 00:05:11 8000 | 00:04:20 | 00:00:43 | 0.843 | 1.402 | | |
Scenario 1 OAI-PMH incremental | 40 min | DI MARC Bib Update | PTF - Updates Success - 1 | 00:04:24 8000 | | 00:00:44 | 0.848 | 1.335 | | |
Scenario 2 OAI-PMH full mode | 5 hours | DI MARC Bib Create | PTF - Create 2 | 1764989 | 04:42:20 | 00:53:30 | 1.078 | 1.545 | For scenario 2 100K (with pause ~5 min) | During scenario 2 full harvests stopped due to ERROR: Error saving an xml document: The remote server returned an error: (504) Gateway Timeout. |
Scenario 2 OAI-PMH full mode | 5 hours | DI MARC Bib Update | PTF - Updates Success - 1 | | | 01:04:38 | 0.725 | 1.231 | | |
Scenario 2 OAI-PMH full mode | 5 hours | DI MARC Bib Update | PTF - Updates Success - 1 | | | 01:05:48 | 0.69 | 1.249 | | |
Scenario 2 OAI-PMH full mode | 5 hours | DI MARC Bib Update | PTF - Updates Success - 1 | 1166089 | 03:44:20 | 01:17:58 | 0.903 | 1.333 | | |
Scenario 2 OAI-PMH full mode | 5 hours | DI MARC Bib Update | PTF - Updates Success - 1 | | | 01:18:08 | 0.737 | 1.221 | | |
Scenario 2 OAI-PMH full mode | 5 hours | DI MARC Bib Update | PTF - Updates Success - 1 | | | 01:21:21 | 0.62 | 1.106 | | Last 30 minutes without OAI-PMH |
Comparisons
Comparison table for CI/CO response times (average; ↑ is the increase relative to CI/CO only)
Requests | CI/CO only | CI/CO + OAI-PMH | CI/CO + OAI-PMH + DI Create 1k | CI/CO + OAI-PMH + DI Update 1k | CI/CO after | CI/CO + OAI-PMH + DI Create 100k | CI/CO between | CI/CO + OAI-PMH + DI Update 100k | CI/CO after |
Check-Out Controller | 0.904 | 1.024 ↑13.27% | 1.398 ↑54.65% | 1.125 ↑24.45% | 0.900 | 1.545 ↑70.91% | 0.914 | 1.231 ↑36.17% | 0.926 |
Check-In Controller | 0.629 | 0.666 ↑5.88% | 0.961 ↑52.78% | 0.706 ↑12.24% | 0.625 | 1.078 ↑71.38% | 0.569 | 0.725 ↑15.26% | 0.515 |
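The delta percentages in the comparison table appear consistent with a plain relative change against the "CI/CO only" average. The following is a minimal sketch of that calculation, assuming this is how the deltas were derived; the function name `delta_pct` is illustrative.

```python
def delta_pct(baseline: float, value: float) -> float:
    """Percentage change of value relative to baseline, rounded to 2 places."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return round((value - baseline) / baseline * 100, 2)


# Average response times taken from the comparison table.
BASELINE = {"Check-Out Controller": 0.904, "Check-In Controller": 0.629}
WITH_DI_CREATE_100K = {"Check-Out Controller": 1.545, "Check-In Controller": 1.078}

for name, base in BASELINE.items():
    print(f"{name}: +{delta_pct(base, WITH_DI_CREATE_100K[name])}%")
```

Because every delta is measured against the same baseline column, the percentages can be compared across scenarios directly.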
Scenario 1
Response time
This table shows 40 minutes of CI/CO
Service CPU Utilization
Service Memory Utilization
RDS CPU Utilization
Scenario 2
Response time
The table shows the first 5 hours of CI/CO (it contains one Create and 2 Updates with the 100,000-record file)
The table shows the second 5 hours of CI/CO (it contains 3 Updates with the 100,000-record file)
Service CPU Utilization
Service Memory Utilization
RDS CPU Utilization
Errors
Scenario 1 - no errors
Scenario 2
All errors are connected to the Check-Out Controller
Request name | Number |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422 | 8 |
GET_inventory/items (Submit_barcode_checkout)_GET_200 | 6 |
GET_groups_ID (Submit_patron_barcode)_GET_400 | 1 |
Appendix
Methodology/Approach
The incremental OAI-PMH harvest was stopped manually from the AWS instance after approximately 8000 instances and holdings had been harvested. To determine the duration of a given harvest, take the difference between the timestamps of the second call and the last call in that harvest's log file in the log folder.
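The duration rule above (last call minus second call) can be sketched as follows. The timestamp layout and the sample values are assumptions made for illustration; the real harvester log format is not shown in this report, so the format string would need to be adjusted.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; adjust to match the real harvester log.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def harvest_duration(call_timestamps: list[str]) -> timedelta:
    """Return the time between the second call and the last call."""
    if len(call_timestamps) < 2:
        raise ValueError("need at least two calls to measure a duration")
    second = datetime.strptime(call_timestamps[1], TIMESTAMP_FORMAT)
    last = datetime.strptime(call_timestamps[-1], TIMESTAMP_FORMAT)
    return last - second


# Hypothetical timestamps, one per harvester call, in log order.
calls = [
    "2023-01-01 10:00:00",
    "2023-01-01 10:00:05",
    "2023-01-01 10:02:30",
    "2023-01-01 10:04:51",
]
print(harvest_duration(calls))
```

The first call is skipped deliberately, as the methodology measures from the second call.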
Circulation rules should be modified in the Circulation rules editor before the CI/CO test so that it runs without issues on the POST_circulation/check-out-by-barcode (Submit_barcode_checkout) side.
The number of partitions should be 2 in all DI-related topics.
Before running OAI-PMH with full harvest, the following database commands to optimize the tables should be executed (from https://folio-org.atlassian.net/wiki/display/FOLIOtips/OAI-PMH+Best+Practices#OAIPMHBestPractices-SlowPerformance):
|
- Execute the following query in the related database to remove the existing 'instances' created by the previous harvesting request, as well as the request itself:
|
Infrastructure
- 8 m6i.2xlarge EC2 instances located in US East (N. Virginia)
- 2 db.r6.xlarge database instances, one reader and one writer
- MSK ptf-kakfa-3
- 4 brokers
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Front End:
- Item Check-in (folio_checkin-8.0.100000491)
- Item Check-out (folio_checkout-9.0.100000595)
Modules
Partitions