Overview
- The purpose of the concurrent OAI-PMH, data import and CI/CO tests is to determine which areas may be affected by increasing the harvest frequency.
Summary
- At the start of test execution, Service Memory Usage grew for all services. This is related to the daily start of the cluster. For most services, memory usage did not exceed 60%. The highest levels were registered for mod-source-record-manager (107%) and mod-inventory-b (98%). After the Scenario 1 tests, memory usage stabilized and did not change afterwards.
- Running OAI-PMH, DI and CI/CO simultaneously showed that the environment can handle this load.
- CI/CO response times during DI and OAI-PMH did not degrade over a series of consecutive DI jobs (create and update job profiles).
- After 90 minutes of full harvest, CPU utilization for mod-oai-pmh-b grew to 188% for about 10 minutes and then returned to a steady state (5-7%).
- At the beginning of DI, Service CPU Utilization was highest for mod-di-converter-storage-b (253%), mod-inventory-b (172%) and mod-quick-marc-b (108%). For the remaining modules it stayed under 70%. Peak values were mod-di-converter-storage-b (453%), mod-inventory-b (190%) and mod-quick-marc-b (121%).
- RDS CPU Utilization during incremental harvesting did not exceed 60% for any DI job profile (1,000 records). Data export took 40%. However, for full harvesting with the DI Create job profile (100,000 records), it rose immediately to 96% and stayed at that level for most of the process. DI Update used up to 90%.
Recommendations & Jiras
- During testing, mod-remote-storage-b showed unhealthy behaviour (reason: Health checks failed with these codes: [404]). See PERF-618. The same unhealthy behaviour was observed for mod-licenses-b and mod-service-interaction-b (reason: Health checks failed with these codes: [502]).
Test Runs & Results
Data import durations and CI/CO response times during DI & OAI-PMH
Scenario | CI/CO duration | DI job | Job profile | DI duration | CI average | CO average | Load level | Comments |
Scenario 1: OAI-PMH incremental | 5 hours | DI MARC Bib Create | PTF - Create 2 | 00:00:48 | 0.961 | 1.398 | 1K records per job (with ~5 min pause) | |
Scenario 1: OAI-PMH incremental | | DI MARC Bib Update | PTF - Updates Success - 1 | 00:00:56 | 0.706 | 1.125 | | |
Scenario 1: OAI-PMH incremental | | DI MARC Bib Create | PTF - Create 2 | 00:00:43 | 0.843 | 1.402 | | |
Scenario 1: OAI-PMH incremental | | DI MARC Bib Update | PTF - Updates Success - 1 | 00:00:44 | 0.848 | 1.335 | | |
Scenario 2: OAI-PMH full mode | | DI MARC Bib Create | PTF - Create 2 | 00:53:30 | 1.078 | 1.545 | 100K records per job (with ~5 min pause) | |
Scenario 2: OAI-PMH full mode | | DI MARC Bib Update | PTF - Updates Success - 1 | 01:04:38 | 0.725 | 1.231 | | |
Scenario 2: OAI-PMH full mode | | DI MARC Bib Update | PTF - Updates Success - 1 | 01:05:48 | 0.69 | 1.249 | | |
Scenario 2: OAI-PMH full mode | 5 hours | DI MARC Bib Update | PTF - Updates Success - 1 | 01:17:58 | 0.903 | 1.333 | | |
Scenario 2: OAI-PMH full mode | | DI MARC Bib Update | PTF - Updates Success - 1 | 01:18:08 | 0.737 | 1.221 | | |
Scenario 2: OAI-PMH full mode | | DI MARC Bib Update | PTF - Updates Success - 1 | 01:21:21 | 0.62 | 1.106 | | Last 30 minutes without OAI-PMH |
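The DI durations above can be turned into an approximate throughput figure. The sketch below assumes each job processed exactly the record count given in the Load level column (1K for Scenario 1, 100K for Scenario 2); the report does not state per-job record counts beyond that, so treat the results as rough estimates. The helper names are hypothetical.

```python
from datetime import timedelta


def parse_duration(hms: str) -> timedelta:
    """Parse an 'HH:MM:SS' duration string as used in the results table."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def records_per_second(records: int, hms: str) -> float:
    """Average DI throughput, assuming the job processed exactly `records` records."""
    return records / parse_duration(hms).total_seconds()


# (job, assumed record count, DI duration) taken from the table above
jobs = [
    ("Scenario 1, DI MARC Bib Create", 1_000, "00:00:48"),
    ("Scenario 1, DI MARC Bib Update", 1_000, "00:00:56"),
    ("Scenario 2, DI MARC Bib Create", 100_000, "00:53:30"),
    ("Scenario 2, DI MARC Bib Update", 100_000, "01:04:38"),
]

for name, records, duration in jobs:
    print(f"{name}: {records_per_second(records, duration):.1f} records/s")
```

Under these assumptions, the 100K Create job averages roughly 31 records/s, compared with about 21 records/s for the 1K Create job.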
Comparisons
This table contains baseline CI/CO response times measured without DI & OAI-PMH.
Requests | 50th pct | 75th pct | 95th pct | Average |
Check-Out Controller | 0.862 | 0.935 | 1.133 | 0.904 |
Check-In Controller | 0.581 | 0.633 | 0.827 | 0.629 |
Comparison table for CI/CO response times
Requests | Baseline average (no DI & OAI-PMH) | DI Create 1k + OAI-PMH: average | delta, % | DI Update 1k + OAI-PMH: average | delta, % | DI Create 100k + OAI-PMH: average | delta, % | DI Update 100k + OAI-PMH: average | delta, % |
Check-Out Controller | 0.904 | 1.398 | ↑ 35.34 | 1.125 | ↑ 19.64 | 1.545 | ↑ 41.49 | 1.231 | ↑ 26.56 |
Check-In Controller | 0.629 | 0.961 | ↑ 34.55 | 0.706 | ↑ 10.91 | 1.078 | ↑ 41.65 | 0.725 | ↑ 13.24 |
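The report does not define the delta column. Recomputing it suggests the delta is taken relative to the average under load, (loaded − baseline) / loaded × 100, rather than relative to the baseline. For example, Check-Out with DI Create 1k gives (1.398 − 0.904) / 1.398 ≈ 35.34%, matching the table, while the increase over the baseline would be about 54.65%. This is inferred from the numbers, not stated in the source. The sketch below reproduces both variants so they can be compared.

```python
# Baseline averages (no DI & OAI-PMH) and averages under load, from the tables above.
BASELINE = {"Check-Out Controller": 0.904, "Check-In Controller": 0.629}
UNDER_LOAD = {
    "Check-Out Controller": {
        "DI Create 1k + OAI-PMH": 1.398,
        "DI Update 1k + OAI-PMH": 1.125,
        "DI Create 100k + OAI-PMH": 1.545,
        "DI Update 100k + OAI-PMH": 1.231,
    },
    "Check-In Controller": {
        "DI Create 1k + OAI-PMH": 0.961,
        "DI Update 1k + OAI-PMH": 0.706,
        "DI Create 100k + OAI-PMH": 1.078,
        "DI Update 100k + OAI-PMH": 0.725,
    },
}


def delta_vs_loaded(baseline: float, loaded: float) -> float:
    """Delta as it appears to be computed in the table: growth relative to the loaded average."""
    return (loaded - baseline) / loaded * 100


def delta_vs_baseline(baseline: float, loaded: float) -> float:
    """Conventional relative increase: growth relative to the baseline average."""
    return (loaded - baseline) / baseline * 100


for request, runs in UNDER_LOAD.items():
    base = BASELINE[request]
    for run, loaded in runs.items():
        print(
            f"{request} | {run}: "
            f"{delta_vs_loaded(base, loaded):.2f}% (as in table), "
            f"{delta_vs_baseline(base, loaded):.2f}% (vs baseline)"
        )
```

Reporting the increase relative to the baseline would make the degradation under load look noticeably larger than the table's delta column does.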
Scenario 1
Response time
Service CPU Utilization
TBD
Service Memory Utilization
TBD
RDS CPU Utilization
TBD
Scenario 2
Response time
Service CPU Utilization
TBD
Service Memory Utilization
TBD
RDS CPU Utilization
TBD
Appendix
Methodology/Approach
Before the CI/CO test, modify the circulation rules in the Circulation rules editor so that the test runs without errors from the POST_circulation/check-out-by-barcode (Submit_barcode_checkout) request.
The number of partitions should be 2 for all DI-related topics.
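The partition rule above can be checked mechanically once the topic list is available (for example, from `kafka-topics.sh --describe`; parsing that output is not shown). This is a minimal sketch under two assumptions: the topic-to-partition mapping has already been collected, and DI-related topics are recognised by a `DI_` prefix on the last dot-separated segment of the topic name. The topic names below are hypothetical placeholders, not the environment's real topics.

```python
def misconfigured_di_topics(partitions_by_topic: dict[str, int], expected: int = 2) -> dict[str, int]:
    """Return DI-related topics whose partition count differs from `expected`.

    Assumption: a topic is DI-related when its last dot-separated segment starts with 'DI_'.
    """
    return {
        topic: count
        for topic, count in partitions_by_topic.items()
        if topic.rsplit(".", 1)[-1].startswith("DI_") and count != expected
    }


# Hypothetical topic listing; names are illustrative only.
topics = {
    "ENV.Default.TENANT.DI_EVENT_A": 2,
    "ENV.Default.TENANT.DI_EVENT_B": 1,
    "ENV.Default.TENANT.OTHER_EVENT": 1,
}

print(misconfigured_di_topics(topics))
```

Non-DI topics are ignored even if they have a different partition count, since the requirement only applies to DI-related topics.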
Before running a full OAI-PMH harvest, execute the following database commands to optimize the tables (from https://wiki.folio.org/display/FOLIOtips/OAI-PMH+Best+Practices#OAIPMHBestPractices-SlowPerformance):
- Execute the following query in the related database to remove the 'instances' created by a previous harvesting request, as well as the request itself:
Infrastructure
- 8 m6i.2xlarge EC2 instances located in US East (N. Virginia)
- 2 db.r6.xlarge database instances: one reader and one writer
- MSK ptf-kakfa-3
  - 4 brokers
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker: 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
Front End:
- Item Check-in (folio_checkin-8.0.100000491)
- Item Check-out (folio_checkout-9.0.100000595)
Modules
Partitions