ECS [Ramsons] Migrate/Update a Large Number of BIB Records
The general recommendation for report formatting: [Do not include this section in reports]
Image width: 1600px. This width offers a good balance, ensuring that images are clear and detailed without causing excessive loading times or appearing too large on the page.
For graphs from AWS CloudWatch, use 1-minute metrics aggregation.
Further changes are welcome.
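As an illustration of the 1-minute aggregation recommended above, the sketch below builds the request parameters for the CloudWatch GetMetricStatistics API, where `Period` is given in seconds. The helper name `ecs_cpu_query`, the cluster and service names, and the time window are assumptions chosen for the example; only the parameter names and the `AWS/ECS` `CPUUtilization` metric come from the AWS API.

```python
from datetime import datetime, timedelta, timezone


def ecs_cpu_query(cluster, service, end, minutes):
    """Build keyword arguments for CloudWatch GetMetricStatistics (hypothetical helper)."""
    return {
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # seconds, i.e. 1-minute aggregation
        "Statistics": ["Average", "Maximum"],
    }


# Cluster name, service name and end time are placeholders, not values from a real run.
query = ecs_cpu_query(
    "ncp3",
    "mod-marc-migrations",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    minutes=240,
)
print(query["Period"], query["StartTime"].isoformat())

# With boto3 installed and AWS credentials configured, the dict could be passed as:
#   boto3.client("cloudwatch").get_metric_statistics(**query)
```

Keeping the query as a plain dict makes it easy to record the exact parameters next to each graph in the report.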
Overview
In production, when mapping rules are changed, the migration process needs to be run using the mod-marc-migrations module API.
The purpose of this test set is to observe and measure the duration of the MARC migration process in an ECS environment, from both the central tenant and member tenant sides, and to find any possible issues or performance problems.
[ A brief introduction about the content of the page:
What we are testing? Provide context of the test. Is it for a new service? Is it an experiment? Is it regression test?
Include major things like environment settings (ECS, non-ECS, Eureka, non-Eureka, w/RW split, etc…)
What are the goals of the testing? Ex: Want to see the effect of using a different ec2 instance type. If regression: to see how vB compares to vA
Include defined SLAs, if available
Reference the Jira(s)
]
Summary
[ A bulleted-list of the most important and relevant observations from the test results. What are the most important things the readers need to know about this testing effort? Some suggestions
Comparison to previous test or release of response times or API durations
Any notable changes
Particular response time or durations
Service memory and/or CPU utilization
RDS memory and/or CPU utilization
Other interesting observations
The summary points should answer the goals stated in Overview: did the test achieve the goals laid out? What goals were not met and why? SLAs were met or not?
]
Recommendations & Jiras (Optional)
[ If there are recommendations for the developers or operations team, or anything worth calling out, list them here.
Configuration options
Memory/CPU settings
Environment variables settings.
Also include any Jiras created for follow-up work]
Test Runs
[Table of tests with short descriptions. If there are motivations to run additional tests because of any reason, include a note column to explain]
Test # | Test Conditions | Duration | Load generator size (recommended) | Load generator Memory(GiB) (recommended) | Notes (Optional) |
---|---|---|---|---|---|
1 | 8 users CI/CO + DI 50k MARC BIB Create + 10k Items editing | 30 mins | t3.medium | 3 | |
2 | 8 users CI/CO + DI 50k MARC BIB Create + 10k holdings editing | 30 mins | t3.medium | 3 | |
Results
[ Tables of detailed test results with comments]
Response Times (Average of all tests listed above, in seconds)
Test | Check-in average (seconds) | Check-out average (seconds) | Bulk edit Items, 10k records | Bulk edit Holdings, 10k records | Data Import MARC BIB 50k Create | Data Import MARC BIB 50k Update |
---|---|---|---|---|---|---|
Test 1 | 0.715 | 1.332 | 40 min | - | 20 min 19 sec | - |
Test 2 | 0.756 | 1.383 | - | 20 min 30 sec | 21 min 07 sec | - |
Comparisons
[Part to compare test data to previous tests or releases. It's important to know if performance improves or degrades]
The following table compares additional test results to previous release numbers and to the Nolana CICO baselines (average check-in time 0.456 s, average check-out time 0.698 s). Note that Lotus numbers are in red, Nolana numbers are in black, and Kiwi numbers are in blue.
In the Nolana version, there is a significant improvement in the performance of Data Import and Check-in/Check-out.
For the baseline test, the mod-source-record-manager version was 3.5.0; for the test with CI/CO, it was 3.5.4. This may be why Data Import with CI/CO was even faster than without CI/CO.
| Profile | Duration KIWI (Lotus) without CICO | Duration with CICO 8 users KIWI (Lotus) | Duration Nolana without CICO | Duration with CICO 8 users Nolana | CheckIn average (seconds) | CheckOut average (seconds) | Deviation From the baseline CICO response times |
---|---|---|---|---|---|---|---|---|
5K MARC BIB Create | PTF - Create 2 | 5 min, 8 min (05:32.264) (08:48.556) | 5 min (05:48.671) | 2 min 51 s | 00:01:56.847 | 0.851 0.817 | 1.388 1.417 | CI: 44% CO: 51% |
5K MARC BIB Update | PTF - Updates Success - 1 | 11 min, 13 min (10:07.723) | 7 min 06:27.143 | 2 min 27 s | 00:02:51.525 | 1.102 0.747 | 1.867 1.094 | CI: 39% CO: 36% |
Attach the link to the report from which the data for comparison was extracted.
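The report does not state how the "Deviation From the baseline" column is calculated. The sketch below shows one formula that reproduces the listed percentages: the difference from the baseline divided by the measured value, applied to the second number in each CheckIn/CheckOut cell. Both the formula and the choice of the second number are inferences from the table, and the function name is hypothetical.

```python
BASELINE_CI = 0.456  # Nolana baseline check-in average, seconds
BASELINE_CO = 0.698  # Nolana baseline check-out average, seconds


def deviation_pct(measured, baseline):
    """Difference from the baseline as a percentage of the measured value (assumed formula)."""
    return (measured - baseline) / measured * 100


# Second value from each CheckIn / CheckOut cell of the comparison table.
rows = {
    "5K MARC BIB Create": (0.817, 1.417),
    "5K MARC BIB Update": (0.747, 1.094),
}

for name, (ci, co) in rows.items():
    print(
        f"{name}: CI {deviation_pct(ci, BASELINE_CI):.0f}% "
        f"CO {deviation_pct(co, BASELINE_CO):.0f}%"
    )
# 5K MARC BIB Create: CI 44% CO 51%
# 5K MARC BIB Update: CI 39% CO 36%
```

Dividing by the baseline instead gives a noticeably larger figure (about 79% for a 0.817 s check-in against 0.456 s), so it is worth stating the chosen formula in the report.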
Memory Utilization
Memory utilization across all modules was stable, with no memory leaks observed. The most heavily used module, mod-marc-migrations, experienced an increase in memory usage from 25% to approximately 80% during testing with 'Optimal configurations.' However, after the process was completed, memory usage stabilized and returned to normal levels.
CPU Utilization
CPU utilization during the data-saving process was stable and returned to normal levels after the test was completed. The most utilized service was mod-consortia, with a peak usage of approximately 4.5%. The entire process, including reindexing by mod-search, took approximately 4 hours. With 'Optimal configurations,' the data-saving process was faster by about 1 hour and 30 minutes compared to the default configuration, where it took 2 hours and 40 minutes.
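Read literally, the figures above imply a data-saving time of roughly 1 hour 10 minutes with 'Optimal configurations.' The short sketch below spells out that arithmetic; it assumes the 2 hours 40 minutes refers to the data-saving step under the default configuration, and the variable names are illustrative.

```python
from datetime import timedelta

default_saving = timedelta(hours=2, minutes=40)  # default configuration
time_saved = timedelta(hours=1, minutes=30)      # reported improvement

optimal_saving = default_saving - time_saved
reduction_pct = time_saved / default_saving * 100

print(optimal_saving)        # 1:10:00
print(round(reduction_pct))  # 56
```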
Here is the CPU usage of the mod-marc-migrations module during the data-mapping process. It is evident that mod-marc-migrations can handle high loads and return to normal resource usage after the process is completed.
Note: Once the data-mapping process is finished, the CPU usage of mod-marc-migrations becomes less noticeable, as most of the logic for the next step (data saving) occurs on the database side.
RDS CPU Utilization
[Description of notable observations of reader and writer instances CPU utilization with screenshots and tables, RDS Database connections, and other Database metrics]
Additional information from module and database logs (Optional)
[ Although it is optional to look at logs it is always recommended to look at the logs to see if there were any errors, exceptions, or warnings. If there were any, create Jiras for the module that generated the warnings/errors/exceptions]
Discussion (Optional)
[ This section gives more space to elaborate on any observations and results. See Perform Lookups By Concatenating UUIDs (Goldenrod)#Discussions for example. Anything that was discussed at length at the DSUs are worthy to be included here]
Errors
This section should detail any errors encountered during the testing process, their impact on testing outcomes, and the steps taken to address these issues.
Appendix
Infrastructure
[ List out environment's hardware and software settings. For modules that involve Kafka/MSK, list the Kafka settings as well. For modules that involve OpenSearch, list these settings, too]
PTF environment ncp3 [environment name]
9 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1 [number of ECS instances, instance type, location region]
2 instances of db.r6.xlarge database instances, one reader, and one writer [database instances, type, size, main parameters]
MSK ptf-kakfa-3 [ kafka configurations]
4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Modules memory and CPU parameters [table of services properties, will be generated with script soon]
Use fse-get-ecs-cluster-services-info Jenkins job to get table with services configuration.
Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|---|
mod-inventory | 19.0.1 | 1 | 2 | 1024 | 2880 | 2592 | 512m | 1814m |
okapi | 4.14.7 | 1-2 | 3 | 1024 | 1684 (1512 in MG) | 1440 (1360 in MG) | 512m | 922m |
MG- Morning Glory release
Front End: [ front end app versions (optional)]
Item Check-in (folio_checkin-7.2.0)
Item Check-out (folio_checkout-8.2.0)
Dataset Size is important for testing. What was the size of the dataset? Include one or more related tables' sizes.
Methodology/Approach
[ In order to be able to reproduce the test, list the high-level methodology that was used to carry out the tests. This is important for complex tests that involve multiple workflows.
Preparation Steps: Provide a comprehensive overview of the preparation process preceding the test. This includes setting up the test scripts, configuring relevant parameters, and ensuring all necessary tools and resources are in place.
Data preparation scripts. In the context of performance testing, data preparation is a critical step to ensure that the testing environment accurately reflects real-world usage patterns and can handle the intended load efficiently. To facilitate this process, specific scripts are used to populate the test database with the necessary data, simulate user transactions, or configure the environment appropriately. Add links to the required scripts on GitHub and write a short description of how to use/run them.
Test Configuration: Specify the exact configurations utilized during the test execution. Duration, number of virtual users, ramp-up period etc.
It's important to inform readers of how the tests were performed so that they can comment on any flaw in the test approach or that they can try to reproduce the test results themselves. For example:
Start CICO test first
Run a Data Import job after waiting for 10 minutes
Run an eHoldings job after another 10 minutes
On another tenant run another DI job after 30 minutes in
The steps don't need to be very specific because the details are usually contained in the participating workflow's README files (on GitHub). However, anything worth calling out that was not mentioned elsewhere should be mentioned here.
Metric Collection Approach: Describe the methodology adopted to collect and interpret metrics during testing. Highlight the tools employed for data collection, SQL queries to get data, or other approaches (for example, getting metrics from JMeter JTL reports) that were used for specific PTF tests.
Also, it is necessary to include the approach taken to obtain the final results. For example, please document if the results were obtained by zooming into a portion of the graphs in Grafana (which portion? why?), and how the numbers were calculated if not obvious. ]
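As one example of documenting a metric collection approach, the sketch below averages response times per sampler from a JMeter JTL file saved in CSV format with a header row. It relies on the standard `label`, `elapsed` (milliseconds) and `success` columns; the function name and the sample rows are made up for illustration and are not taken from any test run.

```python
import csv
import io
from collections import defaultdict


def average_elapsed_seconds(jtl_file):
    """Return the average elapsed time in seconds per label, using successful samples only."""
    totals = defaultdict(lambda: [0, 0])  # label -> [total milliseconds, sample count]
    for row in csv.DictReader(jtl_file):
        if row["success"].lower() != "true":
            continue  # failed samples are reported separately, not averaged
        entry = totals[row["label"]]
        entry[0] += int(row["elapsed"])
        entry[1] += 1
    return {label: total / count / 1000 for label, (total, count) in totals.items()}


# Illustrative data only; a real JTL file has more columns, which DictReader tolerates.
sample = io.StringIO(
    "timeStamp,elapsed,label,responseCode,success\n"
    "1700000000000,700,Check-in,200,true\n"
    "1700000001000,800,Check-in,200,true\n"
    "1700000002000,1400,Check-out,200,true\n"
    "1700000003000,9000,Check-out,500,false\n"
)

averages = average_elapsed_seconds(sample)
print(averages)
```

For a real run, the same function could be called with `open("results.jtl", newline="")`, where the file name is a placeholder. Whether failed samples are excluded from the averages is a choice that should be stated in the report.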
Additional Screenshots of graphs or charts
[ Include additional screenshots of graphs on the CloudWatch and Grafana dashboards for completeness' sake. Include any raw data that includes the timestamps of tests or any screenshots/charts/graphs. These data may be separate files or may be one Miro board or one Sheet/Doc that has everything in it. Raw data is important to consult for additional insights if the report omits them initially. ]
Test Artifacts
Attach the test artifacts - excluding any sensitive data. These artifacts are deviations from the main files that were checked into Github, but are relevant for this test.