In-Progress
Error rendering macro 'jira' : Unable to locate Jira server for this macro. It may be due to Application Link configuration.
Overview
FOLIO clients often have the need to call an API to look up a record by UUID, one at a time for thousands if not hundred of thousands of records. The obvious disadvantage of retrieving one record at a time is the overhead associates with it. Currently querying for one record incurs a SELECT count_estimate() call, which equals the actual SELECT query time. Additionally, each API call to a storage module incurs a mod-authtoken call, which is needed to verify the token and permission of the caller. Therefore, this testing effort explores whether or not concatenating UUIDs in the CQL query string to retrieve records would be more efficient and if there was any downside to doing this.
This test will piggy back on PERF-98 test case, using the two currently optimized calls: GET /inventory-storage/holdings-storage and GET /inventory-storage/item-storage. It is not designed to behave in the same way as the actual data-exporting workflow, which would pause after each batch to process the retrieved records after each API call. More importantly UUIDs are concatenated in these calls, which deviates from the real use case.
Environment
- mod-inventory-storage-19.3.1
- okapi-3.1.2
- mod-authtoken-2.5.1
- mod-permissions-5.11.2
- 61 back-end modules deployed in 110 ECS services
- 3 okapi ECS services
- 8 m5.large EC2 instances
- 2 db.r5.xlarge AWS RDS instance (1 reader, 1 writer)
- INFO Okapi logging level
Summary
- Concatenating UUIDs in a batch can be done up to 50 records.
- Retrieving one record at a time at a rate of 40 requests/second uses up about 700MB of database RAM in 15 minutes. Retrieving more records at a time or having many concurrent users doing multi-records lookups concurrently will use up more DB memory and likely result the DB crashing when it runs out of memory.
- The optimal range in terms of time and resource is about 10-20 records for 1 user.
- It's up to the application developer to make a decision of concatenating UUIDs, and if so, how many UUIDs. Many applications have different needs, to look up a handful of records or hundreds of thousands of records. This report will lay out the facts for the developer to be informed about the pros and cons of concatenating UUIDs.
Test Results
The following tests were run with 8 virtual users for 15 minutes.
Test #1 and #2 were to establish a baseline and to see if there was any drawbacks to using hard-coded UUIDs in the JMeter test (versus having the UUIDs dynamically filled in at runtime), and the results were pretty much the same in response time, although the database did a lot of I/O lookups for the unique UUIDs test as expected.
We initially started out testing with 8 users. As seen here, for up to 20 UUIDs, the average response time were less than 40ms for both API calls and with minimal errors. As we added more UUIDs, 30 and 40 UUIDs, the response time doubled up and more errors occurred while the request rate, inversely proportional to the average response time, now down by half from the 1 record baseline test. The database restarted once during the 30 and 40 UUID test runs. Knowing that the data-export use case only sent up to a total of 40 requests per second between the two API calls, we scaled down the number of virtual users to 1 to see if it helps. Here are the results with 1 user:
With one user the request rate was able to be scaled down to a realistic 40 requests/sec (test #7). The average response time was less than 30ms for both API calls. Tests also ran for 40 and 50 UUIDs concatenation, and like the 8 users tests, request rate is cut down by half while the response time increased by half. The surprising thing was that even with 50 records the database did not crash during the test run. This may be because we took the prep step to restart the database before each test run to give it plenty of memory.
Services CPU Utilization
This graph shows the CPU utilization of the modules in the first six tests. The series that we are interested in is mod-inventory-storage which has the highest of all the spikes. With the request rate that was thrown at it and the number of records that it has to deserialize from strings, the CPU utilization is quite high. Okapi CPU utilization is constant across the tests.
mod-authtoken's CPU usage is rightly high because it's being called for each API call, but as expected its CPU utilization decreases when more records are concatenated as there is not as much need to go to mod-authentication. This decrease can be seen from 1 record to 10 records to 20 records to 30 and 40 records concatenation tests, with 40 UUIDs concatenation using the least CPU cycles.
The second round of tests runs with 1 virtual users produced the following results:
The pattern seen in 8 virtual users tests with varied numbers of UUIDs concatenated is repeated here in the 1 virtual tests. Higher CPU utilization for mod-inventory-storage as more UUIDs are being retrieved and processed, while mod-authtoken's CPU usage decreases.
Database
In the 8 virtual users tests the database CPU utilization across the tests doesn't vary much between 10 and 40 UUIDs concatenation tests. Retrieving 1 record at a time uses less CPU than when searching for many records at a time. Note the spikes and sudden in tests with 20, 30, and 40 records tests. This is due to memory issues, more below.
For the 1-user tests, the DB CPU usage is rather constant and is very low across all tests.
Previously there are spikes and sudden dips in CPU utilizations in 8-users tests. This is due to excessive memory usage by the database. The next two graphs show the database freeable memory being used up as time goes on. A fresh database starts out with nearly 8000MB but the more these tests were run the more memory was used up and held up so that eventually after a run or two the database crashed and restarted on its own, shown by the dips to nearly 100MB remaining.
Even with the 1 record test for 8 users, the database used up a lot of memory, about 4000MB. Tests with 10 through 40 records concatenation resulted in a crash during the 15 minutes test. During this time the DB Reader instance took over so a minimal disruption of no more than 2 minutes was observed.
Things were not much more promising with 1 user tests:
For a 1 user 1 record lookup test, the slope of memory usage is not as steep as in the 40 records or 50 records look up tests. These are 15 minutes tests.
Why is the database using up so much memory? There were no slow queries, as the queries used in these lookup operations utilized indexes. The only plausible is that perhaps the count_estimate() calls used up and held up memories because its logic is fairly complex and might be resource intensive: https://github.com/folio-org/raml-module-builder#estimated-totalrecords
Discussions
Concatenating UUIDs seemed like a good idea, up to a point. It certainly returns results a faster due to less trips to mod-inventory-storage, less trips to mod-authtoken, and less calls to SELECT count_estimate(). However, the storage module works much harder to deserialize all the records coming back, and the database uses up memory at a much faster rate and could result in a DB crash minutes into the test run if looking up more than 20 records at once. The amount of load also causes issues. If only one user creating loads (by concatenating UUDIs) then the database could handle it fairly easily, but if there were multiple users generating the same load then database errors would likely be generated. Likewise the more records being concatenated result in the more system resources being consumed, no matter the number of users. Therefore, it's up to the app developer to decide whether or not concatenating UUIDs in lookups make sense while taking into considerations the expected total number of records to look up (which equates to the time running these queries) and the number of concurrent users. This approach does not apply to a use case that has many concurrent users, for example, but may be applicable to background tasks.
Another factor to consider is in the near future, improvements made in caching Okapi tokens will naturally eliminate the need to go to mod-authtoken on every API call to get the same type of resource (
). Additionally, there might be an improvement in the way to lookup a one-record result that'd not execute SELECT count_estimate() in the database, thus it may not be as taxing on the database's memory ( ). If these improvements were made in the near future, concatenating UUIDs may not gain as much benefit.