Overview

FOLIO clients often need to call an API to look up records by UUID, one at a time, for thousands if not hundreds of thousands of records. The obvious disadvantage of retrieving one record at a time is the overhead associated with each request. Currently, querying for one record incurs a SELECT count_estimate() call, which takes about as long as the actual SELECT query. Additionally, each API call to a storage module incurs a mod-authtoken call, which is needed to verify the caller's token and permissions. This testing effort therefore explores whether concatenating UUIDs in the CQL query string is a more efficient way to retrieve records, and whether there is any downside to doing so.
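To make the idea of concatenating UUIDs concrete, here is a minimal sketch of how a client might build such a query. It assumes the CQL form `id==("uuid-1" or "uuid-2")` and a `limit` query parameter; the function names and the `base_path` argument are hypothetical and are not taken from the test plan.

```python
from urllib.parse import urlencode


def build_id_query(uuids):
    """Return a CQL query that matches any of the given UUIDs."""
    if not uuids:
        raise ValueError("at least one UUID is required")
    # Assumed CQL form: id==("uuid-1" or "uuid-2" or ...)
    joined = " or ".join(f'"{u}"' for u in uuids)
    return f"id==({joined})"


def build_lookup_url(base_path, uuids):
    """Return a GET path carrying the CQL query and a matching limit."""
    # Assumption: limit is set to the batch size so all matches fit in one page.
    params = urlencode({"query": build_id_query(uuids), "limit": len(uuids)})
    return f"{base_path}?{params}"
```

With a single UUID this reduces to the one-record lookup, so the same helper can serve both the baseline and the concatenated cases.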

This test will

Summary

Up to 50 UUIDs can be concatenated in a single request.
Retrieving one record at a time at a rate of 40 requests/second uses about 700MB of database RAM in 15 minutes. Retrieving more records per request, or having more users doing multi-record lookups concurrently, will use more DB memory.
The optimal range in terms of time and resources is about 10-20 records per request.
It is up to the application developer to decide whether to concatenate UUIDs and, if so, how many. Applications have different needs, from looking up a handful of records to hundreds of thousands of records. This report lays out the facts so that the developer is informed about the pros and cons of concatenating UUIDs.
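As a rough illustration of what the batch-size decision means for a client, the sketch below splits a list of UUIDs into batches and counts the requests needed. The helper names are hypothetical, and the batch size is left as a parameter so that it can be tuned, for example within the 10-20 range noted above.

```python
from math import ceil


def batches(uuids, batch_size):
    """Yield consecutive slices of at most batch_size UUIDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(uuids), batch_size):
        yield uuids[start:start + batch_size]


def request_count(total_records, batch_size):
    """Number of GET requests needed to look up total_records UUIDs."""
    return ceil(total_records / batch_size)
```

For example, looking up 100,000 records takes 100,000 requests one at a time but 5,000 requests at a batch size of 20. Since each request incurs one mod-authtoken call, the number of token checks shrinks by the same factor.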

Test Results

The test mimics the data-export workflow (PERF-98) by making two calls, GET /inventory-storage/item-storage and GET /inventory-storage/holdings-storage, but it is not designed to behave in the same way as the actual data-export workflow, which would pause to process the retrieved records after each API call. More importantly, UUIDs are concatenated in these calls, which deviates from the real use case. Each test was run for 15 minutes.

Tests #1 and #2 were run to establish a baseline and to see whether there were any drawbacks to using hard-coded UUIDs in the JMeter test (versus having the UUIDs filled in dynamically at runtime). The response times were essentially the same, although the database did a lot of I/O lookups in the unique-UUIDs test.

We started out testing with 8 users. As seen here, for up to 20 UUIDs the average response time was less than 40ms for both API calls, with minimal errors. As we added more UUIDs (30 and 40), the response time doubled and more errors occurred, while the request rate, which is inversely proportional to the average response time, dropped to half of the 1-record baseline. The database restarted once during the 30 and 40 UUID test runs. Knowing that the data-export use case only sends up to 40 requests per second, we scaled the number of virtual users down to 1 to see whether that would help. Here are the results with 1 user:

With one user, the request rate could be scaled down to a realistic 40 requests/sec. The average response time was less than 30ms for both API calls. Tests were also run with 40 and 50 concatenated UUIDs and, as in the earlier multi-user tests, the request rate was cut in half while the response time increased by half. Surprisingly, even with 50 records the database did not crash during the test run.
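The inverse relationship between request rate and response time can be sketched with a simple closed-loop model. It assumes each virtual user sends its next request as soon as the previous one returns, with no think time; this is an idealization and not a description of the actual JMeter configuration. The function names are hypothetical.

```python
def closed_loop_request_rate(users, avg_response_seconds):
    """Requests/second when each user issues requests back to back."""
    if avg_response_seconds <= 0:
        raise ValueError("avg_response_seconds must be positive")
    return users / avg_response_seconds


def records_per_second(request_rate, batch_size):
    """Effective record throughput for a given request rate and batch size."""
    return request_rate * batch_size
```

With illustrative (not measured) numbers, one user at a 25ms average response time yields about 40 requests/second under this model. If the response time doubles, the request rate halves, but with many UUIDs per request the number of records retrieved per second can still be far higher than in the one-record case.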

CPU Utilization

This graph shows the CPU utilization of the modules in the first six tests. The module of interest is mod-inventory-storage, which has the highest spikes of all. Given the request rates thrown at it and the number of records it has to process and deserialize from strings, its CPU utilization is quite high. Surprisingly, its CPU utilization in the 1-unique-record test is quite low compared to the 1-record, non-unique test. mod-authtoken's CPU usage is understandably high because it is invoked for each API call, but as expected its CPU utilization is lower when more records are concatenated, since fewer requests need to go to mod-authtoken.

The second round of test runs, with 1 virtual user, produced the following results:

GET By Concatenating UUIDs
