Optimize Use of CQL2PgJSON
Description
CSP Request Details
CSP Rejection Details
Potential Workaround
Attachments
- 22 Apr 2024, 02:16 PM
- 22 Apr 2024, 02:16 PM
- 22 Apr 2024, 02:15 PM
- 22 Apr 2024, 02:15 PM
Checklist
hideActivity
Olamide Kolawole May 14, 2024 at 9:18 PM
@Alexander Kurash Yes, I think we should wait for https://folio-org.atlassian.net/browse/RMB-995 to be completed first.
Alexander Kurash May 13, 2024 at 7:36 AM
@Olamide Kolawole Should this ticket be blocked by https://folio-org.atlassian.net/browse/RMB-995 (as well as CIRCSTORE-504)? If so, does this mean that we’d have to wait for an updated version of RMB to be able to use the new PgTable
class?
Olamide Kolawole May 6, 2024 at 4:20 PM
@Julian Ladisch
The 300 invocations lasting less than 30 seconds
figure is from my development machine. I don’t have the infrastructure to generate enough data and load. The figure was enough to get a profile from the code(screenshots). Using the latest RMB snapshot on the master branch of mod-patron-blocks show improved stats in the profile.
The 8 second latency figure is from a PTF environment on Poppy version of FOLIO. I could not build the latest RMB snapshot on that version of Poppy so I can’t get heavy throughput numbers in the PTF environment. I can’t get CPU to be more than 3% in my development machine even when i throw multiple threads and setting only one eventloop. I have reason to believe that the 8 second number might not be correct from the performance tooling. The okapi log during the performance run in the PTF environment is showing a much smaller duration than 8 seconds, if I have identified the right log. The trace data has been expunged due to time, so i can’t reference the trace any longer.
Concerning a new PatronBlockService being created every request, it seems most resources is spent in instantiating CQL2PgJson. schema.json
is loaded 11 times for each request to /automated-patron-blocks/{userId}
. I created this ticket here on this particular endpoint because of the 8 second duration that is now looking suspicious. As you found other endpoints use PgUtil and other RMB modules do the same as well.
Julian Ladisch May 3, 2024 at 12:15 PM
The issue description reports “300 invocations lasting less than 30 seconds” (= 10 requests per second), and heavy load resulting in latency of 8 seconds but the throughput under heavy load is missing.
Julian Ladisch May 3, 2024 at 12:00 PM
The performance issue might also be caused by swapping resulting from too small container memory.
Some performance testing showed bad performance of mod-patron-blocks under heavy load. Tracing showed that a HTTP request was accepted but actual execution of database queries took place towards the end of the request. A sample HTTP took 8 seconds to complete, no action for the first 7.8 seconds but then database queries execute afterwards.
Review of the code found that mod-patron-blocks' use of
CQL2PgJSON
is the culprit.CQL2PgJSON
is initialized almost every HTTP request to the mod-patron-blocks.CQL2PgJSON
initialization is expensive because it uses blocking file APIs to read the schema.json and other Jackson related activities. Even more critical - the event loop is blocked while initializingCQL2PgJSON
because of the blocking file retrievals and general “long” duration of the task on the eventloop. I believe this is a major cause of the 8 second slowdown while CPU on the module instance was at less than 10%. It is very inefficient to initialize the class for the same “table” multiple times. For some HTTP requests,CQL2PgJSON
is initialize at least three times for different tables.Here is a CPU profile of
/automated-patron-blocks/{userId}
with about 300 invocations lasting less than 30 seconds.93% of CPU for this api call was spent in initializing
PatronBlocksService
which invokes manyCQL2PgJSON
classes under the covers.Here is a memory profile from the same run as above.
Most of the memory allocations is caused by instantiation of
CQL2PgJSON
. 99% of the memory allocations is caused by instantiation ofPatronBlocksService
. 72% of the memory allocations for the total duration of the module.Remediation
After some rough code changes to initialize
CQL2PgJSON
only once for a “table”, here are the updated statsDeploying the changes show massive decrease in execution times for mod-patron-blocks. The misuse of
CQL2PgJSON
is pervasive across older FOLIO modules in Inventory and Circulation. There should be some work to updateCQL2PgJSON
if it is to be used this way, but more efficient use ofCQL2PgJSON
should be employed in this module.