Optimize Use of CQL2PgJSON

Description

Performance testing showed poor performance of mod-patron-blocks under heavy load. Tracing showed that an HTTP request was accepted but the actual execution of database queries took place towards the end of the request. A sample HTTP request took 8 seconds to complete: no activity for the first 7.8 seconds, then the database queries executed.

Review of the code found that mod-patron-blocks' use of CQL2PgJSON is the culprit. CQL2PgJSON is initialized on almost every HTTP request to mod-patron-blocks. CQL2PgJSON initialization is expensive because it uses blocking file APIs to read schema.json, plus other Jackson-related work. Even more critical, the event loop is blocked while CQL2PgJSON initializes, both because of the blocking file reads and because the task simply runs for a long time on the event loop. I believe this is a major cause of the 8 second slowdown, during which CPU on the module instance stayed below 10%. It is also very inefficient to initialize the class for the same “table” multiple times; for some HTTP requests, CQL2PgJSON is initialized at least three times for different tables.
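For illustration, here is a rough sketch of the per-request pattern (the table name, query parameter, and class names follow common RMB usage; they are placeholders, not the module's actual code). Each new CQL2PgJSON(...) re-reads schema.json with blocking file I/O, and because this runs on the event loop, every other request on that event loop stalls behind it:

import org.folio.cql2pgjson.CQL2PgJSON;
import org.folio.cql2pgjson.exception.FieldException;
import org.folio.rest.persist.cql.CQLWrapper;

public class PerRequestInitExample {
  // Runs on the event loop for every HTTP request: the CQL2PgJSON
  // construction below re-reads schema.json via blocking file APIs,
  // then the instance is used once and discarded.
  CQLWrapper buildWrapper(String cqlQuery) throws FieldException {
    CQL2PgJSON cql2PgJson = new CQL2PgJSON("user_summary.jsonb"); // expensive init
    return new CQLWrapper(cql2PgJson, cqlQuery);
  }
}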

Here is a CPU profile of /automated-patron-blocks/{userId}, covering about 300 invocations over less than 30 seconds.

image-20240422-140321.png

93% of CPU for this API call was spent initializing PatronBlocksService, which constructs many CQL2PgJSON instances under the covers.

Here is a memory profile from the same run as above.

image-20240422-140611.png

Most of the memory allocation is caused by instantiating CQL2PgJSON: 99% of the allocations for this API call come from instantiating PatronBlocksService, which accounts for 72% of all allocations over the module's total runtime.

Remediation

After some rough code changes to initialize CQL2PgJSON only once per “table”, here are the updated stats:

Screenshot 2024-04-22 at 9.11.19 AM.png
Screenshot 2024-04-22 at 9.12.26 AM.png

Deploying the changes shows a massive decrease in execution times for mod-patron-blocks. This misuse of CQL2PgJSON is pervasive across older FOLIO modules in Inventory and Circulation. There should be some work to update CQL2PgJSON if it is to be used this way, but in the meantime this module should use CQL2PgJSON more efficiently.
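A minimal sketch of the remediation, assuming a CQL2PgJSON instance is safe to reuse across requests once built (the cache class below is illustrative, not the module's actual change; RMB-995 may eventually provide an equivalent inside RMB itself):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.folio.cql2pgjson.CQL2PgJSON;
import org.folio.cql2pgjson.exception.FieldException;

public final class Cql2PgJsonCache {
  // One CQL2PgJSON per table, built lazily and reused, so schema.json is
  // read at most once per table for the lifetime of the module rather
  // than on every request.
  private static final Map<String, CQL2PgJSON> CACHE = new ConcurrentHashMap<>();

  private Cql2PgJsonCache() { }

  public static CQL2PgJSON forTable(String tableName) {
    return CACHE.computeIfAbsent(tableName, table -> {
      try {
        return new CQL2PgJSON(table + ".jsonb");
      } catch (FieldException e) {
        throw new IllegalStateException("CQL2PgJSON init failed for " + table, e);
      }
    });
  }
}

Request handlers would then call Cql2PgJsonCache.forTable("user_summary") instead of constructing CQL2PgJSON inline, paying the initialization cost once per table instead of once per request.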

CSP Request Details

None

CSP Rejection Details

None

Potential Workaround

None

Activity

Olamide Kolawole May 14, 2024 at 9:18 PM

Yes, I think we should wait for https://folio-org.atlassian.net/browse/RMB-995 to be completed first.

Alexander Kurash May 13, 2024 at 7:36 AM

Should this ticket be blocked by https://folio-org.atlassian.net/browse/RMB-995 (as well as CIRCSTORE-504)? If so, does this mean that we’d have to wait for an updated version of RMB to be able to use the new PgTable class?

Olamide Kolawole May 6, 2024 at 4:20 PM


The 300 invocations in less than 30 seconds figure is from my development machine. I don't have the infrastructure to generate enough data and load, but it was enough to get a profile of the code (see the screenshots). Using the latest RMB snapshot on the master branch of mod-patron-blocks shows improved stats in the profile.

The 8 second latency figure is from a PTF environment on the Poppy version of FOLIO. I could not build the latest RMB snapshot on that version of Poppy, so I can't get heavy-throughput numbers in the PTF environment. I can't get CPU above 3% on my development machine, even when I throw multiple threads at it and configure only one event loop. I have reason to believe the 8 second number from the performance tooling might not be correct: the Okapi log from the performance run in the PTF environment shows a much smaller duration than 8 seconds, if I have identified the right log. The trace data has since been expunged, so I can't reference the trace any longer.

Concerning a new PatronBlocksService being created on every request: it seems most of the resources are spent instantiating CQL2PgJSON; schema.json is loaded 11 times for each request to /automated-patron-blocks/{userId}. I created this ticket about this particular endpoint because of the 8 second duration, which is now looking suspicious. As you found, other endpoints use PgUtil, and other RMB modules do the same.

Julian Ladisch May 3, 2024 at 12:15 PM

The issue description reports “300 invocations lasting less than 30 seconds” (= 10 requests per second) and heavy load resulting in a latency of 8 seconds, but the throughput under heavy load is missing.

Julian Ladisch May 3, 2024 at 12:00 PM

The performance issue might also be caused by swapping resulting from too little container memory.

Details

Assignee

Reporter

Priority

Development Team

Vega

Release

Not Scheduled

RCA Group

Implementation coding issue

Created April 22, 2024 at 2:15 PM
Updated January 22, 2025 at 1:59 PM