[FOLIO-2874] SPIKE: investigate instrumentation performance overhead Created: 16/Nov/20  Updated: 21/Jan/21  Resolved: 25/Nov/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P3
Reporter: Jakub Skoczen Assignee: Hongwei Ji
Resolution: Done Votes: 0
Labels: R1, platform-backlog, platform-core
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File okapi-4.3.3.png, PNG File okapi-4.4.0.png
Sprint: CP: sprint 102
Story Points: 5
Development Team: Core: Platform
Release: R1 2021

 Description   

PTF has reported a 2x CPU load overhead when instrumentation is enabled in Okapi.

Please investigate what can be done to limit the overhead:

  • analyze the metrics calculation code to check whether the calculation can be made cheaper. See if the metrics can be moved out of tight inner loops.
  • experiment with the Micrometer configuration to see whether we can decrease the impact on CPU usage, e.g. by using a wider window for timers, disabling unwanted timers, etc.
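To illustrate the two ideas above (hoisting measurements out of a tight loop, and disabling unwanted timers), here is a minimal stdlib-only Java sketch. The `Timer`, `CountingTimer` and `TimerRegistry` types and the `example.*` metric names are hypothetical stand-ins, not Okapi or Micrometer code; in Micrometer itself the comparable knob for disabling timers would be a meter filter such as `MeterFilter.deny(...)`, and this sketch does not claim to mirror how Okapi registers its meters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Predicate;

public class MetricsHoisting {

    /** Hypothetical minimal timer abstraction (not the Micrometer API). */
    interface Timer {
        void record(long nanos);
    }

    /** A timer that does nothing; handed out for denied metric names. */
    static final Timer NOOP = nanos -> { };

    /** Accumulates a sample count and total elapsed nanoseconds. */
    static final class CountingTimer implements Timer {
        final LongAdder count = new LongAdder();
        final LongAdder totalNanos = new LongAdder();

        @Override
        public void record(long nanos) {
            count.increment();
            totalNanos.add(nanos);
        }
    }

    /** Hands out timers by name; names matched by the deny rule get NOOP. */
    static final class TimerRegistry {
        private final Map<String, CountingTimer> timers = new ConcurrentHashMap<>();
        private final Predicate<String> deny;

        TimerRegistry(Predicate<String> deny) {
            this.deny = deny;
        }

        Timer timer(String name) {
            if (deny.test(name)) {
                return NOOP; // no aggregation or storage for unwanted timers
            }
            return timers.computeIfAbsent(name, n -> new CountingTimer());
        }
    }

    /** Before: two clock reads and one record() call per item. */
    static long perItem(int[] items, Timer timer) {
        long sum = 0;
        for (int item : items) {
            long start = System.nanoTime();
            sum += item * 2L; // stand-in for the real per-item work
            timer.record(System.nanoTime() - start);
        }
        return sum;
    }

    /** After: the measurement is hoisted out of the loop; one sample per batch. */
    static long perBatch(int[] items, Timer timer) {
        long start = System.nanoTime();
        long sum = 0;
        for (int item : items) {
            sum += item * 2L;
        }
        timer.record(System.nanoTime() - start);
        return sum;
    }

    public static void main(String[] args) {
        int[] items = {1, 2, 3, 4};
        CountingTimer inner = new CountingTimer();
        CountingTimer outer = new CountingTimer();
        // Same result, but 4 recorded samples versus 1.
        System.out.println(perItem(items, inner) + " " + inner.count.sum());
        System.out.println(perBatch(items, outer) + " " + outer.count.sum());
    }
}
```

Hoisting trades per-item resolution for fewer clock reads and `record()` calls; whether that is acceptable depends on which latencies a given metric actually needs to capture.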


 Comments   
Comment by Hongwei Ji [ 25/Nov/20 ]

To see the impact of enabling metrics, I created a separate environment and tested it with PTF data and tests. The CloudWatch CPU metrics for Okapi can be seen in the attached screenshots.
The left side has metrics disabled and the right side has metrics enabled. The single spike in between is from the Okapi restart after enabling metrics. In summary, Okapi 4.3.3 shows about 20-30% CPU overhead, while Okapi 4.4.0 shows less than 20%. Note that some metrics measurements were removed in Okapi 4.4.0.

Generated at Thu Feb 08 23:23:53 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.