Platform, DevOps and Release Management (UXPROD-1814)

[UXPROD-1816] Real-world Load and Performance Testing Methodology Created: 03/Jun/19  Updated: 02/Nov/21

Status: Open
Project: UX Product
Components: None
Affects versions: None
Fix versions: None
Parent: Platform, DevOps and Release Management

Type: New Feature Priority: P2
Reporter: Mike Gorrell Assignee: Jakub Skoczen
Resolution: Unresolved Votes: 0
Labels: cap-mvp, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
is blocked by FOLIO-2296 enrich perf circulation data to match... Closed
Relates
relates to DEBT-6 Lack of real-life load test scenarios In Code Review
Epic Link: Platform, DevOps and Release Management
Back End Estimate: XXL < 30 days
Back End Estimator: Jakub Skoczen
Estimation Notes and Assumptions: XXL is the largest available. This item is likely larger.
Development Team: None
Kiwi Planning Points (DO NOT CHANGE): 20
Rank: Chicago (MVP Sum 2020): R1
Rank: Cornell (Full Sum 2021): R1
Rank: Duke (Full Sum 2021): R1
Rank: 5Colleges (Full Jul 2021): R1
Rank: GBV (MVP Sum 2020): R1
Rank: hbz (TBD): R1
Rank: Lehigh (MVP Summer 2020): R2
Rank: MO State (MVP June 2020): R4
Rank: TAMU (MVP Jan 2021): R1
Rank: U of AL (MVP Oct 2020): R2

Description

We need an environment, a set of tests, and an approach to load and performance testing that exercises the system based on realistic usage and behavior. Waits, concurrency, sequencing, parameters: all of these can contribute to realistic and valuable test results, or, if not grounded in real-world expected usage, to garbage results that force us to chase our tail. There's no sense in optimizing an API so that it can stand up to a pounding if it will never be pounded in real life.

We need to test front-end modules as well as backend ones. We've all seen cases where the frontend uses the backend incorrectly and causes performance problems as a result. Backend components' contribution to performance is obvious.

I think we need a few things:
1) An environment we can use for end-to-end and API testing. We need to understand the limitations and peculiarities of these environments: how applicable are they to a live production environment, such that when an issue comes up we can determine its importance? Ideally there would be a direct correlation between the test environment and a production environment.
2) A transparent and well-understood testing methodology. We throw around terms like "50 user test". What does that mean? What does each user do? Are the users identical? If not, what is the distribution of what they do (simple vs. complex actions, etc.)? Are there sleep times between actions? Do users ramp up and ramp down? How long does each virtual user live?
3) The ability to replicate results, so that we can see regressions and also reproduce or diagnose problems when needed.
4) We need to understand what realistic system behavior is. For a given tenant, what percentage of activity is read vs. write, circulation vs. acquisitions, etc.? Some parts of FOLIO may be isolated such that what happens in Acquisitions doesn't matter as far as Inventory or Circulation are concerned, but some pieces of the system MAY care. This may also depend on how the system is laid out in terms of deployment and load balancing (not just software architecture).
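To make points 2–4 concrete, here is a minimal sketch (Python, standard library only) of what a precisely specified, replicable "50 user test" could look like: a seeded workload mix, per-user think times, and a linear ramp-up. The operation names and percentages are placeholders for illustration, not measured FOLIO numbers.

```python
import random

# Hypothetical workload mix for one tenant: operation -> share of activity.
# These percentages are placeholders, not measured FOLIO data.
WORKLOAD_MIX = {
    "circulation.checkout": 0.40,
    "circulation.checkin": 0.30,
    "inventory.search": 0.20,
    "acquisitions.order": 0.10,
}

def build_schedule(num_users=50, actions_per_user=5, ramp_up_s=60,
                   think_time_s=(2.0, 8.0), seed=42):
    """Return a deterministic per-user action schedule.

    Each virtual user gets a start offset (linear ramp-up), a sequence
    of operations drawn from WORKLOAD_MIX, and a think time before each
    action. Fixing `seed` makes the run replicable, which is what lets
    us compare results across releases and spot regressions.
    """
    rng = random.Random(seed)
    ops, weights = zip(*WORKLOAD_MIX.items())
    schedule = []
    for user in range(num_users):
        start = ramp_up_s * user / num_users  # user N starts N/num_users into the ramp
        actions = [
            (rng.choices(ops, weights)[0], rng.uniform(*think_time_s))
            for _ in range(actions_per_user)
        ]
        schedule.append({"user": user, "start_s": start, "actions": actions})
    return schedule

if __name__ == "__main__":
    sched = build_schedule()
    print(len(sched), "users; user 0 starts at", sched[0]["start_s"], "s")
```

A test runner would then execute this schedule against the environment from point 1; because the schedule is fully determined by its parameters and seed, any run can be reproduced exactly (point 3).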

This epic/issue will be used to track the various activities that relate to the overall load and performance testing approach for FOLIO.



Comments
Comment by Jakub Skoczen [ 05/Jun/19 ]

Mike Gorrell

I think the next step would be to start breaking this epic down into user stories. Initially large and high-level, and that's okay; we can break them down further and/or elevate some to the epic level. A couple of high-level user stories come to mind:
• spec out the environment, including:
a) how is it hosted and constructed: is there any automation, e.g. built daily from snapshots or quarterly from releases? The frequency of testing will be affected by this (and will also influence it)
b) data set: which data set do we use (we should choose one)? Some of it, like bib data, could be a real-life data set; other parts (like transaction data) likely need to be artificial but preferably modelled after real-life data. How is this data set going to be updated (FOLIO adds new records and extends existing ones from version to version), and whose responsibility is it (e.g. programmers vs. devops)?
• define the test methodology: establish the main guidelines on things like warm-up, cool-down between tests, the level of concurrency for each test (number of virtual users, their ramp-up and ramp-down), and the level of concurrency between different test cases (if any). Establish some baselines on what we treat as acceptable performance and what is not acceptable (probably a topic in its own right)
• define an initial set of test scenarios: there could be many, so the first step is to focus on the most frequent real-life operations. How do we get this information? Look at logs from existing systems? If so, this is something where the sysops SIG might be instrumental; a task for them would be to analyze their logs. Going forward we need a process for extending the initial test cases: adding more tests will involve adding more perf data, and depending on how we want to do it, that's either the responsibility of the test author, the developer building new functionality, or the devops folks maintaining the tests.
• plan how the test process (and its results) feeds into the development process: e.g. if we focus on testing quarterly releases, I would imagine the perf data sets and scenarios are updated and ready when a new release comes out; the perf testing team tests it and creates appropriate tickets for the dev team(s). Dev teams will need some way to reproduce or analyze the perf problems: e.g. access to monitoring tools on the environment and the ability to rerun test cases, or the instrumentation data must be available to them for analysis (sometimes that alone is probably enough). This is a more general issue in FOLIO and must be addressed, because the people running FOLIO and tracking perf issues are not the same as the people fixing them, so there must be a process for reporting perf problems effectively.
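One piece of the last two bullets, baselines and feeding results back into development, could start as simply as comparing a run's percentile latencies against a stored baseline. A minimal Python sketch; the p95 choice and the 10% tolerance are illustrative assumptions, not an agreed FOLIO policy:

```python
import statistics

def p95(samples_ms):
    """95th-percentile latency. statistics.quantiles(n=100) yields 99
    cut points, so index 94 is the 95th percentile."""
    return statistics.quantiles(samples_ms, n=100)[94]

def check_against_baseline(samples_ms, baseline_p95_ms, tolerance=0.10):
    """Compare a run's p95 latency with a stored baseline.

    Returns (measured_p95, regressed): `regressed` is True when the
    measured p95 exceeds the baseline by more than `tolerance` (10%
    by default), which would be the trigger for filing a perf ticket.
    """
    measured = p95(samples_ms)
    return measured, measured > baseline_p95_ms * (1 + tolerance)
```

In the quarterly-release flow described above, the baseline would be the previous release's measured p95, stored alongside the test scenario so regressions surface automatically.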

Comment by Mike Gorrell [ 05/Jun/19 ]

Added several stories.

Comment by Martin Tran [ 17/Jun/19 ]

define initial set of test scenarios — there could be many so the first step is to focus on most frequent real-life operations. How do we get this information? Look at logs from existing systems?

For the initial set of test scenarios we could rely on the libraries' existing data, as they would have information on the rates of check-in/out, requests, renewals, and other workflows. Using this data we can assess which workflows are used more than others. Going forward we can get this information by adding custom metrics to our modules. Metrics such as rates, counts, and timers that are strategically placed in various parts of a module can help identify which APIs or code paths are being called and how frequently. For the first release we can't rely on these metrics because they won't have been exercised by real customers yet, but once a real customer starts using the system, these metrics will light up and be able to tell us which scenarios and workflows customers often use. Having metrics can also tell us how the system performs in real time and aid in analyzing performance issues.
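As an illustration of the counter/timer idea, here is a minimal hand-rolled sketch in Python. A real FOLIO module would use a proper metrics library (e.g. Micrometer or a Prometheus client) rather than this class, and the metric names below are invented for the example:

```python
import time
from collections import defaultdict

class Metrics:
    """Minimal sketch of the counters and timers described above."""

    def __init__(self):
        self.counts = defaultdict(int)      # metric name -> call count
        self.timings = defaultdict(list)    # metric name -> durations (s)

    def count(self, name):
        """Increment a plain counter (e.g. a success/failure event)."""
        self.counts[name] += 1

    def timer(self, name):
        """Context manager that records a duration and bumps the call
        count, so both frequency and latency light up per code path."""
        metrics = self

        class _Timer:
            def __enter__(self):
                self.start = time.perf_counter()

            def __exit__(self, *exc):
                metrics.timings[name].append(time.perf_counter() - self.start)
                metrics.counts[name] += 1
                return False  # never swallow exceptions

        return _Timer()

# Usage: wrap an API handler so its call rate and latency are recorded.
metrics = Metrics()
with metrics.timer("GET /circulation/loans"):
    pass  # handler body would go here
metrics.count("checkout.success")
```

Aggregating `counts` across tenants is exactly the data needed to rank workflows for the test scenarios, and `timings` gives the real-time performance view mentioned above.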

Comment by Cate Boerema (Inactive) [ 29/Jul/19 ]

Mike Gorrell and Jakub Skoczen, can we assign this to someone and get a PO rank on it? Thanks!

Generated at Thu Feb 08 23:17:55 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.