JMeter Scripts Contribution Guidelines


In order to build up a regression test suite of performance workflow tests, teams from the FOLIO community are asked to contribute performance test scripts.  PTF will execute the tests, check for performance issues, and report them back to the team by posting results on Confluence. The following guidelines are to ensure a successful collaboration between the teams and PTF.

Process

  • When requesting a test to be executed, teams should create a JIRA user story on the PTF board and provide the following information:
    • Service Level Agreements
    • Scripts (JMeter and other scripts).  It's recommended to check these scripts into a common repository and simply refer to them from the JIRA ticket.
    • Default settings: #users, duration, test type (longevity, capacity, baseline, fixed load, comparison). See 36574513 for more details.

Service Level Agreements (SLAs)

  • Service Level Agreements are the expected response times for API calls and the pass/fail conditions (e.g., response time, throughput).
  • Teams should work closely with their POs to define workflows and the SLAs for them. It is recommended that submitted requests be reviewed by Anton.

Scripts

Teams to provide:

  • JMeter scripts for workflows, see below for guidelines. 
  • Schema upgrade and data migration scripts (no separate scripts are required; these should be part of the module deployment)
  • Scripts to restore the database to original state
    • When running tests, the database's data will inevitably be changed.  To get consistent results from run to run, the database needs to be in the same state at the beginning of each run. PTF uses UChicago's dataset, which has approximately 27 million instance, item, and holdings records in total, and approximately 15,000 acquisition records. It takes more than 5 hours to load the data from scratch, so it cannot easily be wiped out and reloaded after each test run.  Therefore teams are asked to provide a script to be run after each test execution to restore the database state. Seed data may be retained, but it needs to be restored to its original state.
      • For example, the check-in-check-out workflow test script changes items' status from "Available" to "Checked out" as the test is executed. There should be a database-restore script that changes items with the status "Checked out" back to "Available" (a minimal sketch of such a script appears at the end of this section).
  • Data scripts
    • Seed data and runtime data required by the JMeter script itself
      • Seed data is the data needed to run the test.
        • Data required for testing must be accessible to anyone who executes the script (i.e., anyone in the community)
        • Seed data may be generated on each test run or generated once
          • A script is needed to generate the data
          • If generated on each test run, there should be a script to delete this data.
          • If generated once, there should be a script to load the data (unless generating the data directly into the DB) and a script to restore the state of this data so that each test run is started from the same condition.
        • Seed data may also be pre-made: data stored in CSVs or TSVs, or exported from a pre-existing database to CSVs or TSVs.
          • If the data files are big, then they need to be put in the team's S3 bucket which is accessible by PTF. The path to the data on S3 is specified in the README.
          • If the data files are small (<20 MB), compress them and check them into GitHub.
          • A script that downloads and loads the data is required.
        • Scripts can be SQL, Python, Postman, or bash; just provide instructions on how to run them. Over time we will narrow the technologies and methodologies down to one or two approaches once we learn which works best.
        • Examples:
          • If there is a Renew workflow test, then there should be data in the database for users with varying numbers of loans to renew. It's safe to assume that PTF's database does not have this data, so the user and loan data are needed.
          • A sample script to load data is here (in this case the data is already stored on a local server).
          • A sample script to restore the database state is here.
      • Runtime data are the data that the JMeter test needs in order to execute. For example, in a Check-in scenario, the runtime data required are user and item barcodes. These need to be fed into the JMeter test script continuously for the entire duration of the test run.
        • Runtime data are usually IDs. These can be extracted from the seed data or from the general database (such as user barcodes, for instance, if they can be arbitrary).
        • Runtime data are checked into GitHub, or can be downloaded from S3 if they are big.
        • JMeter consumes the runtime data in CSV files.  In practice it's better to have multiple files of one column than one file with multiple columns.  That is, don't combine all user barcodes and item barcodes into one file; maintain them in separate files instead (see the CSV-splitting sketch at the end of this section).
    • Teams are responsible for reviewing the scripts.
    • Teams are responsible for updating the script as the workflow changes over time.
    • The Check-in-check-out JMeter script can be used as a reference, but other test scripts in the PTF repository can be consulted as well. 
    • The JMeter script should be compatible with the JMeter version used by PTF, which is v5.2.1.
  • PTF proposes that the JMeter test scripts and their related contents be stored in a common repository on GitHub. Since these scripts test workflows, and workflows span multiple modules, storing the scripts in individual modules' repositories would cause confusion.
    • The repository is https://github.com/folio-org/perf-testing/workflows-scripts/[folio-app]/[workflow-name]
      • (There are two folders: workflows-scripts and api-scripts.  Workflow scripts contain more than one step, corresponding to the multiple API calls (excluding the log-in call) that make up the workflow. They are more complex and need to be modeled carefully so that they are as realistic as possible relative to typical user actions. API scripts, on the other hand, exercise single API calls and can be used for benchmarking.)
    • Use tags to differentiate versions of the script.  Tags can correspond to a release or to whatever is logical, e.g., Q1.20.x.y, where x is the version number and y is the workflow name (or should the workflow name be the leading characters?).
      • Currently we are tagging the repository based on the release. Going forward we'd like to maintain this cadence.
    • README: Each workflow script is required to have a README. It lists the modules that the script depends on to run and any JMeter configuration the script requires (or points to; in that case, include the paths to obtain it). See the check-in-check-out README.
    • The JMeter script should work out of the box, with no adjustments needed to locate configuration files.
    • /folio-org/perf-testing/tree/master/workflows-scripts/circulation/check-in-check-out is the sample workflow stored on GitHub. Teams are free to define the sub-directories in the workflow directory.
      • Desired directory structure:
        • /workflow-scripts
          • /[app name or major area (e-holdings, orders, inventory, requests, circulation, etc.)].  circulation is a major area because the check-in-check-out workflow spans two apps.
            • /[workflow name]
              • The main JMX file goes in the root folder
              • /scripts directory to house supporting scripts
              • /jmeter-supported-data stores the runtime data files that the JMeter script needs in order to execute, such as configuration or other data.
                • NB: Do NOT check in a credentials file that has actual values. When PTF runs the script, we will update the credentials file appropriately.
              • README.md 
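
The following is a minimal sketch of the kind of database-restore script described above for the check-in-check-out example. It assumes the usual FOLIO/RMB storage layout (item records in a <tenant>_mod_inventory_storage.item table whose record body lives in a jsonb column); the tenant id, connection string, and the decision to reset every "Checked out" item are assumptions to adapt to the actual tenant and workflow.

```bash
#!/usr/bin/env bash
# Sketch of a post-run database-restore script for check-in-check-out.
# Assumes the RMB convention of a <tenant>_mod_inventory_storage.item table
# with the record body stored in a jsonb column; the tenant id and
# connection string below are placeholders.
set -euo pipefail

TENANT="${TENANT:-diku}"                                   # placeholder tenant id
PG_URL="${PG_URL:-postgresql://folio@db-host:5432/folio}"  # placeholder connection string

psql "$PG_URL" <<SQL
UPDATE ${TENANT}_mod_inventory_storage.item
   SET jsonb = jsonb_set(jsonb, '{status,name}', '"Available"')
 WHERE jsonb -> 'status' ->> 'name' = 'Checked out';
SQL
# Depending on the workflow, related records (e.g., open loans) may also
# need to be reset so that each run starts from the same state.
```

The same pattern (a small SQL, Python, or bash helper checked in alongside the JMX file, with run instructions in the README) applies to the seed-data generation, load, and cleanup scripts described above.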
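
Similarly, for the one-column runtime data files mentioned above, a small helper like the following sketch can split a combined seed-data export into the separate CSV files that the JMeter script reads; the input file name and column layout are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: split a combined seed-data export into one-column CSV files,
# since separate single-column files are easier to manage in JMeter
# CSV Data Set Configs. The input layout (user_barcode,item_barcode per
# line) and the file names are hypothetical.
set -euo pipefail

INPUT="seed-data/checkout-pairs.csv"   # hypothetical combined export
mkdir -p jmeter-supported-data
cut -d',' -f1 "$INPUT" > jmeter-supported-data/user-barcodes.csv
cut -d',' -f2 "$INPUT" > jmeter-supported-data/item-barcodes.csv
```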

Environment

PTF FOLIO Environment 

Every two weeks the performance environment will be refreshed.  Refreshing the environment is not trivial, as it requires running upgrade scripts to update the storage modules' JSON schemas and the corresponding data in the database, and also upgrading the modules themselves.  Running the data migration scripts against a database of 27M records in total could take up to 9 hours, so downtime is expected.  The two-week interval balances testing continuity with getting performance feedback to the community in a reasonable time. This is mainly for tests of new features.  What does this mean for the JIRAs? It means that JIRAs that depend on a newer version of modules than what is deployed will have to wait up to two weeks before they are executed.

Currently, however, PTF is working on a FOLIO release schedule, refreshing the performance environment once every quarter and pulling in hot fixes every two weeks.  We will continue doing this until there are tests written for new features, or until there are special requests to run tests before a feature is released to check for performance regressions.

Teams' Development Environments

PTF recommends that teams create and test all their scripts in their scratch development environment. This does not require running the performance tests at a large scale; one or two virtual users and a limited number of records are enough to validate that the tests work as intended.

Test Scenarios

There are many scenarios, and combinations of scenarios, for which JMeter test scripts could be created.  It's suggested to start by building tests for the common workflows in the area of FOLIO that the team is responsible for and to add more from there.

Creating JMeter Test Script

JMeter Script Guidelines

JMeter scripts should adhere to the following principles:

  1. Be able to be plugged into the carrier-io environment with no changes required, or at most a change to the credentials file.
  2. Model the workflow as accurately as possible to how a human would interact with it from the UI (more below)
  3. Follow commonly established patterns:
    1. Naming convention for the JMX file: folioAppName_flowName.jmx (camel case, with an underscore separating the FOLIO app name from the workflow name).
      1. folioAppName: e.g., Inventory, Requests, Users, eHoldings, etc., used to group like workflows together
      2. For complex workflows, those that consist of several workflows, come up with a logical name.  For example, we have a check-in-check-out workflow that consists of two workflows.
    2. Test plan and JMX file names should be the same.
    3. The JMX file should have some or all of the following user-defined variables, as needed. These variables are set by the Jenkins job that kicks off the test, with values specified in the JIRA (an example invocation follows this list):
      1. HOSTNAME: Okapi's URL without the HTTP protocol prefix (required)
      2. DISTRIBUTION: the distribution of load in a complex workflow.  For example, the check-in-check-out workflow has two flows with a 43% and 57% load distribution, so the value 43-57 is specified. If there are more flows, the subsequent percentages are appended to this string, separated by dashes.  All the numbers in the distribution should add up to 100.
      3. global_BaseDir: a base file path that will be combined with other variables' values to produce a path to files that the script needs to run with (required)
      4. global_vusers: the number of virtual users. Any number below 10 can be used as the default value. (required)
      5. global_rampup: the ramp-up time in seconds for all virtual users. For example, if we specify 50 seconds and there are 5 virtual users, then a virtual user will be introduced every 10 seconds. (required; defaults to 10 seconds for each virtual user added)
      6. duration: test duration in seconds. (required; values to be populated at runtime as specified in the JIRA)
      7. port: the port of the Okapi URL (required)
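
For illustration, the sketch below shows how such a script might be launched in non-GUI mode with these variables supplied as JMeter properties, which is roughly how the Jenkins job overrides them. It assumes the JMX reads each user-defined variable via the __P() function so that command-line values take precedence; the file name, host, and values shown are hypothetical.

```bash
# Hypothetical non-GUI run with the user-defined variables above passed as
# JMeter properties (-J). Assumes the JMX reads them with ${__P(...)} so
# these values override the defaults stored in the file.
jmeter -n -t circulation_checkInCheckOut.jmx \
  -JHOSTNAME=okapi-perf.example.org \
  -Jport=9130 \
  -JDISTRIBUTION=43-57 \
  -Jglobal_BaseDir=/home/jmeter/perf-testing/workflows-scripts/circulation/check-in-check-out/ \
  -Jglobal_vusers=8 \
  -Jglobal_rampup=80 \
  -Jduration=1800 \
  -l results.jtl
```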

Dos and Don'ts:

Don'ts: Custom Java code inside a JMeter script will overload the JVM.

Dos: Use JMeter controllers instead of writing custom code.

Helper Tools

Writing an entire JMeter test script from scratch may be daunting for those who have not worked with JMeter before. There are two tools to help facilitate this process:

  1. Use the BlazeMeter tool to capture the API calls that are invoked by the UI in the workflow.  This tool can quickly generate a JMX file.  However, the generated JMeter test script isn't production grade; several things will need to be done to it:
    1. Remove the hard-coded URLs and query string values. The captured calls are only good for one set of values, but a performance test run of 30-60 minutes iterates over the steps continuously, so the variables in the URLs need to be filled in dynamically with data. This requires coming up with appropriate data and parameterizing the URLs to read values from a file.
    2. Genericize it. For example, the tool picks up 10 calls to POST /circulation/renew-by-barcode because the UI happens to make 10 calls to this endpoint for the 10 items being renewed. In the script, this needs to be cleaned up and put in a loop so that it can read however many barcodes there are from the list of loans, for example.
    3. Model the script to be as realistic as possible in order to reproduce the behavior faithfully. There are calls that are asynchronous on the front end, or calls that kick off an asynchronous process on the back end. BlazeMeter would not be able to pick up these calls, so we need to work around them or improvise in the script. For example, the Data Export workflow makes asynchronous calls to mod-source-record-storage from the FOLIO front end, so we had to mimic the polling calls to poll for the export status.  Look here for more details about this Data Export workflow.
      1. Add realistic delays/pauses in the test script to mimic a human's delays when going from one page to another
      2. Add load distributions between sub-workflows for complex workflows that combine multiple flows, such as the check-in and check-out flows.
  2. Consider using the template for creating a new JMeter test script and repository. The user-defined variables mentioned above are already filled in.  One could start with this template script and copy over the API calls gathered by the BlazeMeter tool in step #1.
    1. Change the JMX file name, Test Plan name, and other variable names or values
    2. Add more Controllers for workflow(s)

Test Types

The following test types are commonly used by PTF.  PTF is open to running other test types if requested.

Baseline: Baseline tests are tests to establish a baseline of performance. Typically it is run with 1 virtual user for 30 minutes to establish a baseline for other tests to compare against.

Fixed Load: These tests are designed to run with a fixed number of concurrent virtual users for a period of time.  They are run to obtain an understanding of the response time and throughput of a feature.  At this time PTF runs tests with 1, 5, 8, 10, or 20 virtual users.  When FOLIO is more performant, PTF will increase the number of virtual users.  Usually the run period is 30 minutes. If a test run shows inconsistencies or a questionable throughput or response-time pattern, PTF will rerun the test for 60 minutes or longer.  Typically fixed load tests are run as a series across the virtual-user counts described above to gauge the behavior of the system under a range of user loads.

Capacity: Capacity tests are exploratory tests to understand how the system behaves as the load increases. They are useful for finding the maximum load, or number of virtual users, at which the system still performs reasonably well before reaching its breaking point. These tests are of the trial-and-error type (adding more users to see when the system breaks).

Comparison: Comparison tests compare the results of the current build against past builds or past test results.  Usually this is done by running some or all of the fixed load tests on two environments (e.g., Fameflower and Goldenrod), or, if data from past test runs are reliable, by running the tests only on the latest version and comparing the results against the past data.

Longevity: Longevity tests are run for at least 8 hours to detect abnormal patterns over a long period of time. Usually memory leaks are discovered by longevity tests.