Overview

PTF uses the carrier-io toolset for testing and visualizing test results. The PTF testing environment is a large-scale FOLIO deployment that mirrors Bugfest's. This document outlines the challenges of handing over performance testing to the FOLIO teams and a strategy for doing so.

Current Environment

Challenges

  • Knowledge transfer is the key because carrier-io has a steep learning curve
    • Using and administering carrier-io
    • Creating JMeter test scripts
    • Interpreting test results and troubleshooting issues
  • Environment: enabling teams to share and reuse the testing environment
    • Each carrier-io instance has its own InfluxDB that stores all test data: JMeter test data, JVM profiling data, custom metrics data, etc. A single InfluxDB shared by all 8 teams would fill up and be overwhelmed very quickly.
    • Currently we are developing a large-scale FOLIO deployment in the community's AWS account for PTF to use. Ideally we would have several.
    • Creating and maintaining a large database.
    • Upgrading carrier-io

Proposals

  1. Teams need to learn how to work with carrier-io ASAP. The best way is to embed team members within PTF so that they can learn and be trained.
    1. Each team chooses an engineer who loves tackling performance problems. This person will start out working with PTF to create JMeter test scripts for their team. PTF will spend 20% of its time training these team members to write carrier-io-compatible JMeter scripts, deploy a test script and its artifacts to carrier-io, execute the Jenkins job, and interpret test data with carrier-io.
      1. PTF team members will spend a maximum of 90 minutes each day giving hands-on training.
    2. These team members will go back to their teams with this knowledge and lead and teach them in carrying out performance testing tasks.
  2. Because each carrier-io instance has its own InfluxDB that stores all test data, the 8 teams cannot all share one carrier-io instance. Each team should have its own carrier-io instance.
    1. The carrier-io instance should be spun up on demand and scaled down when not in use to save costs (PERF-114).
  3. Because teams will need to performance-test their work before releases, one large-scale FOLIO environment is not enough. There should be 2 to 3 (+1 for PTF?) large-scale FOLIO environments that can be spun up and torn down on demand to save costs. All of them should be identical and run the same software versions. The "+1" is a dedicated environment for PTF for as long as PTF exists.
    1. The 2-3 large-scale FOLIO environments are to be shared among the teams.
    2. Initially PTF will be responsible for upgrading the environments at the beginning of every sprint to the latest snapshots, based on FOLIO-SNAPSHOT software versions (usually the commits whose stories were approved at the end of the previous sprints). Later on, teams should take over this responsibility.
      1. Upgrading the environments regularly allows teams to test with the latest module versions, while balancing the instability of frequent commits against the need to upgrade the chain of dependencies that the modules require.
      2. This upgrade includes running any database migration scripts needed to update the database. These scripts are run automatically when the module is enabled.
    3. Teams spin up an environment to run tests, then drop it after testing to restore the state for the next team. The restored state is what was deployed at the beginning of the sprint.
      1. When teams spin up an environment, they will be able to customize the version of any module to be loaded if desired. This can be any released version, or a build from master or from any other branch.
        1. Ex: A team wants to performance-test the mod-circulation-storage code on a branch. They will set the version of mod-circulation-storage so that the branch build is loaded on startup.
    4. Teams will have 2 hours after running a test to collect data and examine results, after which the environment will be shut down automatically.
      1. If a team needs to run a database migration script for its performance testing, in some cases it could take more than 2 hours. In this case the team running the migration script will need to notify the teams in the following timeslots of the overrun.
      2. Note that once testing is finished and the environment is dropped, the data will be restored to its state at the beginning of the sprint, so the results of the migration will be gone.
    5. There should be a Wiki page for teams to schedule a timeslot or timeslots to run their tests
    6. Teams should continue to follow the principles and guidance described in JMeter Scripts Contribution Guidelines when working with shared performance environments. This includes creating scripts to add test data and to restore the database after each test run.
  4. Environment Costs
    1. The following assumptions are made to determine costs
      1. Environments will be used 1/3 of the time, or 10 days in a month
      2. On each of those days an environment will be used for about 12 hours, or 1/2 day
      3. Therefore usage will total five (5) 24-hour days in a month (120 hours), which equals 1/6 of the full month
      4. The cost per environment is therefore 1/6 of the cost of running it continuously, taking advantage of the ability to spin up and tear down the environment.


        1 Community FOLIO deployment (1-year Reserved)

        Resource                        Price/hour  Hours/month  Instances  Monthly Cost
        EKS Cluster                     0.1         120          1          12.00
        Database (t3.xlarge)            0.104       120          1          12.48
        EC2 (t3.xlarge)                 0.104       120          6          74.88
        Load Balancers (Classic)        0.025       120          4          12.00
        EBS (general purpose, gp2)      0.1/GB      142 GB       8          14.17
        Total FOLIO                                                         $125.53

        1 Carrier-io (1-year Reserved)

        Resource                        Price/hour  Hours/month  Instances  Monthly Cost
        EC2 (m5.xlarge - Reserved)      0.121       120          1          14.52
        Spot instance (t3.medium)       0.05        15           1          0.75
        EBS (general purpose, gp2)      0.1/GB      200 GB       1          2.00
        Total Carrier-io                                                    $17.27

        Monthly Grand Total                                                 $142.80


    2. Each environment costs about $142/month, so three environments cost about $426/month.
    3. Using 3-year reserved instances would bring the cost down to $112/month for one environment, or $336/month for three environments (see attached spreadsheet CommunityPerfEnvironemtCosts.xlsx for more details).
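
The cost arithmetic above can be checked with a short script. This is a minimal sketch, not an official cost model: it assumes a 30-day month, takes the hourly prices and instance counts from the cost table, and copies the two EBS monthly figures from the table as-is rather than recomputing them. All variable and function names are illustrative.

```python
# Usage assumptions from the cost section.
DAYS_USED_PER_MONTH = 10   # about 1/3 of a 30-day month (assumed month length)
HOURS_USED_PER_DAY = 12    # about 1/2 day
HOURS_PER_MONTH = DAYS_USED_PER_MONTH * HOURS_USED_PER_DAY  # 120 hours

# Hourly-billed rows: (resource, price per hour in USD, hours per month, instances).
FOLIO_HOURLY = [
    ("EKS Cluster", 0.1, HOURS_PER_MONTH, 1),
    ("Database (t3.xlarge)", 0.104, HOURS_PER_MONTH, 1),
    ("EC2 (t3.xlarge)", 0.104, HOURS_PER_MONTH, 6),
    ("Load Balancers (Classic)", 0.025, HOURS_PER_MONTH, 4),
]
CARRIER_HOURLY = [
    ("EC2 (m5.xlarge - Reserved)", 0.121, HOURS_PER_MONTH, 1),
    ("Spot instance (t3.medium)", 0.05, 15, 1),
]

# EBS monthly costs are copied from the table, not recomputed.
FOLIO_EBS_MONTHLY = 14.17
CARRIER_EBS_MONTHLY = 2.00


def monthly_cost(hourly_rows, ebs_monthly):
    """Sum price * hours * instances over the hourly rows, then add EBS."""
    hourly = sum(price * hours * count for _, price, hours, count in hourly_rows)
    return round(hourly + ebs_monthly, 2)


folio_total = monthly_cost(FOLIO_HOURLY, FOLIO_EBS_MONTHLY)
carrier_total = monthly_cost(CARRIER_HOURLY, CARRIER_EBS_MONTHLY)
grand_total = round(folio_total + carrier_total, 2)

if __name__ == "__main__":
    print(f"Total FOLIO:         ${folio_total:.2f}")
    print(f"Total Carrier-io:    ${carrier_total:.2f}")
    print(f"Monthly Grand Total: ${grand_total:.2f}")
    print(f"Three environments:  ${3 * int(grand_total)}/month (rounded down per environment)")
```

Changing `DAYS_USED_PER_MONTH` or `HOURS_USED_PER_DAY` shows how sensitive the monthly figure is to the usage assumptions.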

How to Get There

  • Create sandbox environments (carrier-io and FOLIO) for teams to play with during the transition.
    • PERF-114 - Automate carrier-io installation/launching
    • PERF-104 - Build a large-scale FOLIO environment. The build needs to be repeatable so that the environment can be scaled up and down.
  • PERF-102 - Automatically run tests checked into GitHub. A step to spin up FOLIO will need to be added when running tests. This makes it possible to run tests and compare results against previous test runs.
  • Documentation:
    • PERF-110
    • Upgrading carrier-io going forward
    • Create how-to documentation for administering carrier-io
    • Create a diagram or a set of diagrams showing pieces of carrier-io and of FOLIO to communicate the architecture and responsibilities
    • Performance Analysis documentation: what to look for, log analysis (missing indexes, database logs for slow queries), pgHero, pgAdmin, Performance Insight, metrics, trouble signs (such as slowness, runaway CPU/memory, 500 errors, database memory, etc.), Giraffe analysis.
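
As an illustration of comparing a test run against the previous one (the goal of the automated runs listed above), the following minimal sketch flags requests whose average response time grew by more than a chosen tolerance. It does not use any carrier-io or JMeter API. The function name, the 10% tolerance, the request names, and the sample numbers are all hypothetical, and the input is assumed to be a plain mapping of request name to average response time in milliseconds.

```python
def find_regressions(previous, current, tolerance=0.10):
    """Return {request: relative_increase} for requests that got slower
    than the previous run by more than `tolerance` (0.10 means 10%)."""
    regressions = {}
    for name, prev_ms in previous.items():
        cur_ms = current.get(name)
        # Skip requests missing from the current run or with no usable baseline.
        if cur_ms is None or prev_ms <= 0:
            continue
        change = (cur_ms - prev_ms) / prev_ms
        if change > tolerance:
            regressions[name] = round(change, 3)
    return regressions


# Hypothetical average response times (ms) from two test runs.
previous_run = {"check-out": 400.0, "check-in": 300.0}
current_run = {"check-out": 500.0, "check-in": 310.0}

if __name__ == "__main__":
    for request, increase in find_regressions(previous_run, current_run).items():
        print(f"{request}: {increase:.1%} slower than the previous run")
```

A relative threshold is used here so that the same tolerance applies to fast and slow requests alike. A real comparison would also need to account for run-to-run variance.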

...