...

  1. Teams need to learn how to work with carrier-io ASAP.  The best way is to embed team members within PTF so that they can learn and be trained.
    1. Each team chooses an engineer who enjoys tackling performance problems. This person will start out working with PTF to create JMeter test scripts for their team.  PTF will spend 20% of its time working with these team members, training them to write carrier-io-compatible JMeter scripts, deploy the test scripts and their artifacts to carrier-io, execute the Jenkins job, and interpret test data with carrier-io.
      1. PTF team members will spend a maximum of 90 minutes each day giving hands-on training.
    2. These team members will then go back to their teams with this knowledge and lead and teach their teams in performance-testing tasks.
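A concrete piece of what the training covers is launching a JMeter script in non-GUI mode with reporting parameters passed as JMeter `-J` properties. A minimal sketch of composing such a command (the property names `influx_host` and `test_name` are hypothetical; the actual set depends on how carrier-io's Backend Listener is configured):

```python
import shlex

def jmeter_command(script_path, properties):
    """Compose a non-GUI JMeter run: -n (no GUI), -t <test plan>,
    plus one -J<name>=<value> argument per JMeter property."""
    args = ["jmeter", "-n", "-t", script_path]
    args += [f"-J{name}={value}" for name, value in sorted(properties.items())]
    return shlex.join(args)

# Hypothetical carrier-io reporting properties:
cmd = jmeter_command("circulation.jmx",
                     {"influx_host": "carrier.example.org",
                      "test_name": "circ-baseline"})
print(cmd)
# jmeter -n -t circulation.jmx -Jinflux_host=carrier.example.org -Jtest_name=circ-baseline
```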
  2. Because each carrier-io instance has its own InfluxDB that stores all test data, a single carrier-io instance cannot serve all 8 teams. Each team should have its own carrier-io instance.
    1. The carrier-io instance should be spun up and scaled down to save cost. 
      Jira: PERF-114
  3. Because teams will need to performance-test their work before releases, having one large-scale FOLIO environment is not enough.  There should be 2 to 3 (+1 for PTF?) large-scale FOLIO environments that can be spun up and torn down on demand to save costs; all should be identical and on the same software versions. The "+1" is a dedicated environment for PTF for as long as PTF exists.
    1. The 2-3 large-scale FOLIO environments are to be shared among the teams.
    2. Initially, PTF will be responsible for upgrading the environments at the beginning of every sprint to the latest snapshots (usually of the commits whose stories were approved at the end of the previous sprint), based on FOLIO-SNAPSHOT software versions.  Later on, teams should take over this responsibility.
      1. Upgrading the environments regularly lets teams test with the latest module versions, while the sprint cadence balances the instability of frequent commits against the work of upgrading the chain of dependencies the modules require.
      2. This upgrade includes running any database migration script to update the database. These scripts are run automatically when the module is enabled.
    3. Teams spin up an environment to run tests, then drop the environment after testing to restore the state for the next team to use.  The restored state is what was deployed at the beginning of the sprint.
      1. When teams spin up an environment, they will be able to customize the version of any module to be loaded if desired. This includes any released version or from master or from any branch.
        1. Ex: A team wants to performance-test mod-circulation-storage code on a branch. They will set the version of mod-circulation-storage so that the branch build is loaded on startup.
    4. There should be a Wiki page for teams to schedule a timeslot or timeslots to run their tests
    5. Teams will have two hours after running a test to collect data and examine results, after which the environment will automatically be shut down. (Since the data is stored in a persistent EBS volume, the test data won't be dropped after the environment is shut down.)
      1. If a team needs to run a database migration script as part of its performance testing, in some cases it could take more than 2 hours. In that case the team running the migration script will need to communicate with the teams holding the following timeslots to let them know about the overtime use.
      2. Note that once testing is finished and the environment is dropped, the data is restored to the beginning-of-sprint state, so the migration will be gone.
    6. Teams should continue to follow the principles and guidance described in (0) JMeter Scripts Contribution Guidelines when working with shared performance environments; this includes creating scripts to add test data and to restore the database after each test run.
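Pinning a specific module version at spin-up (as in the mod-circulation-storage example above) ultimately maps to what gets posted to Okapi's tenant install endpoint, which also triggers the automatic database migrations mentioned earlier. A sketch of building that request body (the SNAPSHOT module id shown is hypothetical):

```python
import json

def install_payload(module_id, action="enable"):
    """Body for POST /_/proxy/tenants/{tenant}/install — Okapi resolves
    dependencies and runs the module's database migrations on enable."""
    return json.dumps([{"id": module_id, "action": action}])

# Hypothetical branch build of mod-circulation-storage:
body = install_payload("mod-circulation-storage-13.1.0-SNAPSHOT.245")
print(body)
# [{"id": "mod-circulation-storage-13.1.0-SNAPSHOT.245", "action": "enable"}]
```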
  4. Environment Costs
    1. The following assumptions are made to determine costs:
      1. Environments will be used for 1/3 of the time, or 10 days in a month.
      2. Each day will be used for about 12 hours, or 1/2 of a day.
      3. Therefore usage totals five (5) 24-hour days in a month (10 days × 12 hours = 120 hours), which equals 1/6 of the full month.
      4. The cost per environment is therefore 1/6 of full-time use, taking advantage of the ability to spin up and tear down the environment.
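The assumptions above can be checked with a quick calculation (a sketch; assumes a 30-day month, which matches the 120 hours/month used in the cost tables):

```python
# Sanity-check the usage assumptions behind the 1/6 cost factor.
days_used = 10            # 1/3 of a 30-day month
hours_per_day = 12        # half of each used day

hours_used = days_used * hours_per_day      # 120 hours/month
full_month_hours = 30 * 24                  # 720 hours in a 30-day month

fraction = hours_used / full_month_hours    # 120 / 720
equivalent_full_days = hours_used / 24      # five 24-hour days

print(hours_used, equivalent_full_days, fraction)  # 120 5.0 ≈ 0.1667
```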


        1 Community FOLIO deployment (1-year Reserved)

        | Item                       | Price          | Usage/month | Instances | Monthly Cost |
        |----------------------------|----------------|-------------|-----------|--------------|
        | EKS Cluster                | $0.10/hour     | 120 hours   | 1         | $12.00       |
        | Database (t3.xlarge)       | $0.104/hour    | 120 hours   | 1         | $12.48       |
        | EC2 (t3.xlarge)            | $0.104/hour    | 120 hours   | 6         | $74.88       |
        | Load Balancers (Classic)   | $0.025/hour    | 120 hours   | 4         | $12.00       |
        | EBS (general purpose, gp2) | $0.10/GB-month | 142 GB      | 8         | $14.17       |
        | Total FOLIO                |                |             |           | $125.53      |

        1 Carrier-io (1-year Reserved)

        | Item                       | Price          | Usage/month | Instances | Monthly Cost |
        |----------------------------|----------------|-------------|-----------|--------------|
        | EC2 (m5.xlarge - Reserved) | $0.121/hour    | 120 hours   | 1         | $14.52       |
        | Spot instance (t3.medium)  | $0.05/hour     | 15 hours    | 1         | $0.75        |
        | EBS (general purpose, gp2) | $0.10/GB-month | 200 GB      | 1         | $20.00       |
        | Total Carrier-io           |                |             |           | $35.27       |

        Monthly Grand Total: $160.80


    2. Each environment costs about $160/month; three environments ≈ $482/month.
    3. Using 3-year reserved instances brings the cost down to about $112/month for one environment, or $336/month for 3 environments (see the attached spreadsheet CommunityPerfEnvironemtCosts.xlsx for more details).

...

  • Create sandbox environments (carrier-io and FOLIO) for teams to play with during the transition.
    • Jira: PERF-114 - Automate carrier-io installation/launching
    • Jira: PERF-104 - Build a large-scale FOLIO environment. It needs to be repeatable so it can scale up/down.
  • Jira: PERF-102 - Automatically run tests checked into GitHub. A step to spin up FOLIO will need to be added to the test run. This makes it possible to run tests and compare results against previous test runs.
  • Documentation:
    • Jira: PERF-110
    • Upgrading carrier-io going forward
    • Create how-to documentation for administering carrier-io
    • Create a diagram or a set of diagrams showing pieces of carrier-io and of FOLIO to communicate the architecture and responsibilities
    • Performance Analysis documentation: what to look for; log analysis (missing indexes, database logs for slow queries); pgHero, pgAdmin, Performance Insights; metrics; trouble signs (slowness, runaway CPU/memory, 500 errors, database memory pressure, etc.); Giraffe analysis.

...