Environment

  • Use the default UChicago dataset - 27M records
  • Other datasets and their sizes: check with the POs; the choice depends on the workflow being tested.
  • Run two environments - one with the profiler enabled and one without.

...

  • Establishing test scenarios, conditions, and SLAs with the POs, especially for the scenarios that we come up with ourselves.
  • Maintaining a test log - record the time of test execution and the test conditions (see the log-entry sketch after this checklist).
    • parameters:
      • dataset name or the number of records in the database
      • log level of all modules and/or a specific module
      • FOLIO version and/or specific module versions
      • with or without the profiler
      • Number of users
      • Duration
      • Other configurations or settings (TBD)
  • Decide whether it is feasible to restart the cluster so that all the ECS services have a fresh starting point in terms of CPU and memory:
    • For short-duration tests, there is no need to restart the environment every time.
      • Keep an eye on environment metrics such as CPU and memory utilization; it may be necessary to proactively restart the module or the whole environment if the metrics reach a critical level.
    • For long-duration tests, restart the environment to have a clean starting point.
  • Baseline tests/results:
    • Run them only when absolutely required, e.g., for a whole new set of workflows.
    • Rerun them each time a new version of a module is added.
    • If the parameters haven't changed, there is no need to rerun the baseline.
  • pgHero is a tool that captures slow queries. Clear out pgHero if it has not been cleared already.
  • Run a smoke test to verify that there are no functional errors and that the environment has been set up successfully (see the smoke-check sketch after this checklist).
  • Longevity tests
    • Take a heap dump
  • Triple-check the Jenkins job's parameters.
  • If the environment has been restarted, make sure that all ECS services have been stable for at least 15 minutes (see the stability-check sketch after this checklist).
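
Log-entry sketch (referenced from the test-log item above). This is a minimal illustration of one way to record an entry, assuming a shared JSON-lines file as the log store; the field names simply mirror the parameters listed above and are not a prescribed schema.

```python
# Minimal test-log entry sketch (assumption: a shared JSON-lines file is used as
# the log store; field names are illustrative and mirror the checklist above).
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class TestLogEntry:
    executed_at: str        # time of test execution (UTC, ISO 8601)
    dataset: str            # dataset name or number of records in the database
    log_levels: dict        # log level of all modules and/or a specific module
    folio_version: str      # FOLIO version and/or specific module versions
    profiler_enabled: bool  # with or without the profiler
    users: int              # number of users
    duration_minutes: int   # duration
    notes: str = ""         # other configurations or settings

def append_entry(entry: TestLogEntry, path: str = "test-log.jsonl") -> None:
    """Append one entry to the shared test log."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(entry)) + "\n")

append_entry(TestLogEntry(
    executed_at=datetime.now(timezone.utc).isoformat(),
    dataset="UChicago (27M records)",
    log_levels={"mod-inventory": "INFO"},
    folio_version="example-release",  # placeholder
    profiler_enabled=False,
    users=8,
    duration_minutes=30,
))
```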
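
Smoke-check sketch (referenced from the smoke-test item above). This assumes the Okapi gateway URL, tenant, and token are available as environment variables; the endpoint shown is purely illustrative, so substitute whichever endpoints the workflow under test exercises.

```python
# Minimal smoke-check sketch (assumptions: OKAPI_URL, OKAPI_TENANT, and
# OKAPI_TOKEN environment variables are set; the endpoint below is illustrative).
import os
import requests

OKAPI_URL = os.environ["OKAPI_URL"]  # e.g. https://okapi.example.org
HEADERS = {
    "x-okapi-tenant": os.environ["OKAPI_TENANT"],
    "x-okapi-token": os.environ["OKAPI_TOKEN"],
}

def smoke_check() -> None:
    """Fail fast if a basic read returns a non-2xx status."""
    resp = requests.get(
        f"{OKAPI_URL}/instance-storage/instances",
        params={"limit": 1},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    print("Smoke check passed:", resp.status_code)

if __name__ == "__main__":
    smoke_check()
```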
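
Stability-check sketch (referenced from the restart item above). This assumes boto3 credentials are configured for the target account; the cluster name is a placeholder, and a service is treated as stable when its running task count matches the desired count and it has a single active deployment.

```python
# ECS stability-check sketch (assumptions: boto3 credentials are configured;
# the cluster name is a placeholder; list_services pagination is omitted for brevity).
import time
import boto3

ecs = boto3.client("ecs")
CLUSTER = "perf-cluster"  # placeholder

def all_services_stable(cluster: str) -> bool:
    """True when every service runs its desired task count with one deployment."""
    service_arns = ecs.list_services(cluster=cluster)["serviceArns"]
    for i in range(0, len(service_arns), 10):  # describe_services takes up to 10 ARNs per call
        described = ecs.describe_services(cluster=cluster, services=service_arns[i:i + 10])
        for svc in described["services"]:
            if svc["runningCount"] != svc["desiredCount"] or len(svc["deployments"]) != 1:
                return False
    return True

def wait_until_stable(minutes: int = 15) -> None:
    """Block until the cluster has stayed stable for the full window."""
    stable_since = None
    while True:
        if all_services_stable(CLUSTER):
            stable_since = stable_since or time.time()
            if time.time() - stable_since >= minutes * 60:
                print(f"All ECS services stable for {minutes} minutes.")
                return
        else:
            stable_since = None
        time.sleep(60)

if __name__ == "__main__":
    wait_until_stable()
```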

During test:

  • Capture any observations. 
  • Capture heap dumps, especially for longevity tests - this is a manual process at the moment and may be automated in the future. At a minimum, take dumps at the beginning, middle, and end of the test run (see the heap-dump sketch below).
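
Heap-dump sketch. This assumes ECS Exec is enabled on the task, the JVM runs as PID 1 inside the container, and jcmd is available in the image; the cluster, task, and container names are placeholders, and the resulting .hprof file still needs to be copied out of the container for analysis.

```python
# Manual heap-dump capture sketch (assumptions: ECS Exec is enabled, the JVM is
# PID 1 in the container, jcmd is on the image, and the names below are placeholders).
import subprocess
from datetime import datetime, timezone

CLUSTER = "perf-cluster"     # placeholder
TASK = "task-id"             # placeholder
CONTAINER = "mod-inventory"  # placeholder

def capture_heap_dump() -> str:
    """Ask the JVM inside the container to write a heap dump and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_path = f"/tmp/heap-{stamp}.hprof"
    subprocess.run(
        [
            "aws", "ecs", "execute-command",
            "--cluster", CLUSTER,
            "--task", TASK,
            "--container", CONTAINER,
            "--interactive",
            "--command", f"jcmd 1 GC.heap_dump {dump_path}",
        ],
        check=True,
    )
    return dump_path

if __name__ == "__main__":
    print("Heap dump written inside the container at", capture_heap_dump())
```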

Post test:

  • Data collection - which pieces of data are important to collect (see the metrics-collection sketch after this checklist):
    • Average response time (obtained from Grafana)
    • Errors - thresholds for failing an API call (obtained from Grafana)
    • Module logs, checked for any error entries
    • TPS - transactions per second
    • CPU utilization for a particular module or for any abnormal behavior observed (from any module)
    • Memory usage for a particular module or for any abnormal behavior observed (from any module)
  • Updating test log
    • Record the observations above and any anomalies.
    • Record the test timestamps or a Grafana URL so that we can go back and look at the graphs later.
  • Capture pgHero stats
    • Save the page from the browser (Save As)
  • If running a series of tests, don't wait until the end of all the runs to look at the data. Check the data after each run to make sure nothing is questionable; if anything is, address it right away.
  • Write report
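
Metrics-collection sketch (referenced from the data-collection item above). This assumes the Grafana dashboards are backed by a Prometheus datasource that can be queried directly; the datasource URL, metric names, and labels are placeholders, so substitute whatever the environment actually exports.

```python
# Post-test metrics-collection sketch (assumptions: Grafana is backed by a
# Prometheus datasource at PROM_URL; metric and label names are placeholders).
import requests

PROM_URL = "http://prometheus.example.org"  # placeholder

def query_range(promql: str, start: float, end: float, step: str = "30s") -> list:
    """Run a PromQL range query over the test window and return the result series."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": promql, "start": start, "end": end, "step": step},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Example queries for the data points listed above (metric names are illustrative).
test_start, test_end = 1700000000, 1700003600  # replace with the real test window
collected = {
    "avg_response_time": query_range("avg(jmeter_response_time_seconds)", test_start, test_end),
    "error_rate": query_range("sum(rate(jmeter_errors_total[5m]))", test_start, test_end),
    "tps": query_range("sum(rate(jmeter_requests_total[5m]))", test_start, test_end),
    "module_cpu": query_range('avg(container_cpu_usage_percent{service="mod-inventory"})',
                              test_start, test_end),
    "module_memory": query_range('avg(container_memory_usage_bytes{service="mod-inventory"})',
                                 test_start, test_end),
}
print({name: len(series) for name, series in collected.items()})
```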