Overview

Data Import is a feature that works on large datasets. To date, there are automated tests of various Data Import APIs and modules, including Karate tests and Cypress tests, but these tests import only a handful of records. Large-scale imports (even those of just over 1K records) have therefore not been verified, even functionally. Moreover, the Data Import profiles used in these tests are made up and may not reflect real-life scenarios. Thus there is a need for an automated test system that runs continuously to verify large imports using real-life profiles. This document proposes the requirements for such a system so that it can be built.

Update 3/27 - a Jira ticket has been created: RANCHER-1284: Create Automated Large Scale Data Import Test System (for Folijet) (status: Blocked)

Automated Large-Scale Data Import Testing System Features

It is best for Folijet to own this system and not share it with other teams. It may be hosted on Rancher or on the community’s AWS account. This gives Folijet full control of the system and the ability to troubleshoot issues at any time. It also ensures that the system is dedicated to a single purpose and that any traffic to it is easily accounted for.

FOLIO Instance

The FOLIO instance does not need to be hosted on powerful machines or a powerful database. It needs to store the thousands of SRS and inventory records that import jobs create, and to hold enough existing records for update imports to perform matching (finding records in the database that match particular criteria). The database may therefore store up to 500K or 1M records, but no more than this. Imported data could be truncated weekly.
This FOLIO instance should have 3 tenants so that multi-tenant testing can be done on it.
Datasets can start out with sample/reference data, but when metadata such as profiles are added to it they should be persisted.
This FOLIO instance should run code from the latest commits of the Data Import modules, which are pulled nightly, built, and deployed to the instance. (Ideally each commit to any DI module would trigger a rebuild and deployment of that module to the instance, but this may have unforeseen consequences when a build or test fails and may require unnecessary attention throughout the day.)
Each FOLIO nightly build should be tagged with the date, and test failure reports should also reference this tag so that test failures are easily traced to the commits that were made on any given date.
The FOLIO instance needs to maintain Data Import profiles found in production or contributed by the community. These profiles may be created/copied manually from production, or auto-generated via the “jpwrangler” tool.
The system should stay up at least for the test runs or during the Folijet team's working hours, if not 24/7. Staying up 24/7 would allow ad hoc testing, whether run manually or triggered by a team member to try something out.
Folijet team members should be able to review basic metrics of the system, such as the CPU and memory utilization of the containers and of the database.
Folijet team members should be able to access the modules' logs in a convenient way.
Folijet team members should be able to configure a module’s environment variables easily.
The system should allow a debugger to be attached to enable the team members to troubleshoot any issues conveniently.
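One way to picture the date-tagging requirement above is the minimal Python sketch below. The tag prefix `di-nightly`, the `FailureRecord` type, and the function names are all hypothetical and chosen for illustration only; the actual tagging scheme would be decided when the pipeline is built.

```python
from dataclasses import dataclass
from datetime import date


def nightly_build_tag(build_date: date, prefix: str = "di-nightly") -> str:
    """Return a date-based build tag, e.g. 'di-nightly-2023-03-27'.

    The prefix is an assumption; only the date component is required
    by the proposal.
    """
    return f"{prefix}-{build_date.isoformat()}"


@dataclass(frozen=True)
class FailureRecord:
    """One failed test, carrying the tag of the build it ran against."""
    test_name: str
    message: str
    build_tag: str


def format_failure(failure: FailureRecord) -> str:
    """Render a report line that leads with the build tag for easy tracing."""
    return f"[{failure.build_tag}] {failure.test_name}: {failure.message}"


if __name__ == "__main__":
    tag = nightly_build_tag(date.today())
    print(format_failure(FailureRecord("create-import-10k", "job timed out", tag)))
```

Because every report line carries the tag, a failure can be matched to the commits made on that date without consulting the build system.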

Test Infrastructure

May leverage the existing automated Karate test infrastructure/framework to execute the tests.
May leverage the existing Karate integration test suite for Data Import.
May leverage the existing JMeter script that the PTF created.
The test system should run the tests at least once a day (nightly) and produce a report after each test run.
Tests should be designed to run with different profiles and input MARC files.
Expose test result logs for troubleshooting.
New tests should be easy to add to the system, and existing tests should be easy to modify.
The team should be able to access the database, e.g., by pointing a pgAdmin instance at it.
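The requirement that tests run with different profiles and input MARC files could be met by generating one test case per profile/file combination. The Python sketch below illustrates the idea under that assumption; `ImportCase`, `build_cases`, and the example names are hypothetical and are not part of the existing Karate or JMeter assets.

```python
from itertools import product
from typing import Iterable, List, NamedTuple


class ImportCase(NamedTuple):
    """A single import run: one job profile applied to one MARC file."""
    profile: str
    marc_file: str

    @property
    def case_id(self) -> str:
        """Stable identifier for use in reports and logs."""
        return f"{self.profile}::{self.marc_file}"


def build_cases(profiles: Iterable[str], marc_files: Iterable[str]) -> List[ImportCase]:
    """Pair every profile with every MARC file (full cross product)."""
    return [ImportCase(p, m) for p, m in product(profiles, marc_files)]


if __name__ == "__main__":
    # Placeholder names; real profiles and files would come from the team.
    for case in build_cases(["create-profile", "update-profile"], ["a.mrc", "b.mrc"]):
        print(case.case_id)
```

With this shape, adding a new test amounts to adding a profile name or a MARC file to the input lists. A full cross product may be more than is needed; an explicit list of pairs would work equally well.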

Responsibilities

Kitfox

Creates the FOLIO instance and prepares the pipeline to run automated tests.
Periodically maintains the system, such as truncating old records.
Troubleshoots any build or deployment issues.
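The truncation duty can be expressed as a simple rule that combines the weekly schedule with the 1M-record ceiling mentioned under FOLIO Instance. The sketch below is only an illustration; how the record count and the last truncation date are obtained is left out, and all names are hypothetical.

```python
from datetime import date, timedelta

# Upper bound on stored records, taken from the "up to 500K or 1M" guideline.
MAX_STORED_RECORDS = 1_000_000
# Imported data could be truncated weekly.
TRUNCATE_INTERVAL = timedelta(days=7)


def truncation_due(record_count: int, last_truncated: date, today: date) -> bool:
    """Return True if imported data should be truncated now.

    Truncation is due when the record ceiling is exceeded or when at
    least a week has passed since the last truncation.
    """
    over_cap = record_count > MAX_STORED_RECORDS
    interval_elapsed = (today - last_truncated) >= TRUNCATE_INTERVAL
    return over_cap or interval_elapsed
```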

Folijet

Team members and/or the QA team contribute the tests and the corresponding MARC files and profiles. Each MARC file should contain at least 10K and no more than 25K records.
Team members troubleshoot test failures on demand, on the day the tests fail.
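The 10K to 25K size guideline for contributed MARC files could be checked automatically before a file is accepted. The sketch below assumes binary MARC (ISO 2709) input, in which each record ends with the record terminator byte 0x1D; it would not work for MARCXML or MARC JSON. The function names are hypothetical.

```python
import io
from typing import BinaryIO

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 record terminator
MIN_RECORDS_PER_FILE = 10_000
MAX_RECORDS_PER_FILE = 25_000


def count_marc_records(stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count records in a binary MARC stream by counting record terminators.

    Reads in chunks so that large files are not loaded into memory at once.
    """
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += chunk.count(RECORD_TERMINATOR)
    return count


def size_is_acceptable(record_count: int) -> bool:
    """Check the record count against the 10K to 25K guideline."""
    return MIN_RECORDS_PER_FILE <= record_count <= MAX_RECORDS_PER_FILE


if __name__ == "__main__":
    # Tiny fake payload for demonstration; real input would be open(path, "rb").
    fake = io.BytesIO(b"rec1\x1drec2\x1drec3\x1d")
    n = count_marc_records(fake)
    print(n, size_is_acceptable(n))
```

Counting terminators does not validate the records themselves; it is only a cheap size check.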
