Data Import is a feature that operates on large datasets. To date, various Data Import APIs and modules are covered by automated tests, including Karate tests and Cypress tests, but these tests import only a handful of records, so large-scale imports (even those of just over 1K records) have not been verified, even functionally. Moreover, the Data Import profiles used in these tests are made up and may not reflect real-life scenarios. There is therefore a need for an automated test system that runs continuously and verifies large imports using real-life profiles. This document proposes the requirements for such a system so that it can be built.

Update 3/27: a Jira ticket has been created: RANCHER-1284

Automated Large-Scale Data Import Testing System Features

It is best for Folijet to own this system and not share it with other teams. It may be hosted on Rancher, on the community’s AWS account, or on FSE’s AWS account. Sole ownership gives Folijet full control of the system and the ability to troubleshoot issues at any time. It also ensures that the system is dedicated to a single purpose and that any traffic to it is easily accounted for.

...

The FOLIO instance does not need to be hosted on powerful machines or a powerful database. It needs to store thousands of SRS and inventory records, both those created by import jobs and those needed for matching (finding records in the database that match particular criteria) on update imports. The database may therefore hold up to 500K or 1M records, but no more. Imported data could be truncated weekly.
This FOLIO instance should have 3 tenants so that multi-tenant testing can be done on it.
Datasets can start out with sample/reference data, but metadata added to them, such as profiles, should be persisted.
This FOLIO instance should run code from the latest commits of the Data Import modules, pulled nightly, built, and deployed to the instance. (Ideally each commit to any DI module would trigger a rebuild and redeployment of that module, but build or test failures could then have unforeseen consequences and demand unnecessary attention throughout the day.)
Each FOLIO nightly build should be tagged with its date, and test failure reports should reference this tag so that failures can easily be traced to the commits made on a given date.
The FOLIO instance needs to maintain Data Import profiles found in production or contributed by the community. These profiles may be created or copied manually from production, or auto-generated via the “jpwrangler” tool.
The system should stay up at least for the duration of the test runs or during the Folijet team's work hours, if not 24/7. Staying up 24/7 would allow ad hoc testing, whether manual or triggered by a team member to try something out.
Folijet team members should be able to review the basic metrics of the system, such as the containers' CPU and memory utilization and the database’s CPU and memory utilization.
Folijet team members should be able to access the modules' logs in a convenient way.
Folijet team members should be able to configure a module’s environment variables easily.
The system should allow a debugger to be attached so that team members can troubleshoot issues conveniently.
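
The requirement that nightly builds carry a date tag and that failure reports reference it could be sketched roughly as follows. This is only an illustration: the `nightly-YYYY-MM-DD` tag format, the `FailureReport` fields, and the tenant and profile names are assumptions, not something this document or the build pipeline defines.

```python
from dataclasses import dataclass
from datetime import date


def nightly_tag(build_date: date) -> str:
    """Return a date-based tag for a nightly build.

    The "nightly-" prefix and ISO date format are assumptions for illustration.
    """
    return f"nightly-{build_date.isoformat()}"


@dataclass
class FailureReport:
    """Hypothetical shape of a test failure report entry."""

    build_tag: str    # tag of the nightly build the test ran against
    tenant: str       # tenant on which the import was run
    job_profile: str  # Data Import job profile used by the test
    message: str      # short description of the failure

    def summary(self) -> str:
        # Lead with the build tag so a failure can be traced to a date.
        return (
            f"[{self.build_tag}] tenant={self.tenant} "
            f"profile={self.job_profile}: {self.message}"
        )


if __name__ == "__main__":
    # The date, tenant, and profile below are made-up example values.
    tag = nightly_tag(date(2024, 1, 15))
    report = FailureReport(tag, "tenant_a", "example-profile", "job did not complete")
    print(report.summary())
```

Putting the tag first in every report line would make it straightforward to group failures by build date when reviewing results.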

...

Kitfox

Creates the FOLIO instance and prepares the pipeline to run automated tests.
Periodically maintains the system, for example by truncating old records.
Troubleshoots any build and deployment issues.
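
As a rough illustration of the periodic maintenance duty, the check below decides when imported records should be truncated, based on the limits stated in the features list (a cap of about 1M records and a weekly truncation). The function name, its parameters, and the rule of triggering on either condition are assumptions made for this sketch; the actual maintenance job may work differently.

```python
from datetime import datetime, timedelta

# Limits taken from the requirements; treating them as hard thresholds
# is an assumption of this sketch.
MAX_RECORDS = 1_000_000
TRUNCATE_INTERVAL = timedelta(days=7)


def truncation_due(record_count: int, last_truncated: datetime, now: datetime) -> bool:
    """Return True if imported data should be truncated.

    Truncation is considered due when the record cap has been reached
    or when a week has passed since the last truncation.
    """
    over_cap = record_count >= MAX_RECORDS
    interval_elapsed = (now - last_truncated) >= TRUNCATE_INTERVAL
    return over_cap or interval_elapsed


if __name__ == "__main__":
    # Example values only: 3 days since last truncation, well under the cap.
    last = datetime(2024, 1, 1)
    print(truncation_due(200_000, last, datetime(2024, 1, 4)))
```

A scheduled job could call such a check before each nightly run and skip the truncation step when it returns False.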

...
