Data Import is a feature that operates on large datasets. To date, various Data Import APIs and modules are covered by automated tests, including Karate tests and Cypress tests, but these tests import only a handful of records. Large-scale imports (even those of just over 1K records) have therefore not been verified, even functionally. Moreover, the Data Import profiles used in these tests are made up and may not reflect real-life scenarios. There is thus a need for an automated test system that runs continuously and verifies large imports using real-life profiles. This document proposes the requirements for such a system so that it can be built.

Update 3/27 - a Jira ticket has been created: RANCHER-1284

Automated Large-Scale Data Import Testing System Features

...

The FOLIO instance does not need to be hosted on powerful machines or a powerful database. It needs to be able to store the thousands of SRS and inventory records that import jobs create, as well as the existing records needed for matching on update imports (that is, finding records in the database that meet particular criteria). The database may therefore hold up to 500K or 1M records, but no more than that. Imported data could be truncated weekly.
This FOLIO instance should have 3 tenants so that multi-tenant testing can be done on it.
Datasets can start out with sample/reference data, but when metadata such as profiles is added to them, it should be persisted.
This FOLIO instance should run code from the latest commits of the Data Import modules, which are pulled nightly, built, and deployed to the instance. (Ideally, each commit to any DI module would trigger a rebuild and redeployment of that module, but this may have unforeseen consequences when a build or test fails and may demand unnecessary attention throughout the day.)
Each FOLIO nightly build should be tagged with the date, and test failure reports should reference this tag so that failures are easily traced to the commits made on a given date.
The FOLIO instance needs to maintain Data Import profiles found in production or contributed by the community. These profiles may be created/copied manually from production, or auto-generated via the “jpwrangler” tool.
The system should stay up at least for the duration of the test run, if not 24/7. Staying up 24/7 would allow ad hoc testing, whether manual or triggered by a team member to try something out.
Folijet team members should be able to review basic system metrics, such as the CPU and memory utilization of the containers and of the database.
Folijet team members should be able to access the modules' logs in a convenient way.
Folijet team members should be able to configure a module’s environment variables easily.
The system should allow a debugger to be attached so that team members can troubleshoot issues conveniently.
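
As an illustration of the date-tagging requirement above, the following is a minimal sketch of how a nightly build tag could be derived from the build date and carried into each test failure report line. It is not an existing FOLIO or Folijet tool: the function names, the `di-nightly` prefix, the `FailureRecord` fields, and the test name and date in the usage example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


def nightly_build_tag(build_date: date, prefix: str = "di-nightly") -> str:
    """Build a date-based tag; the 'di-nightly' prefix is a hypothetical convention."""
    return f"{prefix}-{build_date.isoformat()}"


@dataclass(frozen=True)
class FailureRecord:
    """One failed test, tied to the nightly build it ran against (hypothetical shape)."""
    build_tag: str
    module: str
    test_name: str
    message: str


def format_failure_report(failures: list[FailureRecord]) -> str:
    """Render one line per failure, each prefixed with its build tag."""
    ordered = sorted(failures, key=lambda f: (f.build_tag, f.module, f.test_name))
    return "\n".join(
        f"[{f.build_tag}] {f.module} :: {f.test_name}: {f.message}" for f in ordered
    )


if __name__ == "__main__":
    # Arbitrary example date and test name, for illustration only.
    tag = nightly_build_tag(date(2024, 1, 15))
    print(format_failure_report([
        FailureRecord(tag, "mod-data-import", "import_10k_marc_bibs", "timed out"),
    ]))
```

Because every report line starts with the tag, a failure can be traced back to the set of commits that went into that night's build without consulting a separate build log.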

...
