Data import testing




PERF-163


Summary

mod-data-import has problems importing big files (files that contain a large amount of data). The FoliJet team has moved away from mod-pubsub and now uses Kafka directly. The idea of this testing is to prove that mod-data-import is able to import big files and to find possible issues and/or bottlenecks. If mod-data-import is not able to import the files, the goal is to find the root cause of the issue.

Goals

  • Final goal: Be able to load 500K records
    • Without crashing and without stopping the system from working (e.g. being able to check out to patrons)
    • With access to the raw JSON logs to see errors, at least
  • Intermediate goal (Iris): Be able to load 100K records (maybe 50K, but prefer 100K)
    • Depends on how long we would have to live with the intermediate solution?
    • Also need access to raw JSON or some basic log to see errors


Testing script

JMeter will be used as the testing tool to mimic real user behavior (note: JMeter can't simulate asynchronous calls). The testing script will simulate the appropriate API calls made during the data import workflow.

Workflow to simulate:

  • Go to the Data Import app
  • Upload the MARC file
  • Once the file is uploaded, you’ll see a screen with job profiles
  • Click the Create Instances, Holdings, and Items job profile
  • The detail view will open
  • At the top right, click Actions/Run
  • Click Run in the confirmation modal
  • The file import should start
  • Once the file is completely imported, the file name will show at the top of the list on the data import landing page
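
As a rough illustration of how the UI steps above might map onto API calls for the JMeter samplers, here is a minimal sketch in Python. It only builds an ordered request plan and sends nothing. The endpoint paths, the `Step` type, and the `plan_import` helper are assumptions made for illustration and should be checked against the API documentation of the deployed modules before they are used in a test script.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One planned HTTP call (hypothetical helper type, not part of FOLIO)."""
    method: str
    path: str
    purpose: str


def plan_import(upload_id: str, file_id: str) -> List[Step]:
    """Return the assumed sequence of calls behind the UI workflow.

    All paths are assumptions for illustration; verify them against the
    API documentation of the environment under test.
    """
    base = "/data-import/uploadDefinitions"
    return [
        Step("POST", base, "register the MARC file to be uploaded"),
        Step("POST", f"{base}/{upload_id}/files/{file_id}", "upload the file content"),
        Step("GET", "/data-import-profiles/jobProfiles", "list the available job profiles"),
        Step("POST", f"{base}/{upload_id}/processFiles", "run the import with the chosen job profile"),
        Step("GET", "/metadata-provider/jobExecutions", "poll until the job shows as finished"),
    ]


def describe(steps: List[Step]) -> str:
    """Render the plan as numbered lines, e.g. for a test plan review."""
    return "\n".join(
        f"{number}. {step.method} {step.path} ({step.purpose})"
        for number, step in enumerate(steps, start=1)
    )
```

Because the import itself runs asynchronously after the run request, the last step would likely be a polling loop in the JMeter script rather than a single call.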

Job profiles 

There are multiple ways to import data from a file, called job profiles. For testing we should use a few of them:

  • the simplest one, which is Create instance;
  • one that is more complex than Create instance only, possibly Create instance & update instance.

The purpose of using a few job profiles is to check how the system behaves with job profiles of different complexity.

Test data

As test data for data import we'll need ".mrc" files.

As discussed, we should use test data sets of different sizes:

  • 50 000 records;
  • 100 000 records (intermediate goal);
  • 500 000 records (final goal).
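
If real files of these sizes are not available, synthetic ones could be generated. The sketch below is one possible approach in Python, not the method used for the actual tests: it writes minimal MARC 21 (ISO 2709) records containing only a 001 and a 245 field. The identifier prefix, the titles, and all function names are made up for illustration, and records this small will not exercise the system the way real bibliographic data would.

```python
import io
from typing import BinaryIO

FIELD_END = b"\x1e"    # field terminator
RECORD_END = b"\x1d"   # record terminator
SUBFIELD = b"\x1f"     # subfield delimiter


def make_record(number: int) -> bytes:
    """Build one minimal MARC 21 record with a 001 and a 245 field."""
    fields = [
        ("001", f"perf{number:07d}".encode("utf-8") + FIELD_END),
        ("245", b"00" + SUBFIELD + b"a"
         + f"Synthetic title {number}".encode("utf-8") + FIELD_END),
    ]
    directory = b""
    body = b""
    for tag, data in fields:
        # Directory entry: tag (3), field length (4), start offset (5).
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_END
    base_address = 24 + len(directory)  # leader is always 24 bytes
    record_length = base_address + len(body) + len(RECORD_END)
    leader = f"{record_length:05d}nam a22{base_address:05d}   4500".encode("ascii")
    return leader + directory + body + RECORD_END


def write_records(out: BinaryIO, count: int) -> None:
    """Stream `count` records to a binary file object, one at a time."""
    for number in range(1, count + 1):
        out.write(make_record(number))


def build_bytes(count: int) -> bytes:
    """Build `count` records in memory (handy for small checks)."""
    buffer = io.BytesIO()
    write_records(buffer, count)
    return buffer.getvalue()


def count_records(data: bytes) -> int:
    """Count records by walking the length stored in each leader."""
    count, offset = 0, 0
    while offset < len(data):
        offset += int(data[offset:offset + 5].decode("ascii"))
        count += 1
    return count
```

To produce the 50 000 record file, for example, one could open a file in binary write mode and call `write_records(handle, 50_000)`; `count_records` can then confirm the size of any generated or existing ".mrc" file.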

Test Results

See test results for various profiles and scenarios here