This is a summary of experiences using FOLIO Data Import to do MARC record test loads through the data import UI and CLI as of early June 2019. At this point, the CLI is available in release 2.1; the ability to use the data import UI to do test loads is available via https://folio-testingsnapshot-load.aws.indexdata.com/. This environment should be used instead of folio-testing, folio-snapshot, or folio-snapshot-stable, especially when loading files with more than 10 or 20 records, so as not to compromise the performance of the other hosted reference environments.
Data Import UI
Instructions and explanations for using the data import UI to test loading of MARC records can be found in the Data Import Temporary MARC Bib Load Button PowerPoint presentation created by Ann-Marie Breaux, the data import product owner. If you wish to test loading MARC bib records through the UI, follow these steps:
- Connect to https://folio-snapshot-testingload.aws.indexdata.com/ (userid/password: diku_admin/admin)
...
Here are some sample files to show what the formatted records should look like:
Performance considerations if attempting a large file load via the CLI:
The modules that ingest and process records need more Java heap memory than the Index Data default of “-Xmx256m”, which is what is generally set in the hosted "testing" and "snapshot" environments. To avoid crashing the modules in a production-ready FOLIO system during a load of 50k+ records, it was necessary to set the Java heap to “-Xmx4096m” for both mod-source-record-manager and mod-source-record-storage. It was also useful to set container limits, so a load does not run away on the system, fail a host/node, and make the entire FOLIO deployment unresponsive.

Texas A&M's FOLIO Q2.1 2019 instance runs on a K8s/Rancher cluster hosting three FOLIO deployments in total, as well as a module descriptor registry deployment. Each of Texas A&M's 8 nodes has a 4-core CPU and 16GB of memory. The Okapi and FOLIO module Postgres databases are separated, to avoid UI request failures during heavy data loading.
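The heap and container-limit settings described above might be applied, for a plain Docker deployment, along these lines. This is only a sketch: the image names/tags, the 6g memory cap, and the use of the JAVA_OPTIONS environment variable are assumptions for illustration, not a record of the exact Texas A&M (Kubernetes-based) configuration.

```shell
# Sketch: raise the Java heap for the two record-processing modules and cap
# container memory so a runaway load cannot exhaust a host/node.
# Image names/tags and the 6g limit are illustrative assumptions.

docker run -d --name mod-source-record-manager \
  --memory=6g \
  -e JAVA_OPTIONS="-Xmx4096m" \
  folioorg/mod-source-record-manager:latest

docker run -d --name mod-source-record-storage \
  --memory=6g \
  -e JAVA_OPTIONS="-Xmx4096m" \
  folioorg/mod-source-record-storage:latest
```

The container limit (6g here) should sit comfortably above the Java heap (4096m) to leave room for off-heap memory; in a Kubernetes/Rancher setup the equivalent would be set as resource limits in the deployment spec.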
Experiences
Log of various test loads: record_update_testing_log.xlsx