Since the R1 2021 Iris release, the Data Import application uses Kafka as a transport.

...

Kafka disk space was set to 500 GB (disk space was not highly utilized; actual usage depends on log retention and load).

Module memory configs in MB:

...

KAFKA_HOST and KAFKA_PORT values should also be specified for mod-inventory-storage. mod-inventory-storage also requires the REPLICATION_FACTOR value to be set: https://github.com/folio-org/mod-inventory-storage/blob/master/README.MD#kafka.
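A minimal sketch of these settings as environment variables (the broker address kafka:9092 and replication factor 1 are placeholder values for a single-broker setup; use your own broker address and cluster size):

Code Block
languagebash
# Placeholder values - point these at your own Kafka broker(s)
export KAFKA_HOST=kafka
export KAFKA_PORT=9092
export REPLICATION_FACTOR=1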

After setup, it is good to check the logs of all related modules for errors. Data import consumers and producers work in separate verticles that are set up in RMB's InitAPI for each module, so the deploy/install logs are the first place to check.

DB_MAXPOOLSIZE should be set to no less than 15 (we recommend 15) for the modules mod-source-record-manager and mod-source-record-storage.
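For example (a sketch; the variable is typically passed to the module as an environment variable, in the same way as the Kafka settings above):

Code Block
languagebash
# Recommended minimum connection pool size for mod-source-record-manager and mod-source-record-storage
export DB_MAXPOOLSIZE=15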

There are other properties that should be set for data import modules:

mod-data-import

Properties related to file upload that should be set in mod-configuration are described in the doc: https://github.com/folio-org/mod-data-import#module-properties-to-set-up-at-mod-configuration

System property that can be adjusted | Default value
file.processing.marc.raw.buffer.chunk.size | 50
file.processing.marc.json.buffer.chunk.size | 50
file.processing.marc.xml.buffer.chunk.size | 10
file.processing.edifact.buffer.chunk.size | 10
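These are JVM system properties, so (as a sketch, assuming mod-data-import picks them up via -D options in the same way as the deletion properties shown later in this document) they can be adjusted through JAVA_OPTS:

Code Block
languagebash
# Example overrides for the chunk-size properties above; the values shown are the documented defaults
JAVA_OPTS="$JAVA_OPTS \
  -Dfile.processing.marc.raw.buffer.chunk.size=50 \
  -Dfile.processing.marc.json.buffer.chunk.size=50 \
  -Dfile.processing.marc.xml.buffer.chunk.size=10 \
  -Dfile.processing.edifact.buffer.chunk.size=10"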

...

If topics are created manually, make sure topics exist for all data import event types. See the list of event types. Topic names in Kafka are built from several pieces: ENV, nameSpace, tenant, and eventType. Data import related event types always have the DI prefix. Currently the "Default" nameSpace is hardcoded for all topics.

If the auto.create.topics.enable=true setting is set for MSK, topics will be created automatically. Please note that in this case the first data import job run after setup will take longer to complete.

We strongly recommend keeping the partition count the same for all topics, especially DI_RAW_RECORDS_CHUNK_READ, DI_COMPLETED, and DI_ERROR, because the Flow Control feature relies on this for proper load orchestration.
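As an illustration of the naming pattern and of keeping partition counts consistent, here is a sketch of creating one data import topic manually with the standard Kafka CLI (the ENV value "folio", tenant "diku", broker address, partition count, and replication factor are placeholder values, not recommendations):

Code Block
languagebash
# Hypothetical example: the topic name follows ENV.nameSpace.tenant.eventType,
# with the hardcoded "Default" nameSpace; use the same --partitions value for all DI topics
kafka-topics.sh --bootstrap-server kafka:9092 --create \
  --topic folio.Default.diku.DI_RAW_RECORDS_CHUNK_READ \
  --partitions 10 \
  --replication-factor 3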

Delete job executions with all related data

...

  • periodic.job.execution.permanent.delete.interval.ms - interval in milliseconds at which the hard-deletion job is triggered.
    (By default it equals 86400000, which is 24 hours.)
    Example of applying this property in JAVA_OPTS: -Dperiodic.job.execution.permanent.delete.interval.ms=86400000
  • job.execution.difference.number.of.days - number of days after a job execution's completed date for it to be considered eligible for deletion.
    (By default it equals 2 days.)
    Example of applying this property in JAVA_OPTS: -Djob.execution.difference.number.of.days=2

...

Troubleshooting for System Administrators

How to restart DI application

  • Kill the job that appears to be stuck (click the trash can in the right corner and wait for 10 sec)
  • Stop modules involved in Data import process (mod-data-import, mod-source-record-manager, mod-source-record-storage, mod-inventory, mod-invoice)
  • Delete the Kafka topics related to data import (such topics follow the pattern "ENV.nameSpace.tenantId.DI_eventType"). Note that all topics related to data import have the DI prefix in the event type name. This deletes all records that were sent to Kafka but not yet delivered to the consumers.
  • Applicable only if auto.create.topics.enable=true is not set - recreate the topics that were deleted (OR skip the previous step and instead clear the records from the topics: set retention to 1 ms, wait for a couple of minutes, then set the retention back to its normal value; see the sketch after this list)
  • Restart the modules involved in the data import process (mod-data-import, mod-source-record-manager, mod-source-record-storage, mod-inventory, mod-invoice). If auto.create.topics.enable=true is set, all the necessary topics will be created automatically.
  • Run data import job to make sure it is working
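For the topic cleanup in the steps above, here is a sketch using the standard Kafka CLI tools (the topic name, broker address, and retention values are placeholders; 604800000 ms is just an example of a "normal" one-week retention):

Code Block
languagebash
# Delete a data import topic (recreate it afterwards unless auto.create.topics.enable=true)
kafka-topics.sh --bootstrap-server kafka:9092 --delete \
  --topic folio.Default.diku.DI_RAW_RECORDS_CHUNK_READ

# Alternative: purge records by temporarily lowering retention, then restoring it
kafka-configs.sh --bootstrap-server kafka:9092 --entity-type topics \
  --entity-name folio.Default.diku.DI_RAW_RECORDS_CHUNK_READ \
  --alter --add-config retention.ms=1
# ...wait a couple of minutes, then restore the normal retention time
kafka-configs.sh --bootstrap-server kafka:9092 --entity-type topics \
  --entity-name folio.Default.diku.DI_RAW_RECORDS_CHUNK_READ \
  --alter --add-config retention.ms=604800000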

How to deal with DB schema migration issues

If system administrators manually invoke the POST Tenant API against the database and do not specify the 'module_from' version to migrate from, this can lead to DB migration issues.

Exception that was reproducible on some environments when the POST Tenant API endpoint was used incorrectly to perform the DB migration:

Info

13:38:36.596 [vert.x-eventloop-thread-1] ERROR PostgresClient [55120eqId] ERROR: column jep1.jobexecutionid does not exist (42703)
io.vertx.pgclient.PgException: ERROR: column jep1.jobexecutionid does not exist (42703) 


The following POST Tenant API request leads to this issue. The 'module_from' parameter is missing from the request, which is incorrect usage of this endpoint.

Code Block
languagejs
curl "http://localhost:8081/_/tenant" -H "X-Okapi-Tenant: diku" -H "Content-type: application/json" -XPOST -d'
{
  "module_to": "mod-source-record-manager-3.3.0",
  "parameters": [{
    "key": "loadSample",
    "value": "true"
  },{
    "key": "loadReference",
    "value": "true"
  }]
}'


Correct usage of the POST Tenant API request. In this request we specify both the 'module_from' and 'module_to' parameters, which is the same as what Okapi does when invoking this endpoint.

Code Block
languagejs
curl "http://localhost:8081/_/tenant" -H "X-Okapi-Tenant: diku" -H "Content-type: application/json" -XPOST -d'
{
  "module_from": "mod-source-record-manager-3.2.0",
  "module_to": "mod-source-record-manager-3.3.0",
  "parameters": [{
    "key": "loadSample",
    "value": "true"
  },{
    "key": "loadReference",
    "value": "true"
  }]
}'