Troubleshooting external data sources and working with the job info and error logs

The Local KB admin app provides the ability to configure external data sources that are then periodically queried for changes to metadata. The external source data is thereby kept in sync with the local KB and can act as the basis for agreement lines in the Agreements app. Further details are given in the official FOLIO documentation: https://docs.folio.org/docs/erm/local-kb-admin/#connecting-an-external-kb

This document is intended to help with:

  • Troubleshooting when something goes wrong with the data sync process for an external data source
  • Accessing and interpreting the Info and Error logs 
  • Understanding what actions might be taken in relation to the external source data harvesting and what issues different actions can help fix

Structure of external data sources, jobs and processes

External data sources (aka Remote KBs)

The external data source, or "Remote KB", record stores the fundamental information about the external data source being used to populate the Agreements local knowledgebase. The key fields are listed below, split up by their purpose (fields that are unused or not relevant to this document are omitted):

Properties defining the external data source

name (database field: rkb_name; UI label: "Name")

A name for the external data source, allocated by the tenant.

type (database field: rkb_type; UI label: "Type")

Defines which "adapter" (piece of code) is used to get data from this data source.

For GOKb this is always: org.olf.kb.adapters.GOKbOAIAdapter

rectype (database field: rkb_rectype; UI label: "Record type")

Defines whether the external data source is configured to harvest Package information (including titles) or just Title information.

In the UI this appears as a choice between Package and Title, but in the database it is stored as a number:

1 = Package

2 = Title

The usual setup for GOKb is an external data source with record type set to "Package". However, it is possible to have a GOKb external data source with record type set to "Title" if you wish to harvest just the title data and then rely on other sources for packages.

uri (database field: rkb_uri; UI label: "URI")

The URI where the data can be obtained from.

For GOKb this should be of the form https://<base URL of the GOKb instance>/gokb/oai/index

trustedSourceTI (database field: rkb_trusted_source_ti; UI label: "Trusted for title instance metadata")

Whether title data from this source should be treated as trusted or authoritative when processed.

This impacts how title instances already in the Agreements local KB are updated by data from an external data source (or KBART/JSON file upload). If the source is not a 'trusted' source for title data, title level metadata (such as title, year of publication etc.) will not be updated based on the information from the source. This setting only affects title level metadata, and only when updating existing titles. It does not affect how other data such as package title information is processed, or the initial creation of title instances. Up to mod-agreements 5.5.x (Orchid), title instance IDs (e.g. ISSN or ISBN) are never updated after the title instance has been created.
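The effect of this flag can be sketched as a simple decision rule. This is a minimal illustration over a dict-based title representation with hypothetical field names; the real update logic in mod-agreements is more involved:

```python
def apply_title_update(existing_title, incoming, source_trusted):
    """Sketch of the trusted-source rule: title-level metadata on an
    existing title instance is only overwritten when the source is
    trusted; identifiers are never updated after creation."""
    if existing_title is None:
        # Initial creation of a title instance is unaffected by the flag
        return dict(incoming)
    updated = dict(existing_title)
    if source_trusted:
        # Hypothetical title-level metadata fields (title, year of
        # publication etc.)
        for field in ("name", "yearPublished"):
            if field in incoming:
                updated[field] = incoming[field]
    # Identifiers (e.g. ISSN/ISBN) are deliberately left untouched
    return updated

result = apply_title_update(
    {"name": "Old title", "issn": "1234-5678"},
    {"name": "New title", "issn": "9999-9999"},
    source_trusted=False,
)
# With source_trusted=False the existing metadata is kept unchanged
```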

Properties used for OAI-PMH sources

cursor (database field: rkb_cursor; UI label: "Cursor")

Used as the "from" parameter in OAI-PMH requests to the external data source.
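To illustrate how the cursor is used, the following sketches how an OAI-PMH ListRecords request might be built from the data source URI and cursor. The verb, metadataPrefix and from parameters are standard OAI-PMH; the metadataPrefix value "gokb" is an assumption, not taken from this document:

```python
from urllib.parse import urlencode

def build_oai_request(uri, cursor=None, metadata_prefix="gokb"):
    """Sketch: build an OAI-PMH ListRecords URL from an external
    data source's uri and cursor properties."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if cursor is not None:
        # The stored cursor becomes the "from" parameter, so only
        # changes since the last harvest are requested
        params["from"] = cursor
    return uri + "?" + urlencode(params)

url = build_oai_request("https://gokb.example.org/gokb/oai/index",
                        cursor="2023-01-01T00:00:00Z")
```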

Properties related to Job management

lastCheck (database field: rkb_last_check; UI label: "Last checked")

Time stamp (Unix Epoch milliseconds) the external data source was last checked.
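Because lastCheck (like the job timestamps later in this document) is stored as Unix epoch milliseconds, it is often useful to convert it to a readable form when troubleshooting, for example:

```python
from datetime import datetime, timezone

def epoch_ms_to_iso(epoch_ms):
    """Convert a Unix-epoch-milliseconds timestamp (as stored in
    rkb_last_check, job_started etc.) to a readable UTC string."""
    # Divide by 1000 to get standard epoch seconds
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()

last_checked = epoch_ms_to_iso(1672531200000)  # → '2023-01-01T00:00:00+00:00'
```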

syncStatus (database field: rkb_sync_status; UI label: "Synchronization status")

The last recorded synchronization status for the external data source. One of:

  • null
  • idle
  • in-process

active (database field: rkb_active; UI label: "Is active")

Whether the external data source should currently be treated as an active data source (true/false).

Jobs

There are several different types of job that can be run in Agreements, and these are recorded in a job table in the database. The Local KB Admin UI displays all jobs except those with class "org.olf.general.jobs.ComparisonJob", which are viewable from the ERM Comparisons UI instead.

The "job" records are used by a job runner service, which ensures that each tenant runs only one job at a time.

Properties defining a job

class (no database field; UI label: "Job type")

For the purposes of this document the 'class' is treated as a property on the domain model, even though this is not strictly true. For display in the Local KB Admin UI the classes are given a human-readable label which can be translated. The English language translations are given here:

  • org.olf.general.jobs.PackageIngestJob → Harvester
  • org.olf.general.jobs.TitleIngestJob → Title harvester
  • org.olf.general.jobs.PackageImportJob → File import
  • org.olf.general.jobs.KbartIngestJob → KBART Harvester
  • org.olf.general.jobs.KbartImportJob → KBART File import
  • org.olf.general.jobs.IdentifierReassignmentJob → Identifier reassignment
  • org.olf.general.jobs.ResourceRematchJob → Resource rematch
  • org.olf.general.jobs.NaiveMatchKeyAssignmentJob → Naive match key assignment
  • org.olf.general.jobs.ComparisonJob → not displayed in Local KB Admin, no translation

name (database field: job_name; UI label: "Job name")

A name created automatically by the software at job creation, usually of the form "<Description of the job> <timestamp the job was created>"
status (database field: job_status; UI label: "Running status")

One of:

  • queued / Queued
  • in_progress / In progress
  • ended / Ended

dateCreated (database field: job_date_created; no UI label)

Time stamp (Unix Epoch milliseconds) the job was created.

started (database field: job_started; UI label: "Started")

Time stamp (Unix Epoch milliseconds) work started in relation to the job.

ended (database field: job_ended; UI label: "Ended")

Time stamp (Unix Epoch milliseconds) work finished in relation to the job.

result (database field: job_result; UI label: "Result" / "Import outcome")

One of:

  • failure / Failure
  • interrupted / Interrupted
  • partial_success / Partial success
  • success / Success

runnerId (database field: job_runner_id; no UI label)

Used to track jobs across module instances.

Properties relating to logging for the job

A number of properties available from the job when requested via the API are "transient" properties that don't have equivalent properties in the underlying database.

work (no database field; no UI label)

errorLog (no database field; UI label: "Error log")

A log of error-level messages recorded while the job ran.

infoLog (no database field; UI label: "Info log")

A log of informational messages recorded while the job ran.

errorLogCount (no database field; shown as "Errors" / the badge on the Error log accordion)

infoLogCount (no database field; shown as the badge on the Info log accordion)

fullLog (no database field; no UI label)

fullLogCount (no database field; no UI label)
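As an illustration, these transient log properties can be retrieved per job over the FOLIO API. The /erm/jobs/<id>/errorLog and /erm/jobs/<id>/infoLog paths below are an assumption based on the property names above; verify them against the mod-agreements version in use before relying on this:

```python
import urllib.request

def build_job_log_request(okapi_url, tenant, token, job_id, log="errorLog"):
    """Sketch: build a request for a job's transient log via the ERM
    jobs API. The path is assumed from the property names, not
    confirmed by this document."""
    return urllib.request.Request(
        f"{okapi_url}/erm/jobs/{job_id}/{log}",
        headers={
            # Standard FOLIO/Okapi headers
            "X-Okapi-Tenant": tenant,
            "X-Okapi-Token": token,
            "Accept": "application/json",
        },
    )

req = build_job_log_request("https://okapi.example.org", "diku",
                            "<token>", "1234-abcd")
# Send with urllib.request.urlopen(req) and parse the JSON response
```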

How an external data source harvest is run:

Every hour a scheduled check runs for each tenant (in case the module is being run in a multi-tenant environment). This check, carried out on a tenant-by-tenant basis, has several parts. The first part is to check the current job queue for a tenant to see if there are existing Title Ingest jobs, as follows:

  1. Check to see if there is at least one Title Ingest job (i.e. a job with class org.olf.general.jobs.TitleIngestJob) that meets any of the following criteria:
    1. status = queued
    2. status = in_progress
    3. With a date created within the last day
  2. If there is at least one Title Ingest job in any of these categories then no new title ingest job will be created for that tenant and a message will be logged (Log level: INFO) "Title harvester already running or scheduled. Ignore.".
  3. If there are no title ingest jobs meeting at least one of the criteria, a new Title Ingest job will be created with the name "Scheduled Title Ingest Job <current time timestamp>" - note at this stage it is not known if any work will be done as part of this job, the job is created whether it results in any actual work or not

Secondly the job queue for the tenant is checked to see if there are existing Package ingest jobs as follows:

  1. Check to see if there is at least one Package Ingest job that meets any of the following criteria:
    1. status = queued
    2. status = in_progress
  2. If there is at least one Package Ingest job in either of these categories then no new package ingest job will be created for that tenant and a message will be logged (Log level: INFO) "Package harvester already running or scheduled. Ignore.".
  3. If there are no package ingest jobs meeting at least one of the criteria, a new Package Ingest job will be created with the name "Scheduled Package Ingest Job <current time timestamp>" - note at this stage it is not known if any work will be done as part of this job, the job is created whether it results in any actual work or not

Because both these steps include checking the existing job queue, the outcome will change as the content of the job queue changes. This can happen automatically as time passes and jobs run to completion, or manually, by a user deleting jobs from the queue via the UI, or by the jobs being manipulated directly via the API or in the database.
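Both queue checks follow the same pattern, which can be sketched as follows. This is a minimal illustration over a hypothetical in-memory job representation; in mod-agreements the check runs as a database query:

```python
from time import time

def needs_new_ingest_job(jobs, job_type, include_recent_day=False):
    """Return True if a new scheduled ingest job should be created.
    'jobs' is the tenant's job queue: dicts with 'type', 'status'
    and 'dateCreated' (epoch ms) keys (hypothetical shape)."""
    one_day_ago_ms = (time() - 24 * 60 * 60) * 1000
    for job in jobs:
        if job["type"] != job_type:
            continue
        if job["status"] in ("queued", "in_progress"):
            return False  # "already running or scheduled. Ignore."
        # The Title Ingest check additionally skips if any such job
        # was created within the last day
        if include_recent_day and job["dateCreated"] > one_day_ago_ms:
            return False
    return True

queue = [{"type": "TitleIngestJob", "status": "ended",
          "dateCreated": time() * 1000}]
title_needed = needs_new_ingest_job(queue, "TitleIngestJob",
                                    include_recent_day=True)
package_needed = needs_new_ingest_job(queue, "PackageIngestJob")
```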


The external data sources that are due for harvesting are identified by looking for external data source records with:

    • active == true
    • type != null
    • rectype == Package
    • syncStatus != 'in-process' (can be any other value or null)
    • lastCheck is null OR more than 1 hour (specifically 3,600,000 milliseconds) before the current time
    • name != 'LOCAL' ("LOCAL" is the name of a default dummy external data source used for packages that are loaded from KBART or JSON files directly into the system)

For each data source identified:

  • A job is queued and the sync status is set to 'in-process' (it is not confirmed whether this happens at queueing time or only when the job starts to be processed)
  • When the job is completed the sync status is set back to 'idle'
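Putting these criteria together, the selection can be sketched as a filter. This is a minimal illustration over a hypothetical dict representation of an external data source record; in mod-agreements the selection is a database query:

```python
from time import time

ONE_HOUR_MS = 3_600_000

def due_for_harvest(source, now_ms=None):
    """Sketch of the criteria used to pick external data sources that
    are due to be harvested."""
    if now_ms is None:
        now_ms = time() * 1000
    return (
        source["active"]
        and source["type"] is not None
        and source["rectype"] == 1          # 1 = Package
        and source["syncStatus"] != "in-process"
        and (source["lastCheck"] is None
             or now_ms - source["lastCheck"] > ONE_HOUR_MS)
        and source["name"] != "LOCAL"       # skip the dummy local source
    )

src = {"active": True, "type": "org.olf.kb.adapters.GOKbOAIAdapter",
       "rectype": 1, "syncStatus": "idle", "lastCheck": None,
       "name": "GOKb"}
```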

Types of problem:

  • Harvest job is stuck in an "In progress" state
  • Harvest job is stuck in a "Queued" state
  • Data missing from harvest

Job statuses and their meaning

  • Running status
    • Queued
    • In progress
    • Ended
  • Import outcome
    • Success
    • Partial success
    • Failure

Interpreting the logs

  • Info log
    • GOKb
      • What messages to expect
      • What these messages mean
    • JSON import
    • KBART import
  • Error log
    • GOKb
      • What messages to expect
      • What these messages mean
    • JSON import
    • KBART import