Implement Title Ingest process

Description

Currently we do not support ingesting title information separately from package information. This story adds a separate title ingest so that we can use alternative (potentially better) sources of title information, even if they do not describe packages.

Implement the ability for a remote KB to be a source of title information (currently the rectype is always Package).

A first PoC could be done with the GOKb Title OAI feed (https://gokbt.gbv.de/gokb/oai/titles?verb=ListRecords&metadataPrefix=gokb); however, the zdb:online set from the DNB (https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/OAI/oai_node.html) may ultimately provide a better title list.
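As a rough sketch of how a PoC harvester might page through the GOKb title feed: an OAI-PMH ListRecords response can carry a resumptionToken, and the follow-up request sends only the verb and that token. The helper name build_list_records_url and the token value are hypothetical; the sketch only builds request URLs and does not fetch or parse anything.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the ticket description.
GOKB_TITLES_OAI = "https://gokbt.gbv.de/gokb/oai/titles"


def build_list_records_url(base_url: str,
                           metadata_prefix: str = "gokb",
                           resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix is not sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# First request, then a follow-up using a made-up placeholder token.
first_page = build_list_records_url(GOKB_TITLES_OAI)
next_page = build_list_records_url(GOKB_TITLES_OAI, resumption_token="abc|123")
```

A real harvester would read the token from each response and stop when the response carries no token or an empty one.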

Title instance matching should use the existing TitleInstanceResolverService (as used by package ingest).

Title ingest should run once a day, and title ingest jobs should run before package ingest jobs.
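The ordering requirement could be sketched as a stable sort over queued jobs. This is only an illustration: IngestJob, JOB_PRIORITY and the job type names are hypothetical, and the daily trigger itself is left out.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical job types; a lower number runs first.
JOB_PRIORITY = {"TITLE_INGEST": 0, "PACKAGE_INGEST": 1}


@dataclass
class IngestJob:
    name: str
    job_type: str


def order_jobs(jobs: List[IngestJob]) -> List[IngestJob]:
    """Stable sort so title ingest jobs precede package ingest jobs."""
    return sorted(jobs, key=lambda job: JOB_PRIORITY[job.job_type])


queued = [
    IngestJob("gokb-packages", "PACKAGE_INGEST"),
    IngestJob("gokb-titles", "TITLE_INGEST"),
]
ordered = order_jobs(queued)
```

Because the sort is stable, jobs of the same type keep the order in which they were queued.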

Activity

Ethan Freestone August 19, 2021 at 8:43 AM

Other notes:

Currently the GOKb title stream does not populate the "medium" field as I would expect: the value seems to be "Journal" or "Book" rather than electronic vs. print. For now the adapter assumes that all titles on the title stream are electronic and passes them to the resolver, which deals with multiple identifiers, sibling creation, etc.
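The assumption in this note could look roughly like the mapping below. It is a minimal sketch: to_title_schema and all dictionary keys are hypothetical and do not reflect the adapter's actual field names.

```python
from typing import Any, Dict


def to_title_schema(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a harvested title record to a hypothetical internal shape.

    The source "medium" value ("Journal" / "Book") is carried along
    unchanged; the title itself is always marked as electronic.
    """
    return {
        "name": record["name"],
        "source_medium": record.get("medium"),
        "instance_medium": "electronic",  # assumption described in the note
        "identifiers": list(record.get("identifiers", [])),
    }


example = to_title_schema({
    "name": "Example Journal",
    "medium": "Journal",
    "identifiers": [{"namespace": "eissn", "value": "1234-5678"}],
})
```

The mapped record would then be handed to the resolver, which remains responsible for identifier matching and sibling creation.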

 

 

Title ingest is very large and leaves many orphaned titles.

The test run is sitting at 43,743 titles (for reference, package ingest alone seems to pull in 11,139).

The system runs a little slower with this volume of titles.

The title ingest job started at 3:21 PM and ended at 4:43 PM, so it took 1 hr 22 min.

The package ingest job then started at 4:43 PM and was still running the next morning at 9:00 AM. It does not appear to be actively bringing in any data, so it may simply have jammed, potentially from a lack of system memory.

Ethan Freestone August 16, 2021 at 2:38 PM

Questions and uncertainties:

  • Do we need to implement a frontend way to do this, like a Package Import job, or is that out of scope? If not now, will it eventually be needed?

  • A: Treated as out of scope, but we should be able to create a TitleImportJob fairly easily by building on this work.

  • What does the title ingest service itself need to accomplish?

    • Upsert title - DONE

    • Secondary enrichment (POSSIBLY NOT: if GOKb is the title ingest source, the secondary enrichment call would be redundant. It seems unavoidable that, if GOKb is used for both title AND package ingest, the secondary call runs twice.)

    • A: Decided to do this as part of title ingest, as "enrichment" uses information about our internal representation to decide which fields to fill. To avoid duplicated code (DRY), this happens as part of TitleIngest, which the package ingest simply calls.

    • Do we need Platform information attached to Titles?

  • Do we want package ingest to use the title ingest to avoid duplication (DRY), or do they need to be separate?

  • A: As things stand, the package ingest calls the title ingest service per title, which handles the resolver and the secondary enrichment if necessary.

  • Adapter needs: do we need another shape for a "title adapter"?

  • A: No. This is another use for a single adapter, as one source could be capable of both Title and Package streams (see: GOKb). Added methods to the adapter interface to support title ingest alongside package ingest.

  • New KB creations for title ingests

    • LOCAL_TITLE for imports? - Decided this is out of scope

    • GoKB_TITLE? - This is necessary, but may not need to be bootstrapped
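The answers above describe one adapter exposing both streams, with package ingest delegating each title to the title ingest service. A minimal sketch of that shape follows. Every class and method name here is hypothetical (the real service is Grails-based and its interface is not shown in this ticket), and the resolver and enrichment are stubbed with in-memory functions.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class KBAdapter(Protocol):
    """One adapter shape that can expose both streams (hypothetical names)."""

    def fetch_titles(self) -> Iterable[Dict]: ...
    def fetch_packages(self) -> Iterable[Dict]: ...


class TitleIngestService:
    def __init__(self, resolve: Callable[[Dict], str],
                 enrich: Callable[[str, Dict], None]) -> None:
        self._resolve = resolve  # stands in for TitleInstanceResolverService
        self._enrich = enrich    # stands in for secondary enrichment

    def upsert_title(self, title: Dict) -> str:
        title_id = self._resolve(title)
        # The enrichment stub decides for itself whether anything needs filling.
        self._enrich(title_id, title)
        return title_id


class PackageIngestService:
    def __init__(self, title_ingest: TitleIngestService) -> None:
        self._title_ingest = title_ingest

    def ingest_package(self, package: Dict) -> List[str]:
        # Delegate per title instead of duplicating resolver/enrichment logic.
        return [self._title_ingest.upsert_title(t) for t in package["titles"]]


class InMemoryAdapter:
    """Fake source that offers both a title stream and a package stream."""

    def fetch_titles(self) -> Iterable[Dict]:
        return [{"name": "Journal A"}]

    def fetch_packages(self) -> Iterable[Dict]:
        return [{"name": "Pkg 1",
                 "titles": [{"name": "Journal A"}, {"name": "Journal B"}]}]


known: Dict[str, str] = {}
enriched: List[str] = []


def resolve(title: Dict) -> str:
    # Reuse the id of a known title, otherwise mint the next one.
    return known.setdefault(title["name"], f"title-{len(known) + 1}")


def enrich(title_id: str, title: Dict) -> None:
    enriched.append(title_id)


adapter: KBAdapter = InMemoryAdapter()
title_ingest = TitleIngestService(resolve, enrich)
package_ingest = PackageIngestService(title_ingest)

# Title ingest runs first, then package ingest reuses the same service.
title_ids = [title_ingest.upsert_title(t) for t in adapter.fetch_titles()]
package_title_ids = [package_ingest.ingest_package(p)
                     for p in adapter.fetch_packages()]
```

In this toy run "Journal A" passes through enrichment twice, once from each stream, which mirrors the concern raised above about the secondary call running twice when one source feeds both title and package ingest.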

Details

Status: Done
Development Team: Bienenvolk
Release: R3 2021
Created: July 30, 2021 at 11:09 PM
Updated: October 27, 2021 at 1:57 PM
Resolved: September 24, 2021 at 4:21 PM