Implement Title Ingest process
Activity

Ethan Freestone, August 19, 2021 at 8:43 AM (edited)
Other notes:
Currently the GOKb title stream does not populate the "medium" field as I would expect: it seems to hold "Journal" or "Book" rather than electronic vs. print. For now the adapter assumes all titles on the title stream are electronic and passes them to the resolver, which deals with multiple identifiers, sibling creation, etc.
Title ingest is very large and leaves many orphaned titles.
The test run is sitting at 43,743 titles (for reference, package ingest alone seems to pull in 11,139).
The system runs a little slower with this volume of titles.
The title ingest job started at 3:21 PM and ended at 4:43 PM, so it took 1 hr 22 min.
The package ingest job then started at 4:43 PM and was still running the next morning at 9:00 AM. It does not appear to be actively bringing in any data, so it may simply have jammed, potentially from lack of system memory.
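
The adapter behaviour described in the first note can be sketched roughly as follows. This is only an illustration, not the module's actual code: the function name `to_title_instance`, the dict keys, and the sample record are all hypothetical. It assumes, as observed above, that the stream's "medium" value carries a publication type such as "Journal" or "Book", so the sketch keeps it as a type hint and hard-codes the electronic sub-type.

```python
# Hypothetical sketch: field names and record shape are assumptions,
# not the real GOKb schema or the adapter's real API.
ASSUMED_SUBTYPE = "electronic"  # every streamed title is treated as electronic


def to_title_instance(record: dict) -> dict:
    """Map a raw title-stream record to the shape handed to the resolver."""
    return {
        "title": record["name"],
        # "medium" arrives as e.g. "Journal" or "Book", so keep it as the
        # publication type rather than reading it as print vs. electronic.
        "publication_type": record.get("medium"),
        "sub_type": ASSUMED_SUBTYPE,
        # All identifiers are passed through; the resolver decides how to
        # match on them and whether to create sibling titles.
        "identifiers": list(record.get("identifiers", [])),
    }


sample = {
    "name": "Example Journal",
    "medium": "Journal",
    "identifiers": [
        {"namespace": "issn", "value": "1234-5678"},
        {"namespace": "eissn", "value": "8765-4321"},
    ],
}
mapped = to_title_instance(sample)
```

If the stream later starts reporting a true print/electronic value, only the `sub_type` line would need to change.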

Ethan Freestone, August 16, 2021 at 2:38 PM (edited)
Questions and uncertainties:
Do we need to implement a frontend way to do this, like a Package Import job, or is that out of scope? If it is out of scope for now, will it eventually be needed?
A: Treated as out of scope, but we should be able to create a TitleImportJob fairly easily to piggyback on this work.
What does the title ingest service itself need to accomplish?
Upsert title - DONE
Secondary enrichment (POSSIBLY NOT: if GOKb is the title ingest source, the secondary enrichment call would be useless. It seems unavoidable that if you use GOKb for both titles AND packages, the secondary call runs twice.)
A: Decided to do this as part of title ingest, as "enrichment" uses information about our internal representation to decide which fields to fill. To stay DRY, this happens as part of TitleIngest, which the package ingest simply calls.
Do we need Platform information attached to Titles?
Do we want package ingest to use the title ingest to stay DRY, or do we need them to be separate?
A: As things stand, the package ingest calls the title ingest service per title, which handles the resolver and, if necessary, the secondary enrichment.
Adapter needs: do we need another shape for a "title adapter"?
A: No. This is another use for a single adapter, as one source could be capable of both Title and Package streams (see GOKb). Added methods to the adapter interface to support title ingest alongside package ingest.
New KB creations for title ingests
LOCAL_TITLE for imports? - Decided this is out of scope
GoKB_TITLE? - This is necessary, but may not need to be bootstrapped
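
The answers above (a single adapter shape serving both streams, and package ingest delegating each title to title ingest) could look something like the sketch below. All class, method, and field names here are hypothetical stand-ins; in particular, the `resolve` callable only stands in for TitleInstanceResolverService, and the matching logic is a toy.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class KBAdapter(Protocol):
    """One adapter shape; a source may support either or both streams."""

    def fetch_titles(self) -> Iterable[dict]: ...

    def fetch_packages(self) -> Iterable[dict]: ...


class TitleIngestService:
    """Upserts a title via the resolver, then enriches it if needed."""

    def __init__(self, resolve: Callable[[dict], dict],
                 enrich: Callable[[dict, dict], None]) -> None:
        self.resolve = resolve  # stands in for TitleInstanceResolverService
        self.enrich = enrich    # stands in for the secondary enrichment call

    def upsert_title(self, citation: dict) -> dict:
        title = self.resolve(citation)  # match an existing title or create one
        self.enrich(title, citation)    # fill fields that are still empty
        return title


class PackageIngestService:
    """Delegates every title in a package to the title ingest service."""

    def __init__(self, title_ingest: TitleIngestService) -> None:
        self.title_ingest = title_ingest

    def ingest(self, package: dict) -> List[dict]:
        return [self.title_ingest.upsert_title(c) for c in package["titles"]]


# Toy in-memory wiring, for illustration only.
store: Dict[str, dict] = {}


def resolve(citation: dict) -> dict:
    # Match on a single identifier; a real resolver is far richer.
    return store.setdefault(
        citation["id"], {"id": citation["id"], "name": citation["name"]}
    )


def enrich(title: dict, citation: dict) -> None:
    # Only fill the field if the stored title does not have it yet.
    if title.get("publication_type") is None:
        title["publication_type"] = citation.get("publication_type")


title_ingest = TitleIngestService(resolve, enrich)
package_ingest = PackageIngestService(title_ingest)

# Title ingest runs first, then package ingest reuses the same code path.
title_ingest.upsert_title({"id": "issn:1234-5678", "name": "Example Journal"})
ingested = package_ingest.ingest({
    "name": "Example Package",
    "titles": [
        {"id": "issn:1234-5678", "name": "Example Journal",
         "publication_type": "Journal"},
        {"id": "isbn:9780000000002", "name": "Example Book",
         "publication_type": "Book"},
    ],
})
```

Because both entry points go through `upsert_title`, the resolver and enrichment logic live in one place, which is the point of the DRY decision recorded above.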
Description
Currently we do not support ingesting title information separately from package information. This story adds separate title ingest so that we can use alternative (potentially better) sources of title information, even if they do not describe packages.
Implement the ability for a remote KB to be a source of Title information (currently there is a rectype, which is always Package).
A first PoC could be done with the GOKb Title OAI feed (https://gokbt.gbv.de/gokb/oai/titles?verb=ListRecords&metadataPrefix=gokb); however, the zdb:online set from the DNB (https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/OAI/oai_node.html) may ultimately provide a better title list.
Title Instance matching should use the existing TitleInstanceResolverService (as used by package ingest).
Title ingest should run once a day, and title ingest jobs should run before package ingest jobs.
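
For the OAI-based PoC mentioned above, paging through a `ListRecords` feed might be sketched as below. This is a rough illustration under stated assumptions: the helper names `parse_list_records` and `next_request_url` are hypothetical, the sample page is made up, and only the standard OAI-PMH envelope is parsed. The GOKb payload inside each record is left untouched because its schema is not described in this ticket.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text: str) -> Tuple[List[ET.Element], Optional[str]]:
    """Return one page's <record> elements and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = root.findall("./oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find("./oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = None
    if token_el is not None and token_el.text and token_el.text.strip():
        token = token_el.text.strip()
    return records, token


def next_request_url(base_url: str, token: Optional[str],
                     metadata_prefix: str = "gokb") -> str:
    """Build the ListRecords URL for the first page or a follow-up page."""
    if token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Made-up response page used only to exercise the parser.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>example:1</identifier></header></record>
    <record><header><identifier>example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_PAGE)
first_url = next_request_url("https://example.org/oai", None)
follow_up_url = next_request_url("https://example.org/oai", token)
```

A harvest loop would fetch `first_url`, hand each record to title ingest, and keep requesting follow-up pages until no resumption token is returned.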