2017-07-07 FOLIO Data Model/Codex Working Group Meeting notes

Date

Attendees

Agenda

  • 8:30 - 8:45 - Settling in and quick introductions
  • 8:45 - 9:45 - Data model and architecture (Vince Bareau)
  • 9:45 - 10:30 - Metadata analysis part 1 (Kathryn Harnish)
  • 10:30 - 10:40 - Break
  • 10:40 - 11:10 - Metadata analysis part 2 (Kathryn Harnish)
  • 11:10 - 12:10 - Wireframes (Cate Boerema)
  • 12:10 - 12:30 - Buffer for discussion overrun and/or additional topics

Documents

We have a Google Drive folder for this group, which contains today’s deck, the member list, etc. Look for wireframes there.

https://drive.google.com/drive/folders/0B7-0x1EqPZQKN21rWVMwX0pDR1E

Notes

Data model and architecture

Presentation is available here: https://drive.google.com/open?id=1KGlSSa0ajY9XcXb4QWD5_A8TxMVZWjthObRjw6seVnw

In this presentation, Vince reviewed the purpose of the codex and the types of codex data objects included in the current model: instance, holdings/item, coverage, location, and package. He also talked about the microservices approach, which means that each application (or domain) within FOLIO can have its own version of the instance with unique data elements. These instances can be linked across domains. The data stored on the central codex record will be very minimal, with most of the richer data being stored in specific domains.

Questions/Discussion:

 

Will Codex records reference many external records/foreign keys (in FOLIO or external KBs)?

  • Sort of — each domain is actually creating codex records — so it’s more like the domain records are referencing the codex
  • If you create instances in multiple domains, there will actually be multiple codex records that are linked together
  • This model is distributed — domains reference the same IDs, but can define and store their own specific content
  • Like BIBFRAME (BF), FOLIO will use pieces of different sources to aggregate up to the complete view of the thing
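
The distributed model described above can be sketched roughly as follows. This is a minimal illustration, not FOLIO code: the domain names, field names, ID format, and the aggregate_view helper are all hypothetical, and the merge rule (a later domain overwrites an earlier one on a shared field) is an assumption made only to keep the example short.

```python
# Hypothetical domain records: each domain stores its own content
# but references the same instance ID.
DOMAIN_RECORDS = [
    {"domain": "inventory", "instance_id": "inst-001", "title": "Example Title"},
    {"domain": "acquisitions", "instance_id": "inst-001", "order_status": "received"},
    {"domain": "circulation", "instance_id": "inst-001", "loan_count": 3},
    {"domain": "inventory", "instance_id": "inst-002", "title": "Another Title"},
]


def aggregate_view(records, instance_id):
    """Combine what every domain knows about one shared instance ID."""
    view = {"instance_id": instance_id, "domains": []}
    for record in records:
        if record["instance_id"] != instance_id:
            continue
        view["domains"].append(record["domain"])
        for key, value in record.items():
            if key not in ("domain", "instance_id"):
                # Assumption: on a conflict, the later domain wins.
                view[key] = value
    return view


combined = aggregate_view(DOMAIN_RECORDS, "inst-001")
```

Here combined carries the title from one domain, the order status from another, and the loan count from a third, while each domain keeps ownership of its own fields.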

How do you prevent conflicting data across domains?

  • In some cases you may have differing data across domains
  • KB may help with this by providing a trusted source of metadata
  • Have the ability to link up to those metadata values between different domains

How do you do reporting and queries across domains?

  • Reporting might be a separate domain
  • Could look across different areas and reconcile
  • For reporting, it will be hard to rely on microservices
  • Separate reporting database or data warehouse seems like the most likely solution

How will data be stored? What does denormalized mean?

  • Codex is really intended to be an abstraction layer
  • Implementation behind the scenes is up to each domain
  • Expecting a lot of different interpretations
  • Local storage may be in a relational database
  • Codex for KB may be more of a passthrough — in that case database and storage models are up to developers of each domain
  • Relational databases are normalized and have tables that represent each element (title, author, etc.)
  • When you want to produce an output or record that represents something like a book, you have to join against multiple tables
  • In a denormalized environment, you have a separate table that is created for the resource you’re describing
  • Codex will be using JSON and will be flat
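
The difference between the normalized and denormalized approaches can be illustrated with a small sketch. The table names, column names, and JSON fields below are hypothetical and are not taken from any FOLIO schema; the point is only that the normalized form needs a join to produce a book-like record, while the flat JSON form already is that record.

```python
import json
import sqlite3

# Normalized: one table per element, joined at read time.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE title (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (book_id INTEGER, name TEXT);
    INSERT INTO title VALUES (1, 'Example Title');
    INSERT INTO author VALUES (1, 'Example Author');
    """
)
row = conn.execute(
    "SELECT t.title, a.name "
    "FROM title t JOIN author a ON a.book_id = t.book_id "
    "WHERE t.book_id = ?",
    (1,),
).fetchone()
conn.close()
normalized_view = {"title": row[0], "author": row[1]}

# Denormalized: one flat JSON document per resource, no join needed.
flat_json = '{"id": 1, "title": "Example Title", "author": "Example Author"}'
flat_record = json.loads(flat_json)
flat_view = {"title": flat_record["title"], "author": flat_record["author"]}
```

Both views hold the same information; the flat form trades some repetition for simpler reads.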

For Instance/Item relationships, will an instance always have the same item records — or will it have different items in certain domains?

  • We think in general it will be the same set of records
  • Will just have different metadata
  • Each microservice might keep track of subsets of data that relate to an item 

What is the logic behind the BF-like approach?

  • We like the pragmatic approach to work, instance, item mapping 
  • Fits well with current models
  • On negative side, BF is still in flux, use within libraries is still in flux
  • Other problem is that BF won’t be the only model
  • Our choices are related to the notion that MARC will continue to be around and need to be supported
  • Goal to allow libraries to stay with MARC, move to BF, or do some of both

How do we determine what details of a resource cause you to differentiate between different instances or items?

  • Instance
    • Meant to represent something relatively narrow
    • Has a resource type 
    • Ebook, print book, audio book would be three different instances
    • Kind of mashes together FRBR manifestation and expression
    • Discovery layer might be the place where different instances are rolled up together
  • Work
    • May not be used initially
    • Would be a way for libraries to pull together instances
    • Gives you a handle to find different related instances
  • Items
    • Copies would be represented as different items
    • You would circulate at the item level
    • Items in circulation would relate to items in the codex
    • Whether circ needs to reference the instance that an item is related to is unclear right now
    • Circ functions at the item level most of the time but you still need access to the instance level data
  • Package 
    • Not to be confused with a vendor package
    • Simply a way to cluster things together
    • Won’t play a critical role in circ
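
As a rough sketch of how these record types relate, the following models instances, an optional work link, and items. All class names, fields, and IDs are hypothetical and chosen for illustration; the notes do not define a schema at this level of detail.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    instance_id: str
    title: str
    resource_type: str             # e.g. "print book", "ebook", "audio book"
    work_id: Optional[str] = None  # optional, since works may not be used initially


@dataclass
class Item:
    item_id: str
    instance_id: str  # each copy points back to its instance
    barcode: str


# Three formats of the same title are three separate instances.
instances = [
    Instance("inst-1", "Example Title", "print book", work_id="work-1"),
    Instance("inst-2", "Example Title", "ebook", work_id="work-1"),
    Instance("inst-3", "Example Title", "audio book", work_id="work-1"),
]

# Two copies of the print book are two items on the same instance.
items = [
    Item("item-1", "inst-1", "0001"),
    Item("item-2", "inst-1", "0002"),
]


def instances_by_work(all_instances):
    """Use the work ID as a handle to pull related instances together."""
    grouped = defaultdict(list)
    for inst in all_instances:
        if inst.work_id is not None:
            grouped[inst.work_id].append(inst.instance_id)
    return dict(grouped)


def copies_of(instance_id, all_items):
    """Return the item IDs (copies) attached to one instance."""
    return [item.item_id for item in all_items if item.instance_id == instance_id]


work_groups = instances_by_work(instances)
print_copies = copies_of("inst-1", items)
```

Circulation would operate on the Item records, following instance_id when it needs instance-level data.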

Metadata analysis

Kathryn Harnish, can you post your slides here or link from Google Drive?

Kathryn presented an overview of the initial approach to defining metadata elements for codex records and mapping that data from another source. In the beginning phases, this work has been limited to the print monograph use case and a MARC environment. E-resources and other metadata schemas will be addressed in separate conversations. Kathryn presented two use cases: searching for a title and creating a purchase order for a print monograph. For each of these, she suggested data elements that would be needed to complete the task. She then walked through a more detailed mapping of proposed codex fields and mappings from MARC to the codex.

Questions/Discussion

 

Since holdings/item are one record, if you have a run of bound journal volumes, will the holding info be repeated in each record?
  • Yes, the holding info would be repeated
  • Will see advantages on technical side — flattened structure
  • From a functional perspective, can still see things at the appropriate level
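
A minimal sketch of what the repetition looks like for a run of bound volumes, assuming a combined holdings/item record. The field names, call number, and IDs are hypothetical.

```python
# Holding-level information shared by the whole run (hypothetical values).
HOLDING_INFO = {"location": "Main Stacks", "call_number": "QA1 .J68"}

bound_volumes = ["v.1", "v.2", "v.3"]

# Each combined holdings/item record repeats the holding information.
records = [
    {"item_id": f"item-{number}", "volume": volume, **HOLDING_INFO}
    for number, volume in enumerate(bound_volumes, start=1)
]

# The holding-level view can still be recovered from the flat records.
distinct_holdings = {(r["location"], r["call_number"]) for r in records}
```

The flat records are redundant, but a holding-level view is still available by collapsing on the repeated fields.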
Is there a reason to have a separate holding record?
  • There may be 
  • Will depend on mapping work and needs for SIGs
Getting data back out is another important point — needed for collaborative collecting
  • Distinction is that you can still maintain the MARC or other standard structure if that’s appropriate to you
  • From the Codex perspective, we’re abstracting info up and flattening it out
  • At a codex level, the data is available and it may not need to be represented in the same way as the source
  • How data is stored may not be how it’s managed
  • Efficiencies come in how you manage, not how you store
  • Recommendation to ID fields and look at management use cases to determine structure
Do I have to maintain previous metadata format? How do we enforce it if we do want to maintain that previous structure?
  • No, you won’t be required to stay with the source structure
  • Can map data into something else
  • Enforcement of previous standard would be more related to metadata management, rather than codex
For print serials, where will you attach your purchase order?
  • Don’t want to attach to a specific item — PO is for all items
  • How much complexity do we need? 
  • Container record could be a solution here
  • Or attach it to instance
  • In Acq domain, need to model the thing that you’re purchasing
  • But that can be defined in the acq microservice
Important to know subscribed/unsubscribed
  • Kathryn will show some different definitions of things like coverage dates, subscription dates, etc. 
  • Will need to do some work on how to define “currently held”
  • Does it need to be explicit in the codex or can it be inferred?
  • In the model of Codex as pass through, subscription information could be provided by KB or other external system
  • But if you need to search or filter on this, then we’ll have to have something in the codex
  • Also can’t assume that other microservices exist; the library might not be using acquisitions
Role of Codex
  • It’s there to help you navigate and help you recognize things
  • Doesn’t give you detailed answers
  • Basically just tells you: do I hold this or not?
  • If you need to know more detail, that's beyond the scope of codex
  • If you have an item ID but no holdings ID, then you don’t have a holding; this can be used to filter out items you don’t hold
  • Essentially, the codex is “things I know about”
  • The presence of a holding tells you whether those things are part of your library’s collection/inventory
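
The "do I hold this or not" check can be sketched as a simple filter on the presence of a holdings ID. The record shape and field names are hypothetical.

```python
# Hypothetical codex item records: everything the codex knows about.
codex_items = [
    {"item_id": "item-1", "holdings_id": "hold-1"},
    {"item_id": "item-2", "holdings_id": None},
    {"item_id": "item-3"},  # no holdings field at all
]


def is_held(item):
    """An item counts as held only if it carries a holdings ID."""
    return item.get("holdings_id") is not None


held = [item["item_id"] for item in codex_items if is_held(item)]
not_held = [item["item_id"] for item in codex_items if not is_held(item)]
```

Anything more detailed than this yes/no answer would come from the relevant domain rather than from the codex.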
How do we get info out of the system — e.g. data delivery for shared ILL?
  • Could use reporting
  • Expectation is that data deliveries for consortium or other would probably come out of the codex 
  • Through codex can get at other info like source metadata, inventory etc.
  • Also have to support live look-ups — e.g., Relay D2D
  • This would be built as a plug-in or app that would sit on top of the codex
Suppressed fields
  • Could this be done on the front end (domain?) rather than Codex? 
  • Might be an implementation decision
What causes separate records?
  • E.g., hardcover and paperback ISBNs — are they two records?
  • If we want to track format field, they would be different records
  • Comes down to whether we’d want to distinguish them
How minimal is minimal?
  • Keep it as small as possible, knowing we can always get details somewhere else
  • Part of what we’re seeing now is evolution in thinking and understanding
  • Functional people won’t make this determination — our job is to define what we need to support tasks
  • Implementation decisions will be driven by architecture
  • That’s OK as long as use cases can be addressed

Wireframes

Wireframes available here: https://drive.google.com/drive/folders/0B18Bhhmr94zaSlI3M3RnN042UFE

Cate presented a series of wireframes that demonstrated the results of a codex search. She also showed how users would be able to drill down into each search result, displaying additional information and potentially linking into other apps. 

Questions/Discussion

 

What is the primary use case for this search?
  • Are there separate searches/workflows for other domains — e.g. orders, users, requests?
  • Yes, searching for domain-specific record types would be done in dedicated apps
  • The use case for this search is identifying an instance, item, or package
  • Essentially asking: is this something I hold?
  • It’s a bit like a KB search
  • We began referring to this as the “inventory search”
Would you ever want to filter by items?
  • Maybe not
  • If searching by barcode, you would see just the instance result that contains that item
  • We are thinking we’ll remove the item filters from this UI
  • It’s still important to be able to filter by electronic vs physical results
  • Restricting inventory search scope to codex might be good for v1
Is there interest in a truly global search?
  • This would be a search that allows you to search across all apps and see a mix of results — instances, users, orders, etc.
  • It would be nice to be able to start with general search, so that you don’t have to make a decision about which search to use
  • This would be the universal search to locate objects in system — if you want to edit or create, you would need to go into something different
  • Prototype from the beginning has had recommendation for universal search across all domains
  • Has been hard to engage deeply so far
  • Universal search function wouldn’t necessarily be an app, but a layer over everything
  • Universal search is probably something we want long term
  • Would be very complex in terms of understanding the user and what rights they have to view/edit each record type
  • Analogy is Mac search/magnifying glass in top corner
  • Bento box style could be one approach to show results
  • Might also be able to set personal defaults to return only certain record types
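
One way to picture the bento box idea together with per-user rights is the sketch below. The record types, labels, and the bento_box function are hypothetical; it only shows results being grouped by record type and limited to the types a user is allowed (or has chosen) to see.

```python
def bento_box(results, allowed_types):
    """Group mixed search results into one box per permitted record type."""
    boxes = {}
    for result in results:
        record_type = result["type"]
        if record_type not in allowed_types:
            continue  # skip record types this user cannot or does not want to see
        boxes.setdefault(record_type, []).append(result["label"])
    return boxes


# Hypothetical mixed results from a universal search.
search_results = [
    {"type": "instance", "label": "Example Title"},
    {"type": "user", "label": "Example Patron"},
    {"type": "order", "label": "PO 1001"},
    {"type": "instance", "label": "Another Title"},
]

staff_view = bento_box(search_results, allowed_types={"instance", "order"})
```

The same allowed_types argument could stand in for either access rights or a personal default, which is part of why the notes flag this area as complex.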
If you are going to have an inventory search, do you want to edit in the same place too?
  • This search makes sense as a way to get to instance or item you’re looking for
  • But it might make more sense to create/edit within the specific domain app
  • If you have links to other modules, you don’t need to recreate editing/creating screens within universal search
  • Tabbed view might be helpful to navigate between different apps
  • Inventory search results are navigational, summary results
  • When you select something and look at detail, you are essentially going into another domain to see the details
  • Distinction is very much dependent on when you transition from one domain to another
If linking between smaller apps, would that make the navigation easier?
  • Doesn’t necessarily solve the problems
  • May alleviate the onion effect — many layers of records being opened over top of each other
  • Might be able to let apps deliver the edit view as overlays within the main app
We don’t have to get everything completely right from the get-go
  • Codex will not be cast in stone
  • If someone doesn’t like universal search, they don’t have to use it — there will be other searches/apps
  • It’s also possible someone could build a different codex in future

Next steps

  • We will use the Metadata Management SIG's meeting time slot for future codex discussions (Thursdays at noon Eastern time)
  • We will aim to complete an initial codex model that can be implemented by the end of July
  • Highest priority at this point is to validate the existing work on the codex against use cases
  • Kathryn will develop an agenda for next week's meeting to work on this task
  • If additional time is needed to look at new wireframes, we will schedule an ad hoc meeting
  • The RM SIG will continue to work on development of the acquisitions app concurrently

Action items

  •