2018-02-26 - Data Migration Subgroup Notes

Date

Zoom Connect Information

Topic: Data Migration Subgroup

Time: Feb 26, 2018 11:00 AM Eastern Time (US and Canada)

    Please download and import the following iCalendar (.ics) files to your calendar system.

    Weekly: https://zoom.us/meeting/276260561/ics?icsToken=7a985191c2234bd670b45e98a048b9e934791fc63f5d73a99b35c234efaae7ba

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/276260561

Attendees

Goals

Discussion items

Time | Item | Who | Notes
5 min | Welcome, identification of note taker and next week's convener

Anne L. Highsmith will take notes; Dale Arntson will convene next week.


Continue discussion of data mapping spreadsheets | Anne L. Highsmith

Anne Highsmith's spreadsheets are in: https://drive.google.com/open?id=1XAz6VByU4OArXOJvySrAb_XonzUdsJIL

Uschi Klute's spreadsheets with fields for patron and circulation data in LBS (Library System used in GBV): https://drive.google.com/open?id=19dsjIX0LUXSo9f87v23rbdYVG_ttb_CV

Theodor Tolstoy's spreadsheets of data from Sierra: https://drive.google.com/open?id=1MVguL-6-9c0oux7FDWtzAvfpTFDrZCoT

Theodor Tolstoy (One-Group.se)'s GitHub repo with example data and links to documentation: https://github.com/fontanka16/SierraToFolioConverter

  • PWanninger (EBSCO) introduced her co-worker, Alexander Soto (resident in Santiago, Chile), who will be working with UNAM on their implementation.
  • Uschi Klute said she had added another data mapping for LBS to the folder linked above.
    https://drive.google.com/open?id=19dsjIX0LUXSo9f87v23rbdYVG_ttb_CV
  • AHighsmith demoed the voyager_item_to_folio_item.tsv spreadsheet, asking for comments on format and decision points.
    • PWanninger pointed out that some fields, such as price, might map to other records, such as acquisitions. Patty also asked why item_type.? was identified with a question mark in the source field; Highsmith responded that the question mark corresponded to the comment and indicated, in this case, that the tenant had to decide which of 3 data elements from the item_type table should be used.
    • Long discussion ensued about how to handle data which exists in source records (regardless of source system) that doesn't yet have an obvious destination in a FOLIO record.
      • Which SIG would be responsible for commenting on "missing" item data? It would at least be a combination of the Metadata Management SIG, which may have already covered these issues in working groups last fall, and Resource Access, depending on the data element.
      • CManly asked what the best way was to consolidate lists of "missing" elements across various systems. Should this group come up with list(s) of consolidated data elements for each record type across a variety of systems and then take those lists to the appropriate SIG? DArntson said perhaps the mapping work should take place in the wider context of the relevant data models; CWhitt responded that this type of mapping data provides a kind of "sanity check" against the work that has been done in the SIGs vis-a-vis how the system should work. CManly said he'd like to take a list of "missing" data elements to the User Management SIG at this point for review, to see whether, and which of, those data elements should be included. PWanninger referred to this process as doing gap analysis, based on published JSON.
      • IKuss asked which SIG these lists should go to. CManly: Data Migration comes up with composite lists by app; MWinkler pointed out the lists should go to Product Council to make sure the review is coordinated with the appropriate SIG or SIG subgroup.
  • PWanninger asked if anyone in the group had put up a local instance. AHighsmith reported that Texas A&M is close to having an instance up. Patty mentioned there was a plan for a test instance and pointed out that as instances proliferate, they will diverge from the demo and more lessons will be learned.
  • Discussion about documentation
    • PWanninger asked about an overall data map, i.e. a kind of data dictionary. TOlson pointed out that the JSON schemas have a fair amount of inline documentation and suggested it might be possible to auto-generate a data dictionary from the JSON schemas on GitHub.
    • AHighsmith asked why the JSON schemas didn't document common constraints, such as maximum string length. TOlson said such constraints more typically come out of database column widths, which the JSON isn't dealing with, and wondered how we could comment on or add descriptions to the JSON schemas so that documentation would be in one place. WSchneider posted the link listed above, https://dev.folio.org/doc/api, saying that this is the data dictionary currently posted, which is generated from the source code modules. He wondered if this was the data dictionary that needed to be used. When PWanninger pointed out that the JSON schemas on that site didn't contain constraints such as maximum string length, CManly responded that we would probably find, working with JSON, that the records are much less constrained. WSchneider pointed out that format constraints could be added to the interface if desired and generally agreed upon. Wayne also said incoming data would be validated against these schemas, which delineate required elements.
    • TOlson pointed out that some of the JSON schemas could benefit from some prose description; WSchneider responded that a user could make a pull request to put such explanation into the RAML repository. IndexData occasionally uses the 'description' field to store such comments.
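
A minimal sketch of TOlson's suggestion to auto-generate a data dictionary from the JSON schemas, assuming each schema uses the standard JSON Schema keywords `properties`, `required`, and `description`. The schema fragment and the names `EXAMPLE_SCHEMA`, `build_data_dictionary`, and `to_tsv` are hypothetical and are not taken from any FOLIO module; real schemas may also use `$ref` and other features this sketch ignores.

```python
import json

# Hypothetical schema fragment for illustration only; not copied from FOLIO.
EXAMPLE_SCHEMA = json.loads("""
{
  "title": "Item (illustrative)",
  "type": "object",
  "properties": {
    "id": {"type": "string", "description": "Unique identifier of the item"},
    "barcode": {"type": "string", "description": "Barcode printed on the item"},
    "status": {
      "type": "object",
      "description": "Current status of the item",
      "properties": {
        "name": {"type": "string", "description": "Status label"}
      }
    },
    "notes": {"type": "array", "items": {"type": "string"}}
  },
  "required": ["id"]
}
""")


def build_data_dictionary(schema, prefix=""):
    """Walk a schema's properties and return one row per field.

    Nested objects are flattened into dotted paths (e.g. "status.name").
    Fields without a 'description' get an empty string, which makes
    undocumented fields easy to spot in the output.
    """
    rows = []
    required = set(schema.get("required", []))
    for name, spec in schema.get("properties", {}).items():
        path = prefix + name
        rows.append({
            "field": path,
            "type": spec.get("type", ""),
            "required": name in required,
            "description": spec.get("description", ""),
        })
        if spec.get("type") == "object":
            rows.extend(build_data_dictionary(spec, path + "."))
    return rows


def to_tsv(rows):
    """Render the rows as tab-separated text, like the mapping spreadsheets."""
    header = ["field", "type", "required", "description"]
    lines = ["\t".join(header)]
    for row in rows:
        lines.append("\t".join(str(row[h]) for h in header))
    return "\n".join(lines)


DICTIONARY = build_data_dictionary(EXAMPLE_SCHEMA)
print(to_tsv(DICTIONARY))
```

In this sketch the `required` column comes straight from the schema's `required` list, and an empty `description` column flags fields that could use the prose description discussed above.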
    

Action items