API for querying for MARC records stored in SRS (UXPROD-3086)

[UXPROD-2791] SRS MARC Query API Created: 27/Oct/20  Updated: 17/Oct/23  Resolved: 27/Apr/21

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: R1 2021
Parent: API for querying for MARC records stored in SRS

Type: New Feature Priority: P2
Reporter: Jenn Colt Assignee: Jenn Colt
Resolution: Done Votes: 0
Labels: Showstopper-Cornell, r1-2021-at-risk, r1-highlight, split
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Cloners
is cloned by UXPROD-2940 SRS MARC Query API part 2 Closed
Defines
is defined by MODSOURCE-215 MARC search functionality - tech part Closed
is defined by MODSOURCE-227 POC: Streaming Rest API Closed
is defined by MODSOURCE-221 Implement database changes for MARC s... Closed
is defined by MODSOURCE-222 Design API & implement data structure... Closed
is defined by MODSOURCE-223 Design and implement service layer fo... Closed
is defined by MODSOURCE-228 Define a contract for search Closed
is defined by MODSOURCE-255 Return HTTP response according to the... Closed
is defined by MODSOURCE-258 Implement NOT searches Closed
is defined by MODSOURCE-264 Documentation for SRS Query API Closed
is defined by MODSOURMAN-396 SRS MARC queries Closed
Relates
relates to MODSOURMAN-313 SPIKE: Investigate MARC to MARC match... Closed
relates to UXPROD-2184 DRAFT - Filters to select MARC record... Closed
relates to MODSOURCE-254 SPIKE: Review MARC Query work Open
Requires
is required by UXPROD-2942 quickMARC | Search MARC Authority Rec... Closed
is required by UXPROD-2943 quickMARC | Browse Authority Records Closed
Epic Link: API for querying for MARC records stored in SRS
Back End Estimate: XXXL: 30-45 days
Estimation Notes and Assumptions: This includes learning curve (SRS) and implementing the search functionality. Edge module functionality has been split to UXPROD-2916.
Development Team: Concorde
Rank: Chalmers (Impl Aut 2019): R3
Rank: Chicago (MVP Sum 2020): R2
Rank: Cornell (Full Sum 2021): R1
Rank: Duke (Full Sum 2021): R1
Rank: 5Colleges (Full Jul 2021): R2
Rank: GBV (MVP Sum 2020): R4
Rank: Grand Valley (Full Sum 2021): R2
Rank: Lehigh (MVP Summer 2020): R2
Rank: MO State (MVP June 2020): R2
Rank: St. Michael's College (Sum 2021): R2
Rank: TAMU (MVP Jan 2021): R2
Rank: U of AL (MVP Oct 2020): R1
Score: 8
Showstopper for Summer 2021 Implementers?: Yes
Showstopper Comments from Summer 2021 Implementers: [Jenn from Cornell: This is critical for us because without it we cannot query the 10 million bibliographic records that describe our collection and drive discovery. These queries are used for selecting records to export to services like OCLC, identifying records as part of import processes, and identifying records that need cleanup.
Potential workarounds:
* Query LDP - MARC functionality is also not present there, in addition the data is only loaded every 24 hours. More than that, these functions are part of important record operations and we want the maintenance of the API providing that functionality to be part of FOLIO proper. In this same category would be a nightly dump of MARC records that we then examine with local scripting, requiring a lot of local, not shareable development.
* Direct database access - this might be an alternative but it is not available to us and would likely be more difficult than using the API, and is generally contrary to how FOLIO works.
* Don't do the work - Not performing the cleanup, enhancement, and export jobs facilitated by these queries would have a significantly negative impact on discovery and patron experience.]

Showstopper December 11 Meeting Summary: The data is expected to be in the LDP, but it is not there yet. What if it isn't there on time? Or isn't refreshed often enough? Also, why should a library have to implement the LDP in order to do this work? Jenn is the PO for this feature, and is from Cornell, so this feature wouldn't need to be completed too far in advance of Cornell's go-live date. If the data ends up not being in the LDP on time, other implementers will need this as well.
Showstopper Capacity Planning Team Recommendation: The Capacity Planning Team recommended to the FOLIO Product Council that the Iris release date be extended to May 3 (from March 1) so that all of the at-risk 'showstopper' features could be completed and released before the July implementations. (Slide deck presented to PC: https://docs.google.com/presentation/d/12s_fs3vqjm4hAGIfZ_HX1jm--v8QOEFber5uvo0VTGw/edit?usp=sharing)
Showstopper FOLIO Product Council Decision: The FOLIO Product Council compromised and allowed the Iris release to be delayed by one month, to April 5. Due to the size of the t-shirt estimate for this feature, it will NOT be completed as part of the Iris release. The Capacity Planning Team will be meeting to discuss how/when this feature will be delivered to the libraries needing it before implementation.

Description

Current situation or problem:
FOLIO doesn't yet support searching of MARC records in source record storage (SRS). This feature would provide an API that FOLIO implementers could use to query MARC. The API could be used for a variety of purposes, such as identifying records for export, updates, and cleanup. It could also, eventually, be used to support UI(s) for searching MARC within FOLIO, but that UI is out of scope for this feature. This feature will NOT use Elasticsearch; implementation details to come.

In scope

Allow for search based on the presence or absence of a field, subfield, indicator, or fixed field position
Allow for date range search based on dates present in a subfield
Allow consumers to search new and/or local MARC tags and have them indexed appropriately
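
To make the in-scope search criteria concrete, here is a minimal Python sketch of what each kind of match means when applied to a single record held in memory. It is not the SRS query API and says nothing about how the feature is implemented. The record layout, the function names, and the use of a local 948 $a field holding a YYYYMMDD date are all assumptions made for illustration.

```python
from datetime import date, datetime

# Hypothetical in-memory record layout, for illustration only.
# The 008 value is built piecewise so that the language code lands at positions 35-37.
record = {
    "fields": [
        {"tag": "008",
         "value": "201027s2020" + " " * 4 + "nyu" + " " * 17 + "eng d"},
        {"tag": "245", "ind1": "1", "ind2": "0",
         "subfields": [("a", "Example title")]},
        # Assumed local tag carrying a YYYYMMDD date in subfield a.
        {"tag": "948", "ind1": " ", "ind2": " ",
         "subfields": [("a", "20210115")]},
    ]
}

def fields(rec, tag):
    """Return every field in the record with the given tag."""
    return [f for f in rec["fields"] if f["tag"] == tag]

def has_field(rec, tag):
    """Presence of a field; negate the result to test for absence."""
    return bool(fields(rec, tag))

def has_subfield(rec, tag, code):
    """Presence of a subfield code on any occurrence of the tag."""
    return any(c == code
               for f in fields(rec, tag)
               for c, _ in f.get("subfields", []))

def indicator_is(rec, tag, which, value):
    """Match an indicator ("ind1" or "ind2") on any occurrence of the tag."""
    return any(f.get(which) == value for f in fields(rec, tag))

def fixed_position_is(rec, tag, start, expected):
    """Match characters starting at a zero-based position in a control field."""
    end = start + len(expected)
    return any(f.get("value", "")[start:end] == expected
               for f in fields(rec, tag))

def subfield_date_between(rec, tag, code, start, end):
    """True if any matching subfield parses as YYYYMMDD within [start, end]."""
    for f in fields(rec, tag):
        for c, v in f.get("subfields", []):
            if c != code:
                continue
            try:
                d = datetime.strptime(v, "%Y%m%d").date()
            except ValueError:
                continue  # skip values that are not valid dates
            if start <= d <= end:
                return True
    return False
```

For example, has_field(record, "245") and not has_subfield(record, "245", "b") selects records that have a title field without a subfield b, and subfield_date_between(record, "948", "a", date(2021, 1, 1), date(2021, 12, 31)) selects records whose assumed local date falls in 2021.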

Out of scope
A search UI for MARC data
Searching inventory data

Use case(s)
Searching MARC data in order to identify records for data export
Searching MARC data in order to match records to be updated with data import
Searching MARC data as part of ETL cleanup processes that libraries run to maintain data quality
Searching MARC data to identify records in cases where only examination of the MARC allows accurate selection of records



Comments
Comment by Ann-Marie Breaux (Inactive) [ 27/Oct/20 ]

Hi Jenn Colt, I just had a quick meeting with Taras Spashchenko, and here are a few comments.

  1. There's an ElasticSearch initiative being worked on by VBar and Mikhail Fokanov also, and it was discussed at Tech Council recently. It would be good if this work is coordinated with whatever they are planning, so that it's complementary instead of clashing.
  2. Query and retrieval of SRS records via ElasticSearch should work fine for export, since export is not trying to modify the stored MARC record
  3. Query and retrieval of SRS records via ElasticSearch will have problems for import. My non-developer interpretation of what Taras Spashchenko told me: there is a time lag for data to be available for ElasticSearch. Import may assume the result = no, and then move ahead accordingly, only for a search result reflecting updated MARC data to arrive a few seconds later.
  4. There is a design Jira that Taras is working on, to allow for MARC-MARC matching: MODSOURMAN-313 Closed . Until then Data Import will mainly need to rely on MARC-Instance matching.

cc: Magda Zacharska Kateryna Senchenko Oleksii Kuzminov

Comment by Jenn Colt [ 27/Oct/20 ]

Thanks! This is part of what Vince and others are working on; the link to their plans is at the bottom of the description.
I'm a little confused about the data import comment because the issue you linked to specifically mentions ES (which is why I thought it would be useful for DI).

Maybe the DI question is something to look at from the angle of performance when we know more about how quickly ES will be updated. If DI is using indexes in postgres to search for marc to marc, there will be a delay of some sort there as well I imagine.

Comment by Ann-Marie Breaux (Inactive) [ 08/Dec/20 ]

Discussed with Magda Zacharska, and we moved the prep stories from UXPROD-2742 In Progress to this feature, since Concorde will be working on all of them. The prep stories are all linked under the MODSOURCE-215 Closed umbrella.

cc: Jenn Colt Taras Spashchenko Oleksii Kuzminov

Comment by Jenn Colt [ 10/Dec/20 ]

This is critical for us because without it we cannot query the 10 million bibliographic records that describe our collection and drive discovery. These queries are used for selecting records to export to services like OCLC, identifying records as part of import processes, and identifying records that need cleanup.
Potential workarounds:

  • Query LDP - MARC functionality is also not present there, in addition the data is only loaded every 24 hours. More than that, these functions are part of important record operations and we want the maintenance of the API providing that functionality to be part of FOLIO proper. In this same category would be a nightly dump of MARC records that we then examine with local scripting, requiring a lot of local, not shareable development.
  • Direct database access - this might be an alternative but it is not available to us and would likely be more difficult than using the API, and is generally contrary to how FOLIO works.
  • Don't do the work - Not performing the cleanup, enhancement, and export jobs facilitated by these queries would have a significantly negative impact on discovery and patron experience.
Comment by Charlotte Whitt [ 04/Feb/21 ]

Hi Jenn Colt - I have a question - just something I was wondering about, while going over the features covered by the epic for Entity Management ( UXPROD-787 Open ) - if there might be an overlap/interaction between UXPROD-2791 Closed and the EM app feature: UXPROD-2522 Open
Entity Management App. Query MARC bib records.

CC: Jason Kovari lew235

Comment by Jenn Colt [ 04/Feb/21 ]

Hi Charlotte Whitt yes I think there is overlap (I hope there is!) Because this story is just about the API, the hope is that it will be useful to any app needing to query SRS. So I think what will happen is when EM is closer, they'll need to evaluate the API and figure out where it needs to be extended to cover authorities.

Edit: I see now that one is about querying the bibs themselves. Machine querying along these lines is already included in our use cases for marc querying. But there still won't be a UI. So this will cover querying bibs for the use of authorities but only from a machine standpoint, and only based on strings in the bib, not on database relationships.

Comment by Anya [ 22/Apr/21 ]

Hi Jenn Colt, will this be included in Iris? If so, could it be added to the release notes? Thanks

Comment by Khalilah Gambrell [ 23/Apr/21 ]

Anya, this will be included in Iris but there is an issue that may require an Iris hotfix release. I think we will hold on setting this to Done until we have the issue settled.

cc: Jenn Colt

Comment by patty.wanninger [ 11/Aug/21 ]

Khalilah Gambrell Could you update the status of this functionality? Was it released with Iris?

Comment by Khalilah Gambrell [ 11/Aug/21 ]

patty.wanninger, it was released with Iris and Cornell is using it. We did not release the edge APIs. That will be done with a future release.

cc: Jenn Colt.
