2022-11-29 Discovery Integration Subgroup Meeting notes

Date

EDT 10:00am to 11:00am


Goals

Discussion items

Time

Item

Who

Notes

10:00 Start of the meeting

10:05

requests currently not covered by an API for discovery systems (e.g. trending items)


See Brainstorm page for the diverse interface

10:15

start of discussion about data export (OAI-PMH)


Update from Magda via Slack:

Hi all, I would like to follow up on the OAI-PMH discussion from last week's meeting.

Harvesting holdings and items data
FOLIO's OAI-PMH does support two metadataPrefixes:

  • marc21 - harvests only SRS MARC records
  • marc21_withholdings - enriches the SRS record with holdings and items fields, as described in MODOAIPMH-102


Including inventory data
The Orchid release will also support harvesting records that do not have an underlying source record; the work is covered in UXPROD-2404.

Stabilization issues:
OAI-PMH indeed had a rough start and required some additional stabilization work. However, starting with the Kiwi release we regularly harvest bugfest data (~8 M records in approximately 11 hours). The issues we still see are related to multiple full harvests running concurrently (not supported) or to bad data in inventory. We significantly improved monitoring of the harvest, which can be done by API calls as described in: https://github.com/folio-org/mod-oai-pmh#harvesting-statistics-api

You can also find more information about FOLIO's implementation of OAI-PMH in: https://wiki.folio.org/display/FOLIOtips/OAI-PMH+Best+Practices

Unfortunately, I won't be able to attend this week's meeting either, but I will listen to the recording and respond here. Also, I should be able to attend the meeting on December 6th if you think that would be helpful.
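For reference, the two metadataPrefixes named in the update above would be selected through the standard OAI-PMH `metadataPrefix` request parameter. The following is only a minimal sketch: the base URL is a hypothetical placeholder (the real endpoint depends on the installation), the helper name is made up for illustration, and the prefix spelling `marc21_withholdings` is assumed to be the form intended in the message.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real OAI-PMH endpoint depends on the installation.
BASE_URL = "https://folio.example.org/oai"


def list_records_url(metadata_prefix, base_url=BASE_URL, from_date=None):
    """Build an OAI-PMH ListRecords request URL for the given metadataPrefix."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional OAI-PMH "from" argument for selective (incremental) harvesting.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# SRS MARC records only
print(list_records_url("marc21"))
# SRS MARC records enriched with holdings and items fields (assumed spelling)
print(list_records_url("marc21_withholdings", from_date="2022-11-01"))
```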


Discussion:

Villanova has used OAI-PMH to index a couple of million records in their test system. Some records get dropped, e.g. due to bad leaders or control characters that are illegal in XML, so it is important to watch the export counts and errors. (These cases can usually be dealt with by fixing the records.)
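A pre-check along the following lines might help find such records before a harvest. This is an illustrative sketch only, not what Villanova or FOLIO actually runs: the function names are hypothetical, the character test follows the XML 1.0 definition of a legal character, and the leader test only checks the fixed 24-character length of a MARC leader.

```python
import re

# Anything outside the XML 1.0 "Char" production cannot appear in an XML document.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 cannot carry."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in _ILLEGAL_XML_CHARS.finditer(text)
    ]


def leader_problem(leader):
    """Return a description of an obvious leader problem, or None if none is found."""
    if len(leader) != 24:
        return f"leader is {len(leader)} characters, expected 24"
    return None


# Sample values made up for illustration.
sample_field = "Annual report\x1f 2021"  # contains U+001F, which is illegal in XML 1.0
print(illegal_xml_chars(sample_field))
print(leader_problem("00000nam a22"))
```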

Noted the stabilization work mentioned above.

Is there a real use case for multiple full harvests to different systems simultaneously?

Currently we have Five Colleges, which are on a multi-tenant system, but all have EDS, so EBSCO can apply the full export to all. (The harvest currently covers about 5.2 million records in 11 hours, down from 36 hours over the summer.)
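As a rough sense of scale, the harvest figures quoted in these notes can be turned into throughput numbers. The sketch below is plain arithmetic on those figures; it assumes the summer harvest covered the same 5.2 million records, which the notes do not state.

```python
def records_per_second(records, hours):
    """Average throughput of a harvest, in records per second."""
    return records / (hours * 3600)


bugfest = records_per_second(8_000_000, 11)       # bugfest figure from the Slack update
five_colleges = records_per_second(5_200_000, 11)  # current Five Colleges harvest
summer = records_per_second(5_200_000, 36)         # assumes the same record count as now

print(f"bugfest:       {bugfest:.0f} records/s")
print(f"Five Colleges: {five_colleges:.0f} records/s")
print(f"summer:        {summer:.0f} records/s")
print(f"speed-up since summer: {36 / 11:.1f}x")
```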

A possible use case is one where you do not control the schedule of all the systems that harvest from you. The system does not handle that well right now.

One solution might be to harvest to an intermediary system like VuFind and then do the multiple harvests from that.

Q: Does the FOLIO OAI-PMH support multiple formats, or only MARCXML, or only Dublin Core? (We think just Dublin Core.)

Q: Is there a need to support more formats or more verbs? It currently supports "GetRecord" but does not respect the requested format (see above), which complicates troubleshooting.
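When troubleshooting this, one way to see which format a GetRecord response actually carries is to look at the XML namespace of the element inside `<metadata>`. The sketch below is an assumption-laden illustration: the sample response is invented, the helper names are hypothetical, and the mapping from prefix to namespace assumes `marc21` means MARCXML and `oai_dc` means unqualified Dublin Core (a repository declares its real mapping via ListMetadataFormats).

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Assumed mapping from metadataPrefix to the namespace of the returned payload.
EXPECTED_NS = {
    "marc21": "http://www.loc.gov/MARC21/slim",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}


def returned_namespace(response_xml):
    """Return the namespace of the first element inside <metadata>, or None."""
    root = ET.fromstring(response_xml)
    metadata = root.find(f".//{{{OAI_NS}}}metadata")
    if metadata is None or len(metadata) == 0:
        return None
    tag = metadata[0].tag  # ElementTree renders this as "{namespace}localname"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def format_matches(response_xml, requested_prefix):
    """True if the payload namespace is the one expected for the requested prefix."""
    return returned_namespace(response_xml) == EXPECTED_NS[requested_prefix]


# Invented sample: a response that carries Dublin Core.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record>
    <header><identifier>oai:example:1</identifier></header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Sample</dc:title>
      </oai_dc:dc>
    </metadata>
  </record></GetRecord>
</OAI-PMH>"""

print(format_matches(SAMPLE, "marc21"))  # MARCXML was requested
print(format_matches(SAMPLE, "oai_dc"))  # Dublin Core was requested
```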

One issue is that the edge module hits inventory every time it needs to build a list; that is the performance Achilles heel.

10:40 How to move forward with these?
  • Divide into sets where subgroups could meet so not everyone needs to look at all issues.
  • Triage: what is most important? what could be done today? what could wait?
  • Could there be a subgroup that does discovery not using OAI-PMH?

Ideas / Decision (draft):

  • Identify different subgroups/individuals for the different areas and then triage within each area.
  • Add a column to the Brainstorm pages to sign up to triage individual issues.
  • Use an upcoming meeting time for subgroups to meet.

12:00 End of the meeting
