Batch Importer (Bib/Acq) (UXPROD-47)

[MODINVSTOR-610] Data import with 856 --> electronicAccess match performance issues Created: 22/Oct/20  Updated: 30/Jun/21  Resolved: 13/Nov/20

Status: Closed
Project: mod-inventory-storage
Components: None
Affects versions: None
Fix versions: 19.4.4, 20.0.0
Parent: Batch Importer (Bib/Acq)

Type: Story Priority: P2
Reporter: Wayne Schneider Assignee: Volodymyr Rohach
Resolution: Done Votes: 0
Labels: Support, data-import, epam-folijet
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
blocks MODINVSTOR-621 Release v19.4.4 Closed
Defines
defines UXPROD-2551 NFR: Data Import (Batch Importer for ... Closed
Relates
relates to MODDICORE-80 Cannot match on eAccess URI field Closed
Sprint: Folijet Sprint 101, Folijet Sprint 102
Story Points: 2
Development Team: Folijet
Release: Q3 2020 Bug Fix
Affected Institution:
Simmons
Epic Link: Batch Importer (Bib/Acq)
RCA Group: Data related (ex. Can be detected with large dataset only)

 Description   

We created a job profile for matching an incoming MARC 856 $u against the "Electronic access: URI" property of holdings records, overlaying if matched, or creating a new record if there is no match. The profile worked fine for a small record batch (1-5 records). When we tried it with a large batch (5000 records), database CPU utilization pegged out, and inventory-storage became unresponsive. Log investigation showed many warnings like:

WARNING: Doing FT search without index for holdings_record.jsonb->>'electronicAccess', CQL >>> SQL: electronicAccess = "\"uri\":\"https://ebookcentral.proquest.com/lib/simmons-ebooks/detail.action\?docID=147309\"" >>> get_tsvector(f_unaccent(holdings_record.jsonb->>'electronicAccess')) @@ tsquery_phrase(f_unaccent('"uri":"https://ebookcentral.proquest.com/lib/simmons-ebooks/detail.action?docID=147309"'))

Inspection of database logs showed long wait times for the query:

SELECT jsonb,id FROM sim_mod_inventory_storage.holdings_record WHERE get_tsvector(f_unaccent(holdings_record.jsonb->>?)) @@ tsquery_phrase(f_unaccent(?)) LIMIT ? OFFSET ?


 Comments   
Comment by Kelly Drake [ 23/Oct/20 ]

Ann-Marie Breaux - Though you should be aware of this. Might be something good to test in the bugfest.

Brooks Travis Ian Walls Anya - This might be something that effects you as well.

Comment by Jakub Skoczen [ 26/Oct/20 ]

Ann-Marie Breaux I think this is related to the PR that was closed and not merged due to some other functionality missing, see https://github.com/folio-org/mod-inventory-storage/pull/495

Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ]

Hi Jakub Skoczen

Yes, that's correct. This should be working in Goldenrod now (and in Honeysuckle). I'll link the relevant Jiras and close this as a duplicate. Thank you!

Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ]

Kateryna Senchenko Please see above comment from Jaub re: a PR that was opened and then not merged (because we found another way to resolve MODDICORE-80 Closed ). Do you think any additional work is needed to take care of performance issues with the eAccess/856 $u matching?

If yes, please take care of it in the scope of this ticket.

If no, please close this ticket as "Duplicate"

Thank you!

Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ]

Ann-Marie Breaux Note to self: I removed the Q2 2020 Hotfix 3 release. If release is needed, confirm whether Q3 bugfix will be soon enough or if Q2 hotfix is required

Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ]

Adding a comment from Kelly Drake in Slack: The Support SIG has recently identified an issue with indexing that will be of interest to everyone. Currently many inventory (and other modules) fields are not indexed and therefore extremely slow to search. In the case of data import this lack of indexing causes the system to crash when attempting to perform the data import match using a non-indexed field. Our first example is matching on the 856 but this will apply to any non-indexed field used for matching and batch loading against large record sets.

Comment by Wayne Schneider [ 26/Oct/20 ]

Ann-Marie Breaux

Ann-Marie Breaux Note to self: I removed the Q2 2020 Hotfix 3 release. If release is needed, confirm whether Q3 bugfix will be soon enough or if Q2 hotfix is required

Apologies, I believe I filled in that field by mistake, trying to indicate that the problem affects the Q2 2020 Hotfix 3 release of FOLIO (which is where it was observed)

Comment by Kateryna Senchenko [ 02/Nov/20 ]

Hi Ann-Marie Breaux,

I think this issue should be moved to MODINVSTOR and assigned to Core Functional. Any missing indexes that slow down the mod-inventory-storage performance should be added there. Thank you

CC: Oleksii Kuzminov

Comment by Ann-Marie Breaux (Inactive) [ 02/Nov/20 ]

Hi Marc Johnson Please see the comment above, as well as the description and other comments. What do you think about moving this to MODINVSTOR and Core fxn?

Comment by Marc Johnson [ 03/Nov/20 ]

Ann-Marie Breaux

I think this issue should be moved to MODINVSTOR and assigned to Core Functional. Any missing indexes that slow down the mod-inventory-storage performance should be added there. Thank you

I agree with Kateryna Senchenko this should be moved to mod-inventory-storage.

I'm ambivalent about which team does the work. If you want to assign it to Core Functional, and let Cate Boerema prioritise it, that's totally ok with me, otherwise I have no concerns if Folijet wanted to do this work.

Comment by Cate Boerema (Inactive) [ 03/Nov/20 ]

Hi all! Here we are again debating which team should do the work for a Data import problem in Inventory This seems like a weekly occurrence these days!

I think we should come up with some guidelines for how to decide when issues should be addressed by Core Functional and when Folijet should address them. There is no reason why Core Functional should have to do the work every time. Some initial thoughts:

Core Functional should do the work if:

  1. Their development caused the bug (missing indexes don't qualify as a bug to me)
  2. It would be much more efficient for them to do the work

Otherwise Folijet should do the work for bugs and stories impacting their application.

Thoughts?

Comment by Ann-Marie Breaux (Inactive) [ 03/Nov/20 ]

Hi Cate Boerema I agree it would be helpful to have some guidelines. I don't think Folijet has done work in terms of creating indexes in Inventory as of yet, which is why we were thinking it would make more sense for Core Fxn to do it. It it would be helpful for Marc Johnson, you, me, and Oleksii Kuzminov to talk it through and come up with some clearer guidelines, I'm happy to schedule that. Just let me know. I'll also say that while there is some negotiation involved at times, I think we usually work it out pretty well. Not sure if that is the feeling on the Core fxn side, but it is on the Folijet side.

Comment by Cate Boerema (Inactive) [ 04/Nov/20 ]

Yes, please set up a meeting Ann-Marie Breaux

Comment by Ann-Marie Breaux (Inactive) [ 09/Nov/20 ]

Hi Oleksii Kuzminov Can we get this in to the current sprint, and release as a bugfix later this week?

Comment by Oleksii Kuzminov [ 09/Nov/20 ]

Hi Ann-Marie Breaux let's do it

Comment by Ann-Marie Breaux (Inactive) [ 09/Nov/20 ]

Hi Oleksii Kuzminov OK - I added to our sprint. It needs points.

Comment by Ann-Marie Breaux (Inactive) [ 11/Nov/20 ]

Volodymyr Rohach One additional note about this. I just was matching on the URI (856$u) for only 1 record, and then updating the holdings. This is in Bugfest where there are millions of records. The job to update 1 record took 4 minutes.

Comment by Ann-Marie Breaux (Inactive) [ 11/Nov/20 ]

Discussed at standup today. Would be good to get into Honeysuckle bugfix, but if it's not possible by the deadline, probably OK to have it done in R1 2021.

Comment by Ann-Marie Breaux (Inactive) [ 18/Nov/20 ]

Hi Volodymyr Rohach I can't remember anything from 6 days ago. Too little sleep! Could you remind me - did you do anything to change performance in the scope of this ticket, or not yet? I'm not sure what I should expect when testing it. Thank you!

Comment by Volodymyr Rohach [ 18/Nov/20 ]

Hello Ann-Marie Breaux!
Yes, I added a new index for this field. It should increase performance. If not, we`ll investigate this one more time (maybe add a new type of index).
So, for testing it, I think you can just repeat your cases when it was too long:
"Volodymyr Rohach One additional note about this. I just was matching on the URI (856$u) for only 1 record, and then updating the holdings. This is in Bugfest where there are millions of records. The job to update 1 record took 4 minutes."

Comment by Ann-Marie Breaux (Inactive) [ 20/Nov/20 ]

Hi Volodymyr Rohach Tested on Bugfest. I'm not sure it got much faster, but my laptop is also chugging through synchronization, so that may have messed up the response time. I'm going to close it, and we'll work on it more in R1 if necessary

Generated at Thu Feb 08 22:21:14 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.