Batch Importer (Bib/Acq)
(UXPROD-47)
|
|
| Status: | Closed |
| Project: | mod-inventory-storage |
| Components: | None |
| Affects versions: | None |
| Fix versions: | 19.4.4, 20.0.0 | Parent: | Batch Importer (Bib/Acq) |
| Type: | Story | Priority: | P2 |
| Reporter: | Wayne Schneider | Assignee: | Volodymyr Rohach |
| Resolution: | Done | Votes: | 0 |
| Labels: | Support, data-import, epam-folijet | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||
| Sprint: | Folijet Sprint 101, Folijet Sprint 102 | ||||||||||||||||||||||||
| Story Points: | 2 | ||||||||||||||||||||||||
| Development Team: | Folijet | ||||||||||||||||||||||||
| Release: | Q3 2020 Bug Fix | ||||||||||||||||||||||||
| Affected Institution: |
Simmons
|
||||||||||||||||||||||||
| Epic Link: | Batch Importer (Bib/Acq) | ||||||||||||||||||||||||
| RCA Group: | Data related (ex. Can be detected with large dataset only) | ||||||||||||||||||||||||
| Description |
|
We created a job profile for matching an incoming MARC 856 $u against the "Electronic access: URI" property of holdings records, overlaying if matched, or creating a new record if there is no match. The profile worked fine for a small record batch (1-5 records). When we tried it with a large batch (5000 records), database CPU utilization pegged out, and inventory-storage became unresponsive. Log investigation showed many warnings like: WARNING: Doing FT search without index for holdings_record.jsonb->>'electronicAccess', CQL >>> SQL: electronicAccess = "\"uri\":\"https://ebookcentral.proquest.com/lib/simmons-ebooks/detail.action\?docID=147309\"" >>> get_tsvector(f_unaccent(holdings_record.jsonb->>'electronicAccess')) @@ tsquery_phrase(f_unaccent('"uri":"https://ebookcentral.proquest.com/lib/simmons-ebooks/detail.action?docID=147309"')) Inspection of database logs showed long wait times for the query: SELECT jsonb,id FROM sim_mod_inventory_storage.holdings_record WHERE get_tsvector(f_unaccent(holdings_record.jsonb->>?)) @@ tsquery_phrase(f_unaccent(?)) LIMIT ? OFFSET ? |
| Comments |
| Comment by Kelly Drake [ 23/Oct/20 ] |
|
Ann-Marie Breaux - Though you should be aware of this. Might be something good to test in the bugfest. Brooks Travis Ian Walls Anya - This might be something that effects you as well. |
| Comment by Jakub Skoczen [ 26/Oct/20 ] |
|
Ann-Marie Breaux I think this is related to the PR that was closed and not merged due to some other functionality missing, see https://github.com/folio-org/mod-inventory-storage/pull/495 |
| Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ] |
|
Yes, that's correct. This should be working in Goldenrod now (and in Honeysuckle). I'll link the relevant Jiras and close this as a duplicate. Thank you! |
| Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ] |
|
Kateryna Senchenko Please see above comment from Jaub re: a PR that was opened and then not merged (because we found another way to resolve
If yes, please take care of it in the scope of this ticket. If no, please close this ticket as "Duplicate" Thank you! |
| Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ] |
|
Ann-Marie Breaux Note to self: I removed the Q2 2020 Hotfix 3 release. If release is needed, confirm whether Q3 bugfix will be soon enough or if Q2 hotfix is required |
| Comment by Ann-Marie Breaux (Inactive) [ 26/Oct/20 ] |
|
Adding a comment from Kelly Drake in Slack: The Support SIG has recently identified an issue with indexing that will be of interest to everyone. Currently many inventory (and other modules) fields are not indexed and therefore extremely slow to search. In the case of data import this lack of indexing causes the system to crash when attempting to perform the data import match using a non-indexed field. Our first example is matching on the 856 but this will apply to any non-indexed field used for matching and batch loading against large record sets. |
| Comment by Wayne Schneider [ 26/Oct/20 ] |
Apologies, I believe I filled in that field by mistake, trying to indicate that the problem affects the Q2 2020 Hotfix 3 release of FOLIO (which is where it was observed) |
| Comment by Kateryna Senchenko [ 02/Nov/20 ] |
|
Hi Ann-Marie Breaux, I think this issue should be moved to MODINVSTOR and assigned to Core Functional. Any missing indexes that slow down the mod-inventory-storage performance should be added there. Thank you CC: Oleksii Kuzminov |
| Comment by Ann-Marie Breaux (Inactive) [ 02/Nov/20 ] |
|
Hi Marc Johnson Please see the comment above, as well as the description and other comments. What do you think about moving this to MODINVSTOR and Core fxn? |
| Comment by Marc Johnson [ 03/Nov/20 ] |
I agree with Kateryna Senchenko this should be moved to mod-inventory-storage. I'm ambivalent about which team does the work. If you want to assign it to Core Functional, and let Cate Boerema prioritise it, that's totally ok with me, otherwise I have no concerns if Folijet wanted to do this work. |
| Comment by Cate Boerema (Inactive) [ 03/Nov/20 ] |
|
Hi all! Here we are again debating which team should do the work for a Data import problem in Inventory I think we should come up with some guidelines for how to decide when issues should be addressed by Core Functional and when Folijet should address them. There is no reason why Core Functional should have to do the work every time. Some initial thoughts: Core Functional should do the work if:
Otherwise Folijet should do the work for bugs and stories impacting their application. Thoughts? |
| Comment by Ann-Marie Breaux (Inactive) [ 03/Nov/20 ] |
|
Hi Cate Boerema I agree it would be helpful to have some guidelines. I don't think Folijet has done work in terms of creating indexes in Inventory as of yet, which is why we were thinking it would make more sense for Core Fxn to do it. It it would be helpful for Marc Johnson, you, me, and Oleksii Kuzminov to talk it through and come up with some clearer guidelines, I'm happy to schedule that. Just let me know. I'll also say that while there is some negotiation involved at times, I think we usually work it out pretty well. Not sure if that is the feeling on the Core fxn side, but it is on the Folijet side. |
| Comment by Cate Boerema (Inactive) [ 04/Nov/20 ] |
|
Yes, please set up a meeting Ann-Marie Breaux |
| Comment by Ann-Marie Breaux (Inactive) [ 09/Nov/20 ] |
|
Hi Oleksii Kuzminov Can we get this in to the current sprint, and release as a bugfix later this week? |
| Comment by Oleksii Kuzminov [ 09/Nov/20 ] |
|
Hi Ann-Marie Breaux let's do it |
| Comment by Ann-Marie Breaux (Inactive) [ 09/Nov/20 ] |
|
Hi Oleksii Kuzminov OK - I added to our sprint. It needs points. |
| Comment by Ann-Marie Breaux (Inactive) [ 11/Nov/20 ] |
|
Volodymyr Rohach One additional note about this. I just was matching on the URI (856$u) for only 1 record, and then updating the holdings. This is in Bugfest where there are millions of records. The job to update 1 record took 4 minutes. |
| Comment by Ann-Marie Breaux (Inactive) [ 11/Nov/20 ] |
|
Discussed at standup today. Would be good to get into Honeysuckle bugfix, but if it's not possible by the deadline, probably OK to have it done in R1 2021. |
| Comment by Ann-Marie Breaux (Inactive) [ 18/Nov/20 ] |
|
Hi Volodymyr Rohach I can't remember anything from 6 days ago. Too little sleep! Could you remind me - did you do anything to change performance in the scope of this ticket, or not yet? I'm not sure what I should expect when testing it. Thank you! |
| Comment by Volodymyr Rohach [ 18/Nov/20 ] |
|
Hello Ann-Marie Breaux! |
| Comment by Ann-Marie Breaux (Inactive) [ 20/Nov/20 ] |
|
Hi Volodymyr Rohach Tested on Bugfest. I'm not sure it got much faster, but my laptop is also chugging through synchronization, so that may have messed up the response time. I'm going to close it, and we'll work on it more in R1 if necessary |