Import fails with "idx_records_matched_id_gen, duplicate key value violates unique constraint" in SRS logs (JUNIPER HF)

Description

Documents Case 3 in the comments on https://folio-org.atlassian.net/browse/MODINVSTOR-815#icft=MODINVSTOR-815

This issue is observed in a Honeysuckle HF3 environment and in Iris, but may have been fixed by Hotfix 1

Additional info from one of the comments below:

  1. Data Import Run A: Create 10,300 SRS MARC Bib and Instances using the new Data Import default job profile for Iris
  2. Retrieve Instance UUIDs via SRS MARC Query API:
    { "fieldsSearchExpression": "948.d ^= 'cu-batch'" }

  3. Export the full MARC for the 10,300 records using Data Export default job profile

  4. Create & associate DI profiles
    • job profile
    • match profile (001 -> instance hrid)
    • action profile (UPDATE Instance on match)
    • mapping (overlay)
  5. Process the exported MARC (cleanup OCLC identifiers in 035$a)
    Note: When we originally ran this and encountered the error, we DID NOT strip the 999 ff fields

  6. Data Import Run B: Update the SRS MARC Bib records using the job profile from #4 (1 file: 10,300 records)

I decided to try running through the test on the Iris reference environment and was unsuccessful in making it through all steps. Here are the results:

  1. DI Run A: Initial create of 10,300 SRS MARC Bib and Instances (hrid: 17): 12 m

  2. Retrieve Instance UUIDs via SRS MARC Query API: 629 ms
    { "fieldsSearchExpression": "948.d ^= 'cu-batch'" }

  3. Export the full MARC for the 10,300 records (hrid: 8): 3 m

  4. DI profiles ported via API and manually linked/related

  5. Process the exported MARC (cleanup OCLC identifiers in 035$a AND strip 999 ff) with external script: 4 s

  6. DI Run B: Update the 10,300 SRS MARC Bib records: Stuck at 37% after ~10 m

A follow-up query to the SRS MARC Query API reveals that 6,471 of the records remain in their original state, so we can conclude that 3,829 were updated (37%):
{ "fieldsSearchExpression": "(035.a ^= '(OCoLC)oc' or 035.a ^= '(OCoLC)0') and 948.d ^= 'cu-batch'" }

======================

When attempting to import a file, the import fails and the following message is observed in the mod-source-record-storage logs:
"idx_records_matched_id_gen", duplicate key value violates unique constraint

New imports that fail are attempts to update records from a previous Data Import batch of approx. 14K records that also had issues (specifically, the earlier batch failed with a "Completed with errors" status).

Analysis is needed to understand the state of the related DB table entries for Data Import attempts that are failing with this constraint error.
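As a starting point for that analysis, a query like the following can surface the rows that collide on the index. It is a sketch only: it assumes idx_records_matched_id_gen is a unique index over (matched_id, generation) on the records_lb table (consistent with the index name, but worth confirming against the tenant schema):

-- List matched_ids that already have multiple rows, with their highest
-- generation and mix of states (assumed schema/table/column names).
-- A re-import violates the unique index when it inserts a
-- (matched_id, generation) pair that already exists here.
SELECT matched_id,
       count(*)                  AS row_count,
       max(generation)           AS max_generation,
       array_agg(DISTINCT state) AS states
FROM fs09000000_mod_source_record_storage.records_lb
GROUP BY matched_id
HAVING count(*) > 1
ORDER BY row_count DESC
LIMIT 50;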

CSP Request Details

Juniper hotfix approved 25 Oct 2021 in the Slack release_bug_triage channel

CSP Rejection Details

None

Potential Workaround

None

Attachments: 16


Activity


Ann-Marie BreauxNovember 10, 2021 at 2:39 PM

Tested on Juniper BF, and no longer reproducing

Aliaksandr FedasiukNovember 9, 2021 at 2:48 PM
Edited

The barcode search is used to check whether a barcode number is unique or not.
Now I see that the import process on Juniper BF is stable, but it is much slower than the import on Kiwi BF (see the attached images).

Now I will investigate it.

I think we need to include the fixes (https://folio-org.atlassian.net/browse/MODINVSTOR-792#icft=MODINVSTOR-792 and https://folio-org.atlassian.net/browse/MODINV-508#icft=MODINV-508) in Juniper if we have the capability to do it, because without the barcode index, imports will be really slow.

Ann-Marie BreauxNovember 9, 2021 at 2:02 PM
Edited

Thanks. Why is it trying to search for barcodes? There's nothing in the import that involves barcodes.

Please review Martin's comments above. Is there any other hotfix we need to apply to Juniper? Both of the fixes that he mentions (https://folio-org.atlassian.net/browse/MODINVSTOR-792#icft=MODINVSTOR-792 and https://folio-org.atlassian.net/browse/MODINV-508#icft=MODINV-508) are in Kiwi, but not Juniper.

I'll try to import the 50K file again. Job started at 2:05 pm Juniper Bugfest time.

Martin TranNovember 8, 2021 at 11:49 PM

I applied the unique barcode index, and things seem faster now. Please give it a try.

Martin TranNovember 8, 2021 at 11:35 PM

There were two main issues during the 50K import:

  1. Searching by empty barcode.

SELECT jsonb,id FROM fs09000000_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->>'barcode')) LIKE lower(f_unaccent('')) LIMIT 10 OFFSET 0
SELECT fs09000000_mod_inventory_storage.count_estimate('SELECT jsonb,id FROM fs09000000_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->>''barcode'')) LIKE lower(f_unaccent(''''))')

These queries, as noted in https://folio-org.atlassian.net/browse/MODINVSTOR-792 (and fixed by https://folio-org.atlassian.net/browse/MODINV-508), caused intense slowness in the DB. To remedy this, we need to apply the unique barcode index (see the sketch at the end of this comment) or revert mod-inventory back a few versions.

  2. Seeing the following errors in mod-inventory, which could be symptoms of issue 1, but could be independent.

10:03:27 [] [] [] [] WARN AppInfoParser Error registering AppInfo mbean
10:04:23 [] [] [] [] WARN ? Thread Thread[vert.x-worker-thread-2,5,main] has been blocked for 60072 ms, time limit is 60000 ms
13:06:47 [] [] [] [] WARN ? Thread Thread[vert.x-worker-thread-0,5,main] has been blocked for 85063 ms, time limit is 60000 ms

Also, the following errors need to be looked at and could impact performance:

10:03:25 [] [] [] [] WARN AbstractConfig The configuration 'topic.require.compact' was supplied but isn't a known config.
10:03:27 [] [] [] [] WARN KafkaCache The replication factor of the topic events_cache is less than the desired one of 3. If this is a production environment, it's crucial to add more brokers and increase the replication factor of the topic.
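For reference, the unique barcode index mentioned under issue 1 would look roughly like the sketch below. The index name is hypothetical, and the expression simply mirrors the lower(f_unaccent(...)) form used by the slow queries above; the authoritative DDL is whatever MODINVSTOR-792 shipped:

-- Hypothetical unique index matching the expression mod-inventory uses for
-- barcode lookups, so lookups become index scans and duplicates are rejected.
-- The partial WHERE clause keeps items without barcodes from colliding.
CREATE UNIQUE INDEX IF NOT EXISTS item_barcode_idx_unique
  ON fs09000000_mod_inventory_storage.item
  (lower(f_unaccent(jsonb->>'barcode')))
  WHERE jsonb->>'barcode' IS NOT NULL;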

 

Done

Details

Assignee

Reporter

Priority

Story Points

Development Team

Folijet Support

Fix versions

Release

R2 2021 Hot Fix #4

RCA Group

Implementation coding issue

CSP Approved

Yes

Affected Institution

!!!ALL!!!


Created April 16, 2021 at 3:47 PM
Updated January 26, 2022 at 3:15 PM
Resolved October 28, 2021 at 3:37 PM