Batch Importer (Bib/Acq) (UXPROD-47)

[MODDICORE-231] Data Import matches on identifier type and identifier value separately, resulting in incorrect matches (Kiwi BF) Created: 02/Dec/21  Updated: 09/Dec/22  Resolved: 21/Dec/21

Status: Closed
Project: data-import-processing-core
Components: None
Affects versions: None
Fix versions: 3.2.6
Parent: Batch Importer (Bib/Acq)

Type: Bug Priority: P2
Reporter: Lisa Sjögren Assignee: Khamidulla Abdulkhakimov
Resolution: Done Votes: 0
Labels: data-import, epam-folijet, folijet-support, has-testrail, sprint-129, support
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: ID Match Test File - Create.mrc, ID Match Test File - Update1.mrc, ID Match Test File - Update2.mrc, ID Match Test File - Update3.mrc, ID Match Test File - Update4.mrc, identifier_mismatch_summary.xlsx, image-2021-12-06-13-20-10-603.png, image-2021-12-20-20-05-10-826.png, image-2021-12-20-20-05-20-385.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, screenshot-7.png, screenshot-8.png, screenshot-9.png
Issue links:
Blocks
blocks MODDICORE-232 Release v3.2.6 (R3 Kiwi Bugfix) Closed
Cloners
is cloned by MODDICORE-233 SPIKE: Test the following MARC/Instan... Closed
Defines
defines UXPROD-3463 NFR: Data Import R1 2022 Lotus Suppor... Closed
Relates
relates to MODDICORE-158 SPIKE: Data Import overlay by matchin... Closed
relates to MODINV-604 Update data-import-processing-core de... Closed
relates to FAT-1474 Test import with match on identifier ... Closed
Sprint:
Story Points: 5
Development Team: Folijet Support
Release: R3 2021 Bug Fix
Affected Institution:
Chalmers, TAMU
Epic Link: Batch Importer (Bib/Acq)

 Description   

Possible Juniper HF; Kiwi BF

Original details below the double line; updated bug details above the line; see MODDICORE-233 Closed for additional test cases

Requirements:
Requirement 1

  • When a MARC Bib-to-Instance match is defined in a match profile
  • And the Instance matchpoint is any of the Identifier options
  • Then ensure that any instance identified as a match meets both the Identifier type and Data requirements
  • Example:
    • Match profile of MARC Bib 910$a exactly matches Identifier: ASIN
    • 910$a value 12345 and Instance value 12345 with Identifier type ASIN: MATCH
    • 910$a value 12345 and Instance value 12345 with Identifier type: OCLC: NO MATCH
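A minimal Python sketch of Requirement 1, assuming an instance's identifiers can be modelled as a list of type/value pairs. The function name `instance_matches` and the use of readable type names instead of identifier type UUIDs are illustrative assumptions, not the module's actual code.

```python
from typing import Dict, List

Identifier = Dict[str, str]


def instance_matches(identifiers: List[Identifier], wanted_type: str, wanted_value: str) -> bool:
    """Return True only if ONE identifier carries both the wanted type and the wanted value."""
    return any(
        ident["type"] == wanted_type and ident["value"] == wanted_value
        for ident in identifiers
    )


# Incoming 910$a is 12345; the match profile targets Identifier: ASIN.
asin_instance = [{"type": "ASIN", "value": "12345"}]
oclc_instance = [{"type": "OCLC", "value": "12345"}]

print(instance_matches(asin_instance, "ASIN", "12345"))  # True  (MATCH)
print(instance_matches(oclc_instance, "ASIN", "12345"))  # False (NO MATCH)
```

The point of the sketch is that type and value are tested inside the same `any(...)` element, so a value held under a different identifier type can never satisfy the match.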

Requirement 2

  • When a MARC Bib-to-Instance match is defined in a match profile
  • And the Instance matchpoint is any of the Identifier options
  • Then ensure that standard match logic is followed for single matches, multiple matches, and no matches
    • Single match: Take whatever action(s) are specified in the job profile for a match. If there is no action, STOP
    • No match: Take whatever action(s) are specified in the job profile for no-match. If there is no action, STOP
    • Multiple matches: STOP; discard the record and take no action on the instance
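The three outcomes in Requirement 2 can be sketched as a small decision function. This is an illustration only: `resolve_match`, its parameters, and the returned strings are hypothetical names chosen for the example, and actions are reduced to plain labels.

```python
from typing import List, Optional


def resolve_match(candidates: List[str], on_match: Optional[str], on_no_match: Optional[str]) -> str:
    """Decide what happens to one incoming record, given the instances that matched."""
    if len(candidates) > 1:
        # Multiple matches: discard the record and take no action on any instance.
        return "STOP (multiple matches)"
    if len(candidates) == 1:
        # Single match: run the job profile's action for matches, if any.
        return on_match or "STOP (match, no action)"
    # No match: run the job profile's action for non-matches, if any.
    return on_no_match or "STOP (no match, no action)"


print(resolve_match(["instance-1"], "Update", None))
print(resolve_match([], "Update", None))
print(resolve_match(["instance-1", "instance-2"], "Update", "Create"))
```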

Requirement 3

  • When a MARC Bib-to-Instance match is defined in a match profile
  • And the Instance matchpoint is any of the Identifier options
  • And any of these options are marked in the match profile: 1) Use a qualifier for incoming value, 2) Only compare part of the incoming value, 3) Exact/Contains/Begins/Ends, 4) Use a qualifier for existing value, 5) Only compare part of the existing value
  • Then ensure that the appropriate match logic for each marked option is used when determining if there is a match or not
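One way to picture the options in Requirement 3 is as a value-preparation step followed by a comparison. The sketch below makes several assumptions that the ticket does not confirm: a value that fails its "Begins with" qualifier is dropped from matching entirely, "Numerics only" keeps just the digits, and the non-exact criteria test the existing value against the incoming one. The names `prepare_value` and `compare` are hypothetical.

```python
import re
from typing import Optional


def prepare_value(raw: str, begins_with: Optional[str] = None, numerics_only: bool = False) -> Optional[str]:
    """Apply a 'Begins with' qualifier, then optionally keep only the digits.

    Returns None when the value fails the qualifier (assumed here to mean
    the value is not used for matching at all).
    """
    if begins_with is not None and not raw.startswith(begins_with):
        return None
    return re.sub(r"\D", "", raw) if numerics_only else raw


def compare(incoming: str, existing: str, criterion: str = "exact") -> bool:
    """Compare prepared values; the direction of the non-exact tests is an assumption."""
    if criterion == "exact":
        return existing == incoming
    if criterion == "contains":
        return incoming in existing
    if criterion == "begins":
        return existing.startswith(incoming)
    if criterion == "ends":
        return existing.endswith(incoming)
    raise ValueError(f"unknown criterion: {criterion}")


incoming = prepare_value("(OCoLC)84714376518561876438", begins_with="(OCoLC)", numerics_only=True)
existing = prepare_value("(OCoLC)84714376518561876438", numerics_only=True)
print(incoming, compare(incoming, existing))
print(prepare_value("(AMB)84714376518561876438", begins_with="(OCoLC)", numerics_only=True))
```

Under these assumptions, an incoming 035 with the (AMB) prefix never reaches the comparison step of an (OCoLC)-qualified profile, even though its digits are identical.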

Basic test. See additional tests on MODDICORE-233 Closed

  1. Have Kiwi bugfest and snapshot-load environments open (Kiwi bugfest for current behavior, snapshot-load for corrected behavior)
  2. Go to Inventory and search for the following Identifiers. Make sure that none of them already exists in either environment (so that you will not encounter multiple matches)
    • ORD32671387-4
    • (AMB)84714376518561876438
    • (OCLC)84714376518561876438
    • 84714376518561876438
  3. Import the same file ID Match Test File - Create.mrc into both environments, using the Default - Create instance and SRS MARC Bib job profile
  4. Once imported, view the Identifiers on the Instances created from the file
    • Title: Competing with Idiots
      • Identifier type: UPC with Value ORD32671387-4
      • Identifier type: OCLC with Value (OCoLC)84714376518561876438
    • Title: Letters from a Stoic
      • Identifier type: Invalid UPC with Value ORD32671387-4
      • Identifier type: System Control Number with Value (AMB)84714376518561876438
  5. Next will be 4 matching tests, checking that the same value with different identifier types is matched properly, and that a numbers-only match with the same numeric values but different alphas and different identifier types is matched properly
  6. In Settings, create the following match profiles:
  7. Match profile 1
    • Name: ID Match Test - Update1 (Valid UPC)
    • Incoming records: MARC Bib
    • Existing records: Instance
    • Incoming record:
      • Field: 024
      • In.1: 1
      • In.2: *
      • Subfield: a
      • No qualifier or Compare part of value
    • Exactly matches
    • Existing Instance record
      • Field: Identifier: UPC
      • No qualifier or Compare part of value
  8. Match profile 2
    • Name: ID Match Test - Update2 (Invalid UPC)
    • Incoming records: MARC Bib
    • Existing records: Instance
    • Incoming record:
      • Field: 024
      • In.1: 1
      • In.2: *
      • Subfield: z
      • No qualifier or Compare part of value
    • Exactly matches
    • Existing Instance record
      • Field: Identifier: Invalid UPC
      • No qualifier or Compare part of value
  9. Match profile 3
    • Name: ID Match Test - Update3 (OCLC)
    • Incoming records: MARC Bib
    • Existing records: Instance
    • Incoming record:
      • Field: 035
      • In.1: *
      • In.2: *
      • Subfield: a
      • Qualifier: Begins with: (OCoLC)
      • Compare part of value: Numerics only
    • Exactly matches
    • Existing Instance record
      • Field: Identifier: OCLC
      • Qualifier: none
      • Compare part of value: Numerics only
  10. Match profile 4
    • Name: ID Match Test - Update4 (System control number)
    • Incoming records: MARC Bib
    • Existing records: Instance
    • Incoming record:
      • Field: 035
      • In.1: *
      • In.2: *
      • Subfield: a
      • Qualifier: Begins with: (AMB)
      • Compare part of value: none
    • Exactly matches
    • Existing Instance record
      • Field: Identifier: System control number
      • Qualifier: none
      • Compare part of value: none
  11. Create 4 Field mapping profiles
  12. Field mapping profile 1
    • Name: ID Match Test - Update1 (Valid UPC)
    • Incoming record type: MARC Bibliographic
    • FOLIO record type: Instance
    • Suppress from discovery: Mark for all affected records
    • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 1 December 2021 from the calendar (which will fill in "2021-12-01" for the cataloged date)
    • Instance status: Click the Accepted values dropdown, and select Batch loaded
  13. Field mapping profile 2
    • Name: ID Match Test - Update2 (Invalid UPC)
    • Incoming record type: MARC Bibliographic
    • FOLIO record type: Instance
    • Staff suppress: Mark for all affected records
    • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 2 December 2021 from the calendar (which will fill in "2021-12-02" for the cataloged date)
    • Instance status: Click the Accepted values dropdown, and select Cataloged
  14. Field mapping profile 3
    • Name: ID Match Test - Update3 (OCLC)
    • Incoming record type: MARC Bibliographic
    • FOLIO record type: Instance
    • Suppress from discovery: Unmark for all affected records
    • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 3 December 2021 from the calendar (which will fill in "2021-12-03" for the cataloged date)
    • Instance status: Click the Accepted values dropdown, and select Not yet assigned
  15. Field mapping profile 4
    • Name: ID Match Test - Update4 (System control number)
    • Incoming record type: MARC Bibliographic
    • FOLIO record type: Instance
    • Staff suppress: Unmark for all affected records
    • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 4 December 2021 from the calendar (which will fill in "2021-12-04" for the cataloged date)
    • Instance status: Click the Accepted values dropdown, and select Other
  16. Create 4 Action profiles
  17. Action profile 1
    • Name: ID Match Test - Update1 (Valid UPC)
    • Action: Update
    • FOLIO record type: Instance
    • Link the field mapping profile of the same name
  18. Action profile 2
    • Name: ID Match Test - Update2 (Invalid UPC)
    • Action: Update
    • FOLIO record type: Instance
    • Link the field mapping profile of the same name
  19. Action profile 3
    • Name: ID Match Test - Update3 (OCLC)
    • Action: Update
    • FOLIO record type: Instance
    • Link the field mapping profile of the same name
  20. Action profile 4
    • Name: ID Match Test - Update4 (System control number)
    • Action: Update
    • FOLIO record type: Instance
    • Link the field mapping profile of the same name
  21. Create 4 Job profiles
  22. Job profile 1
    • Name: ID Match Test - Update1 (Valid UPC)
    • Accepted data type: MARC
    • Click + and add the Match profile of the same name
    • For matches: Click plus and add the Action profile of the same name
    • For non-matches: none
  23. Job profile 2
    • Name: ID Match Test - Update2 (Invalid UPC)
    • Accepted data type: MARC
    • Click + and add the Match profile of the same name
    • For matches: Click plus and add the Action profile of the same name
    • For non-matches: none
  24. Job profile 3
    • Name: ID Match Test - Update3 (OCLC)
    • Click + and add the Match profile of the same name
    • Accepted data type: MARC
    • For matches: Click plus and add the Action profile of the same name
    • For non-matches: none
  25. Job profile 4
    • Name: ID Match Test - Update4 (System control number)
    • Accepted data type: MARC
    • Click + and add the Match profile of the same name
    • For matches: Click plus and add the Action profile of the same name
    • For non-matches: none
  26. In sequence, import each of the following files into Kiwi-BF and folio-snapshot-load, using the specified job profile
  27. Update 1
    • File name: ID Match Test File - Update1.mrc
    • Job profile name: ID Match Test - Update1 (Valid UPC)
  28. Update 2
    • File name: ID Match Test File - Update2.mrc
    • Job profile name: ID Match Test - Update2 (Invalid UPC)
  29. Update 3
    • File name: ID Match Test File - Update3.mrc
    • Job profile name: ID Match Test - Update3 (OCLC)
  30. Update 4
    • File name: ID Match Test File - Update4.mrc
    • Job profile name: ID Match Test - Update4 (System control number)
  31. After each import review the 2 Instances in Inventory in Kiwi BF and folio-snapshot-load
  32. What should happen for each job profile:
  33. Update 1
    • Kiwi-BF (before the fix)
      • Title: Competing with Idiots
        • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
        • No changes to the instance
      • Title: Letters from a Stoic
        • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
        • No changes to the instance
    • folio-snapshot (after the fix)
      • Title: Competing with Idiots
        • Match should have succeeded (since this instance has the same value, and ID type of UPC)
        • Check for the following changes in the Instance:
          • Marked as Suppressed from discovery
          • Cataloged date: 2021-12-01
          • Instance status: Batch Loaded
          • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 1
      • Title: Letters from a Stoic
        • Match should have failed (since this instance has the same value, but ID type of Invalid UPC instead of UPC)
        • No changes to the Instance
  34. Update 2
    • Kiwi-BF (before the fix)
      • Title: Competing with Idiots
        • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
        • No changes to the instance
      • Title: Letters from a Stoic
        • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
        • No changes to the instance
    • folio-snapshot (after the fix)
      • Title: Competing with Idiots
        • Match should have failed (since this instance has the same value, but ID type of UPC instead of Invalid UPC)
        • No changes to the Instance
      • Title: Letters from a Stoic
        • Match should have succeeded (since this instance has the same value, and ID type of Invalid UPC)
        • Check for the following changes in the Instance:
          • Marked as Staff suppress
          • Cataloged date: 2021-12-02
          • Instance status: Cataloged
          • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 2
  35. Update 3
    • Kiwi-BF (before the fix)
      • Title: Competing with Idiots
        • Match may have failed due to multiple matches (since this instance has the same value, but ID type of OCLC instead of System control number); match should be ignoring any prefix for the matching, but only attempt to match the 035 of the incoming record (according to the match profile)
        • If the match succeeded
          • Cataloged date: 2021-12-03
          • Instance status: Not yet assigned
          • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 3
      • Title: Letters from a Stoic
        • Match may have succeeded (since the match is on numerics only, and the existing record has an 035 with matching numbers, though not matching alphas), or match may have failed due to multiple matches
        • If the match succeeded
          • Cataloged date: 2021-12-03
          • Instance status: Not yet assigned
          • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 3
    • folio-snapshot (after the fix)
      • Title: Competing with Idiots
        • Match should have succeeded (since the match is based on prefix of (OCoLC), matching numerics only, and Identifier type of OCLC)
        • Check the Instance for the following updates:
          • Cataloged date: 2021-12-03
          • Instance status: Not yet assigned
          • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 3
      • Title: Letters from a Stoic
        • Match should not have succeeded, since the match requires Identifier type OCLC, and there's not a number with that type in the Instance
        • No changes to the Instance
  36. Update 4
    • Kiwi-BF (before the fix)
      • Title: Competing with Idiots
        • Match probably failed
        • No changes to the Instance
      • Title: Letters from a Stoic
        • Match probably failed
        • No changes to the Instance
    • folio-snapshot (after the fix)
      • Title: Competing with Idiots
        • Match should fail (since the only System control number in the Instance does not match the alphanumeric value of the incoming record)
        • No changes to the Instance
      • Title: Letters from a Stoic
        • Match should succeed (since the Instance has a System control number that matches an 035 on the incoming record)
        • Check the Instance for the following updates:
          • Cataloged date: 2021-12-04
          • Instance status: Other
          • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 4
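The after-the-fix outcomes above can be cross-checked with a small simulation. This is a sketch under stated assumptions: the incoming values for each Update file are inferred from the identifiers listed in step 4 (the .mrc contents are not reproduced in this ticket), and the helper names `digits` and `matched_titles` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Instances created from "ID Match Test File - Create.mrc" (values from step 4).
INSTANCES: Dict[str, List[Tuple[str, str]]] = {
    "Competing with Idiots": [
        ("UPC", "ORD32671387-4"),
        ("OCLC", "(OCoLC)84714376518561876438"),
    ],
    "Letters from a Stoic": [
        ("Invalid UPC", "ORD32671387-4"),
        ("System control number", "(AMB)84714376518561876438"),
    ],
}

# (job, identifier type, ASSUMED incoming value, incoming qualifier, numerics only)
UPDATES = [
    ("Update1", "UPC", "ORD32671387-4", None, False),
    ("Update2", "Invalid UPC", "ORD32671387-4", None, False),
    ("Update3", "OCLC", "(OCoLC)84714376518561876438", "(OCoLC)", True),
    ("Update4", "System control number", "(AMB)84714376518561876438", "(AMB)", False),
]


def digits(value: str) -> str:
    """Keep only the numeric characters of a value."""
    return re.sub(r"\D", "", value)


def matched_titles(id_type: str, incoming: str, qualifier: Optional[str], numerics_only: bool) -> List[str]:
    """Titles whose identifiers contain ONE entry with the wanted type and value."""
    if qualifier and not incoming.startswith(qualifier):
        return []
    wanted = digits(incoming) if numerics_only else incoming
    hits = []
    for title, identifiers in INSTANCES.items():
        for existing_type, existing_value in identifiers:
            existing = digits(existing_value) if numerics_only else existing_value
            if existing_type == id_type and existing == wanted:
                hits.append(title)
                break
    return hits


for job, id_type, incoming, qualifier, numerics_only in UPDATES:
    print(job, "->", matched_titles(id_type, incoming, qualifier, numerics_only))
```

With these inputs, each update yields exactly one title (Updates 1 and 3 hit Competing with Idiots, Updates 2 and 4 hit Letters from a Stoic), which agrees with the folio-snapshot expectations listed above.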

===============================================================

Overview

When importing records using Data Import and an import profile that matches incoming records on a specific instance identifier, FOLIO returns both

  • records that have an identifier of the given type with the given value
    and
  • records that have one identifier of the given type, and another identifier with the given value

This results in too many matches, failed overlays and incorrect overlays.

Steps to Reproduce

  1. Log into Bugfest Juniper
  2. Create instance A with identifiers

"identifiers": [
                {
                    "identifierTypeId": "be0f28f8-5814-4b68-ace5-f1cae80a8ae0", (Libirs ID)
                    "value": "123abc"
                }
            ],

https://bugfest-juniper.folio.ebsco.com/inventory/view/c8a07499-8e31-46ca-819f-7f84d6c51403

  3. Create instance B with identifiers

"identifiers": [
                {
                    "identifierTypeId": "be0f28f8-5814-4b68-ace5-f1cae80a8ae0", (Libirs ID)
                    "value": "456def"
                },
                {
                    "identifierTypeId": "18a2affc-4155-46c8-ac26-db4ae64eef2e", (Sierra Bib ID)
                    "value": "123abc"
                }
            ]

https://bugfest-juniper.folio.ebsco.com/inventory/view/4b428f92-7c83-47cb-94c0-9c476d30c4a5

  4. Go to data import, and import a MARC record with 001 “123abc”, using an import profile that matches incoming 001 on instance identifier of type “be0f28f8-5814-4b68-ace5-f1cae80a8ae0”

https://bugfest-juniper.folio.ebsco.com/settings/data-import/job-profiles/view/ca77ea8c-c836-4b4c-8c38-679926702fc5?query=lisa&sort=name

Expected Result

instance A, which has an identifier of type be0f28f8-5814-4b68-ace5-f1cae80a8ae0 with the value “123abc”, is overlaid.

Actual Result

The import is “completed with errors”, and no instance is overlaid.

The error log (thank you Michelle Suranofsky!) shows an error like this (same error, different example)

ERROR AbstractLoader   	Found multiple records matching specified conditions. CQL query: [identifiers=""\""identifierTypeId\"":\""28c170c6-3194-4cff-bfb2-ee9525205cf7\"""" AND (identifiers=""\""value\"":\""18124354\"""")]."
2021-11-19 12:50:06.375,Found records: [ {

Additional Information

Hypothesis and consequences

Using a match profile that matches incoming 001 on an instance identifier of type ISBN, when I import a record with 001 "123" FOLIO will consider records a match if they meet both of the following criteria:

  1. the record has an identifier which is of type ISBN
  2. the record has an identifier which has value "123"

What's noteworthy is that these two criteria do not have to be fulfilled by the same identifier object in the instance. The consequence of this is that FOLIO sometimes finds “false duplicates” (e.g. one record with ISBN 123, and another with Invalid ISBN 123) that cause the overlay to fail, and sometimes overlays the wrong record (e.g. a record with ISBN 789 and OCLC Number 123).
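The hypothesis can be illustrated in Python with instances A and B from the steps to reproduce. This is only a model of the described behaviour, not the module's code: `observed_match` and `expected_match` are hypothetical names, and the identifiers are plain dictionaries copied from the JSON above.

```python
from typing import Dict, List

TYPE_1 = "be0f28f8-5814-4b68-ace5-f1cae80a8ae0"  # type used by the match profile
TYPE_2 = "18a2affc-4155-46c8-ac26-db4ae64eef2e"  # second type, present on instance B only

instance_a = [{"identifierTypeId": TYPE_1, "value": "123abc"}]
instance_b = [
    {"identifierTypeId": TYPE_1, "value": "456def"},
    {"identifierTypeId": TYPE_2, "value": "123abc"},
]


def observed_match(identifiers: List[Dict[str, str]], type_id: str, value: str) -> bool:
    """Hypothesised behaviour: type and value are tested independently of each other."""
    has_type = any(i["identifierTypeId"] == type_id for i in identifiers)
    has_value = any(i["value"] == value for i in identifiers)
    return has_type and has_value


def expected_match(identifiers: List[Dict[str, str]], type_id: str, value: str) -> bool:
    """Expected behaviour: a single identifier must carry both the type and the value."""
    return any(
        i["identifierTypeId"] == type_id and i["value"] == value for i in identifiers
    )


for name, identifiers in (("A", instance_a), ("B", instance_b)):
    print(
        name,
        "observed:", observed_match(identifiers, TYPE_1, "123abc"),
        "expected:", expected_match(identifiers, TYPE_1, "123abc"),
    )
```

Under the hypothesised logic both A and B qualify, which would explain the "Found multiple records" error; under the expected logic only A does.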

The query syntax behind the match

This behaviour can be more easily observed by testing the query syntax given in the error message above.

Given a FOLIO record with these identifiers exists in BugFest Juniper (and no other record with identifier "888555"):

"identifiers": [ { "identifierTypeId": "fcca2643-406a-482a-b760-07a7f8aec640", "value": "888555" }, { "identifierTypeId": "8261054f-be78-422d-bd51-4ed9f33c3422", "value": "785633" } ],

The following query

{{baseUrl}}/inventory/instances?query=identifiers=""\""identifierTypeId\"":\""8261054f-be78-422d-bd51-4ed9f33c3422\"""" AND (identifiers=""\""value\"":\""888555\"""")

should not return any matches. However, it returns the above instance.

Compare this with another syntax that returns the expected result

In contrast, the syntax for searching specific identifiers described in https://folio-org.atlassian.net/wiki/pages/viewpage.action?pageId=5669744 returns the expected results.

/inventory/instances?query=(identifiers= /@value/@identifierTypeId=8261054f-be78-422d-bd51-4ed9f33c3422 (785633))

returns one record

{{baseUrl}}/inventory/instances?query=(identifiers= /@value/@identifierTypeId=8261054f-be78-422d-bd51-4ed9f33c3422 (888555))

returns zero records
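For repeated testing, the second syntax can be assembled with a small helper. This is a sketch that only builds strings: the query shape and the /inventory/instances path are copied from the examples above, while `identifier_query` and `instances_path` are hypothetical helper names, and no escaping of special characters inside the value is attempted.

```python
from urllib.parse import quote, unquote


def identifier_query(identifier_type_id: str, value: str) -> str:
    """Build a CQL query that ties the value to one specific identifier type."""
    return f"(identifiers= /@value/@identifierTypeId={identifier_type_id} ({value}))"


def instances_path(cql: str) -> str:
    """Percent-encode the query so it can be appended to a base URL."""
    return "/inventory/instances?query=" + quote(cql, safe="")


cql = identifier_query("8261054f-be78-422d-bd51-4ed9f33c3422", "785633")
print(cql)
print(instances_path(cql))
```

Swapping the value for 888555 gives the query that returned zero records in the comparison above.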

Interested parties

Chalmers, any library using identifiers as a match point in Data Import. Priority high.



 Comments   
Comment by Lisa Sjögren [ 02/Dec/21 ]

ping Ann-Marie Breaux

Comment by Ann-Marie Breaux (Inactive) [ 03/Dec/21 ]

Thanks, Lisa Sjögren I've moved this to draft until I can add repro steps with UI details. More soon

Comment by Lisa Sjögren [ 03/Dec/21 ]

Ann-Marie Breaux Let me know if there are any details I can add to help.

Comment by Lisa Sjögren [ 03/Dec/21 ]

I added some screenshots and UI links to the existing steps, if that is helpful.

Comment by Anne L. Highsmith [ 03/Dec/21 ]

Lisa Sjögren Ann-Marie Breaux Can this situation be mitigated by removing numbers from instance identifiers or will it still be a potential problem because the fields remain in the bib record? For example, I've run reports against our test database and find that many potential duplicates among instance records occur because of integers in the 024 field. I could mitigate that by removing 024 from the mapping rules, but that will only help with matching on instance records. 

Also, we are currently mapping our old voyager bib ids, which are integers, into identifier fields. I could map them into a staff note instead.

Any advice you can offer will be greatly appreciated, as I have only 2 working weeks before winter break to fix this – we're scheduled for go-live on 1/10/2022.

Comment by Ann-Marie Breaux (Inactive) [ 03/Dec/21 ]

Hi Anne L. Highsmith I'm not exactly following. Would it be possible to provide some examples of the 024s that you're talking about, and the match profile you're trying to use?

Lisa Sjögren For the Libirs ID and the Sierra Bib ID, were the default MARC Bib-to-Instance mapping rules customized to cause data in specific MARC fields to be mapped to those identifier types? For an incoming MARC record, the incoming 001 data (if it's not an HRID) maps into an 035 field. In the default field MARC-Instance mapping profile, those 035 fields are assigned Identifier type of Other system identifier, except if they have a prefix of (OCLC), ocm, or ocn, they are assigned Identifier type of OCLC.

Jenn Colt I think I remember our talking about this at one point. When a MARC Bib is new to SRS, do you remember if the match action fires before or after the 001/003/035 manipulation happens?

If the 001/003/035 manipulation happens before and the default MARC-Instance mappings are in use, the incoming 001 would have been assigned Identifier type System control number, and thus it would not match to a number with Identifier type = Libirs ID.

We may need a quick meeting early next week to talk through this together. I think I may be misunderstanding something in the description. We'll get it sorted out.

Comment by Anne L. Highsmith [ 03/Dec/21 ]

Ann-Marie Breaux Lisa Sjögren Jenn Colt I would love to meet on this if you have time; I'd make time on my schedule for it. Or maybe there's some regular meeting y'all normally attend that has time on its agenda?

Comment by Jenn Colt [ 03/Dec/21 ]

Ann-Marie Breaux my observation is that the match happens after the create of the uploaded record in SRS. I upload a record that has

=001 ebs1192136e
=003 EBZ

and it will match with and then update an existing record that has

=035  (EBZ)ebs1192136e

 I use the qualifier option in a lot of my profiles.

I also realize now we are talking about overlaying a non-MARC instance. Ann-Marie Breaux I think looking at Lisa's match profile might help; it is different from what we typically do. I only ever match 001 to hrid, which might explain why we haven't seen this.

 

Comment by Anne L. Highsmith [ 03/Dec/21 ]

Jenn Colt based on your most recent update ... do you think that when trying to match an incoming OCLC number to an existing OCLC number, that this problem will be mitigated if you use the qualifier option on the match, and then specify the '(OCoLC)' as qualifier? We have only 50 identifiers that are of type OCLC that don't have the (OCoLC) qualifier.

Comment by Jenn Colt [ 03/Dec/21 ]

I do think that should help. I started using the qualifiers because I was worried this kind of thing was possible.

Comment by Jenn Colt [ 03/Dec/21 ]

Also, Lisa is using custom identifiers and I thought I remembered Christie telling us a while ago that she had trouble matching on custom identifiers.

Comment by Ann-Marie Breaux (Inactive) [ 06/Dec/21 ]

Hi Jenn Colt Yes, I think the custom identifiers may be tripping things up, but I'm not sure.

Anne L. Highsmith Lisa Sjögren Jenn Colt Are you free this week:

  • Tues 9-10 am US ET
  • Tues 11 am-12 noon US ET
  • Tues 12 noon-1 pm US ET
  • Fri 10:30-11:30 am US ET

I'm happy to set up a call for us to look at it together and do some brainstorming

Comment by Lisa Sjögren [ 06/Dec/21 ]

Also, Lisa is using custom identifiers and I thought I remembered Christie telling us a while ago that she had trouble matching on custom identifiers.

Which identifiers are considered "custom"? All of them? In this example, as you can see from the UUIDs, I am using the ISBN and Invalid ISBN identifiers:

This passed test made me think that matching on specific identifiers like ISBN was implemented in Juniper: https://foliotest.testrail.io//index.php?/tests/view/924895.

Just so I understand – is the thing we want to brainstorm about whether the logic observed above:

  • is expected behaviour or not
  • is not expected behaviour, but low priority since identifier matching should only ever be done with a qualifier
  • is not actually the logic Data Import is using – in which case the error message (specified here in the code) provided in the log is incorrect

Comment by Anne L. Highsmith [ 06/Dec/21 ]

Ann-Marie Breaux I can make a meeting at any of the stated times.

Comment by Ann-Marie Breaux (Inactive) [ 06/Dec/21 ]

Hi Anne L. Highsmith Lisa Sjögren Jenn Colt I'll set a meeting at 9 tomorrow, and maybe get a dev to join.

I still think there's something going on with the default MARC profile, and the incoming record trying to match on an 001. 001 will never be mapped to identifier type = ISBN when parsing the incoming MARC record, nor will it be mapped to Libirs ID nor Sierra Bib ID.

I would expect an ISBN match to be attempted from an 020$a rather than an 001.

Comment by Lisa Sjögren [ 06/Dec/21 ]

I still think there's something going on with the default MARC profile, and the incoming record trying to match on an 001. 001 will never be mapped to identifier type = ISBN when parsing the incoming MARC record, nor will it be mapped to Libirs ID nor Sierra Bib ID.

Interesting – if that's the case, is the error message below, which says it found multiple records matching the condition on identifier X, completely misleading? (ping Michelle Suranofsky who found it in the logs)

ERROR AbstractLoader   	Found multiple records matching specified conditions. CQL query: [identifiers=""\""identifierTypeId\"":\""28c170c6-3194-4cff-bfb2-ee9525205cf7\"""" AND (identifiers=""\""value\"":\""18124354\"""")]."
2021-11-19 12:50:06.375,Found records: [ {

Comment by Theodor Tolstoy (One-Group.se) [ 06/Dec/21 ]

Just to avoid any misunderstandings, what I understand the issue description and comments to contain is:

  1. An example of how this issue first was discovered 
  2. An example in bugfest, how Lisa reproduced the Chalmers issue using this matching profile matching on 001:s and Libris Id:s
  3. A more generic example, also verified on bugfest, using ISBN and Invalid ISBN Identifier types to validate it (this profile) and show that it is a general issue. As I understand it, the 001s have nothing to do with this last case, but it would be interesting to hear if 001 matching comes into play somehow.

 

Comment by Lisa Sjögren [ 06/Dec/21 ]

I would expect an ISBN match to be attempted from an 020$a rather than an 001.

Maybe – alas I had no better luck when using 020$ as my match point in the incoming MARC record.

See

which failed when there was one instance with ISBN 888555 and another instance with Invalid ISBN 888555, but was successful after I changed the Invalid ISBN of the second to 888555b.

Comment by Lisa Sjögren [ 06/Dec/21 ]

In any case, I won't be able to attend the meeting tomorrow, but I really don't think there is anything else I can contribute except the examples already given. I apologize if they are unclear, or if I have misunderstood something fundamental about how Data Import works.

Comment by Anne L. Highsmith [ 06/Dec/21 ]

Hi, Ann-Marie Breaux Lisa Sjögren Jenn Colt Theodor Tolstoy (One-Group.se)

identifier_mismatch_summary.xlsx

The attached "summary" spreadsheet records several testing scenarios that I have gone through today in our Juniper HF3 local instance. Hope the scenarios are clear. I tested primarily with OCLC numbers in 035 fields, so when it says "qualifier" I mean the '(OCoLC)' qualifier and prefix means an 'oxx' or alphabetic prefix before the number, not enclosed in '()'.

I have replicated Lisa's experience – while trying to match an 035 on 035, I unintentionally matched on 024 with identical numeric value. Please see the 2 lines in the spreadsheet that have the red background. I got either no match because the system thought there were duplicate numbers or an incorrect match and the wrong record overlaid.

Long story short – Any attempt to match against an identifier will succeed if there is an identifier of ANY type with an identical value; the matching identifier doesn't have to be of the same identifier type, because as noted above, even if the match rule specifies a match with a specific identifier type as well as an exact identifier match, the matching values don't have to be in the same field to create a match as long as there is a matching identifier type elsewhere in the record.

Consequently, as Data Import is currently working, my instructions to our cat. and acq staff will be:

  1. NEVER try to match on a field that routinely contains a simple integer because of the high possibility of false matches. This means ISBN (valid or invalid), 024, 035 non-OCLC, etc.
  2. NEVER use the "Only compare part of the value" match option.
  3. Always use the "Use a qualifier" option for the incoming record at least.

I'd say this issue needs a quick fix.

Comment by Khamidulla Abdulkhakimov [ 13/Dec/21 ]

Hello Lisa Sjögren. Could you attach a "crossmatching-id-123abc.mrc" file to reproduce the bug?

Comment by Ann-Marie Breaux (Inactive) [ 13/Dec/21 ]

OK'd by PTF for Kiwi BF; create separate ticket for Lotus; this will probably move to MODDICORE project, and we'll add a MODINV issue to bump its dependency on MODDICORE

Comment by Lisa Sjögren [ 14/Dec/21 ]

Khamidulla Abdulkhakimov The .mrc files I used will not be useful any more as the records were overlaid during testing.

You could use any MARC record that meets the minimum Data Import MARC validation standard and the criteria detailed in the issue description. E.g. 020 value matches ISBN in one instance and ISN in another.

Comment by Ann-Marie Breaux (Inactive) [ 14/Dec/21 ]

Hi Khamidulla Abdulkhakimov I'll attach a file and lay out some steps in the description.

Comment by Jenn Colt [ 14/Dec/21 ]

Ann-Marie Breaux is there an issue for getting the "Found multiple records matching specified conditions" error into the UI rather than just the app logs? I couldn't find one.

Comment by Khamidulla Abdulkhakimov [ 14/Dec/21 ]

Hello Ann-Marie Breaux. Thank you.

Hello Jenn Colt. No, unfortunately, only in the backend logs.

Comment by Ann-Marie Breaux (Inactive) [ 14/Dec/21 ]

Hi Jenn Colt I can add one. This falls into the category of "Import stopped because the job profile told me to"

Would we also want similar UI log messages for:

  • In the case of a match, take no action
  • In the case of a non-match, take no action

These are a little different, in that they are more explicitly covered by the job profile outline, but still result in the end of the line for that incoming record, with no create or update having happened.
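
As a rough illustration of the outcomes being discussed, the Python sketch below classifies why processing of a single incoming record ended. It is only an assumption about how such messages could be derived: the `record_outcome` function and the "no action" wording are hypothetical, and only the multiple-match error text is quoted from this thread.

```python
def record_outcome(match_count, on_match=None, on_non_match=None):
    """Describe how processing ended for one incoming record.

    on_match / on_non_match stand for the job profile's action
    (e.g. "update", "create"), or None for "take no action".
    """
    if match_count > 1:
        # Error text quoted in this thread; currently only in backend logs.
        return "error: Found multiple records matching specified conditions"
    if match_count == 1:
        return on_match or "no action: match found, profile takes no action"
    return on_non_match or "no action: no match, profile takes no action"


print(record_outcome(2, on_match="update"))
print(record_outcome(1))
print(record_outcome(0))
```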

Comment by Jenn Colt [ 14/Dec/21 ]

Yeah I think if we could surface all of those that would be amazing. Right now I take the match point from the imported record and search for it in the UI to see why an update I expected didn't happen. It would be great to not have to do that and it would also set us up to create the kind of "summary reports" of imports that we've talked about before.

Comment by Ann-Marie Breaux (Inactive) [ 15/Dec/21 ]

Khamidulla Abdulkhakimov and Former user Almost 3 hours in to documenting this one, straightforward test (which has 5 parts and 8 different outcomes). I've described all of the profiles and created them on folio-snapshot. Last step is to create the related MARC files and describe what you should see after each import. I did not create the profiles on kiwi-bugfest, but I'll describe what should happen without the match corrections in place, based on what was reported in the bug. You may want to review all on folio-snapshot before it refreshes Weds night

Comment by Khamidulla Abdulkhakimov [ 15/Dec/21 ]

Hello Ann-Marie Breaux. Thank you for the full description of the testing process. I'll have time to test it.

Comment by Ann-Marie Breaux (Inactive) [ 15/Dec/21 ]

Hi Khamidulla Abdulkhakimov Update2 just got stuck on folio-snapshot. Working on 3 right now. My goal is to finish Updates 3 and 4, and be asleep in 20 minutes!

No changes have been merged to folio-snapshot yet - is that correct? That would explain why they are hanging or not matching properly

Comment by Khamidulla Abdulkhakimov [ 15/Dec/21 ]

Ann-Marie Breaux. No, there have been no changes yet. Please attach the .mrc files. I need to test on my local machine and then create a pull request. Thank you.

Comment by Ann-Marie Breaux (Inactive) [ 15/Dec/21 ]

Hi Khamidulla Abdulkhakimov Done! I think that's the longest set of steps I've ever written! I'll attach the MARC files. There are actually 5 of them: 1 to create, and 4 with different updates. I change a note in the MARC 500 field so that it's easy to tell if the instance was updated by a particular Update job or not.

And I strongly suggest taking a look at the profiles on folio-snapshot before (or when) you are creating profiles for your local env.

Comment by Khamidulla Abdulkhakimov [ 15/Dec/21 ]

Ann-Marie Breaux Thank you very much. One of the most detailed test cases. Today I will try to merge

Comment by Khamidulla Abdulkhakimov [ 16/Dec/21 ]

Hello Ann-Marie Breaux. I created all the necessary profiles in snapshot-load and tested 5 cases after fix:

1) ID Match Test - Update1 (Valid UPC):

  • 1st record - Instance has been successfully updated.
  • 2nd record - No changes to Instance
  • status - Completed
  • Test case passed

2) ID Match Test - Update2 (Invalid UPC):

  • 1st record - No changes to Instance.
  • 2nd record - No changes to Instance
  • status - Completed
  • Test case failed

3) ID Match Test - Update3 (OCLC):

  • 1st record - Instance has been successfully updated.
  • 2nd record - No changes to Instance
  • status - Completed
  • Test case passed

4) ID Match Test - Update4 (System control number): 

  • 1st record - No changes to Instance.
  • 2nd record - No changes to Instance.
  • status - Completed with errors
  • Test case failed

5) Lisa tests identifier match:

  • Instance has been successfully updated.
  • status - Completed
  • Test case passed

This change ensures that the first two requirements work.

Comment by Kateryna Senchenko [ 16/Dec/21 ]

Ann-Marie Breaux, can we proceed with the release of the existing fix and work on scenarios 2 and 4 in Lotus?

Comment by Ann-Marie Breaux (Inactive) [ 16/Dec/21 ]

Hi Kateryna Senchenko and Khamidulla Abdulkhakimov Do we know why test cases 2 and 4 failed? If it's related to the qualifier or matching only part of the value, it's OK to fix those parts in Lotus. If it's related to identifier type, we need to fix that part for Kiwi.

Comment by Jenn Colt [ 16/Dec/21 ]

Does case 4 failing mean no system control number update matches work? That is what is happening to me. That would be a major regression.

Comment by Khamidulla Abdulkhakimov [ 17/Dec/21 ]

Hello Ann-Marie Breaux and Jenn Colt. The status of the 4th case is related to the qualifier and compare-part options. Exact matches on identifiers without a qualifier or compare part work well.

Comment by Jenn Colt [ 17/Dec/21 ]

I have tried yesterday and today to get 035$a to system control number matches to work on snapshot-load and they don't, they end with "completed with errors". Is it possible to reload your test profile that shows this works? I wish we had our shared rancher!

My profile is here: https://folio-snapshot-load.dev.folio.org/settings/data-import/job-profiles/view/68780040-cacb-490d-bc80-1a3edb1a04ed?sort=name

Comment by Jenn Colt [ 17/Dec/21 ]

Ann-Marie Breaux reading back over the comments you mention that it is okay for the qualifiers to not work, but as I said above, I use that in many of my profiles. I'm not clear why it is okay to break it now.

Comment by Ann-Marie Breaux (Inactive) [ 17/Dec/21 ]

Hi Jenn Colt No, I didn't mean to say that it was OK for the qualifiers not to work, but I thought that the changes we made had not touched the existing functionality for qualifiers and only comparing part of the value. I thought whatever was working before was not broken by the identifier type changes.

Update from this morning: Khamidulla Abdulkhakimov has found the problem and is working on it right now. We're planning to have the PR by Monday, and hopefully get it onto snapshot-load some time Monday.

Jenn Colt I'm off all day today, and busy all day Saturday, but will check in Sunday afternoon to review any additional comments

Comment by Khamidulla Abdulkhakimov [ 17/Dec/21 ]

Jenn Colt thank you for the comment. I've found a new bug related to subfield parsing in the new code. I'm working on it now and hope to create a pull request on Monday.

Comment by Jenn Colt [ 17/Dec/21 ]

Thank you for the updates! Have a good weekend.

Comment by Ann-Marie Breaux (Inactive) [ 20/Dec/21 ]

Hi Khamidulla Abdulkhakimov Just checking in - where are you with the next PR - is it ready to merge yet? Still in code review? Once it's merged, let's ask for folio-snapshot-load to be rebuilt, so that we can start testing it there ASAP on Monday. Then if all goes well, we'll try to release the Kiwi bugfix early Tuesday (your time), so that it can be tested on Kiwi Bugfest during the day on Tuesday (my time). And then hopefully that'll mean the Kiwi release is official on Weds. Thank you!

cc: Kateryna Senchenko Oleksii Petrenko Taisiya Trunova

Comment by Khamidulla Abdulkhakimov [ 20/Dec/21 ]

Hello Ann-Marie Breaux. I've merged the PR. The changes will be available for testing within an hour.

Comment by Ann-Marie Breaux (Inactive) [ 20/Dec/21 ]

Thanks very much, Khamidulla Abdulkhakimov. I'm out today, but I'll ask other SMEs to test and report back. Fingers crossed!

Also, could you confirm whether Update2 and/or Update4 passed in your local environment, after the fix you made?

Comment by Khamidulla Abdulkhakimov [ 20/Dec/21 ]

Ann-Marie Breaux. All test cases ended with the Completed status. There was no match in the second test case. It is not yet possible to test on snapshot-load; I am still waiting for a response in Slack. When the environment is ready for testing, I will leave a comment here. Thanks.

Comment by Ann-Marie Breaux (Inactive) [ 20/Dec/21 ]

Hi Khamidulla Abdulkhakimov Wait, now I'm confused again. Update2 should have resulted in a match. There's no qualifiers or partial matches in that scenario, just an exact match on an Invalid UPC, which should have triggered an update to the second title, but not the first title.

And in Update4, I think we should have had an update, since "Qualifier" was previously working.

Jenn Colt could you try one of your standard qualifier scenarios, once snapshot-load is updated today, and see what happens?

Comment by Khamidulla Abdulkhakimov [ 20/Dec/21 ]

Ann-Marie Breaux. I couldn't find a value for subfield "z" in the second .mrc file, so I think there should be no match in the second case. Maybe the second match profile should use subfield "a" instead of "z" here?

Comment by Oleksii Petrenko [ 20/Dec/21 ]

CAP planning meeting decision - Juniper release - Defer including to Juniper to next week. Need to gather volunteers to test. Former user Draw up impact analysis to proceed regression testing. Note all testing in TestRail.

Comment by Ann-Marie Breaux (Inactive) [ 20/Dec/21 ]

Thanks, Oleksii Petrenko Will keep everyone updated.

Comment by Khamidulla Abdulkhakimov [ 20/Dec/21 ]

folio-snapshot-load is ready for testing.

Comment by Ann-Marie Breaux (Inactive) [ 20/Dec/21 ]

Khamidulla Abdulkhakimov Here's the first record in Update2 file. No $z, so it should not match to Invalid UPC.

But here's the second one, which does have 024 $z, which should be mapping to Invalid UPC when the original file is imported, and then matching when Update2 is imported.
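
A rough sketch of the expected mapping, in Python for illustration only. The `identifiers_from_024` helper, the (code, value) record shape, and the placeholder values are hypothetical; the sketch assumes 024 $a maps to UPC and 024 $z to Invalid UPC as in this test, whereas real mapping rules may also depend on indicators.

```python
# Assumed mapping for this test only: 024 $a -> UPC, 024 $z -> Invalid UPC.
SUBFIELD_TO_TYPE = {"a": "UPC", "z": "Invalid UPC"}


def identifiers_from_024(subfields):
    """Turn (code, value) pairs from one 024 field into
    (identifier type, value) pairs, skipping unmapped subfields."""
    return [
        (SUBFIELD_TO_TYPE[code], value)
        for code, value in subfields
        if code in SUBFIELD_TO_TYPE
    ]


first_record = [("a", "111111")]   # no $z: nothing to match as Invalid UPC
second_record = [("z", "111111")]  # has $z: should match on Invalid UPC

for name, record in (("first", first_record), ("second", second_record)):
    types = [id_type for id_type, _ in identifiers_from_024(record)]
    print(name, "Invalid UPC" in types)
```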

Comment by Khamidulla Abdulkhakimov [ 20/Dec/21 ]

Ann-Marie Breaux We use the 035 field for the second match profile. Maybe I misunderstood?

 

Comment by Khamidulla Abdulkhakimov [ 20/Dec/21 ]

I've created the necessary profiles in the folio-snapshot-load environment and tested all 4 test cases. All tests passed successfully and have "Completed" status. Regarding the second test case, I changed the 035 field to 024 in the match profile.

Comment by Ann-Marie Breaux (Inactive) [ 21/Dec/21 ]

Hi Khamidulla Abdulkhakimov Yep, the 024/035 mixup in case 2 was my mistake. Too much copy/paste. I'll adjust in the description

Comment by Ann-Marie Breaux (Inactive) [ 22/Dec/21 ]

Testing on Kiwi Bugfest - switched 024 from UPC/Invalid UPC to ISMN/Invalid ISMN, because there is no Identifier type of Invalid UPC on Kiwi Bugfest. Setting up all profiles with names that begin with MODDICORE-231 ID Match Test...

  • Searched Instances to see if any of the referenced identifiers were in Inventory; they were not
  • Imported the file to create 2 instances; re-searched for the referenced identifiers. They created identifiers with the appropriate Identifier types
  • Update1: Match on ISMN - worked properly
    • 1st record matched and updated
    • 2nd record (with same identifier, but identifier type Invalid ISMN) did not match or update
  • Update2: Match on Invalid ISMN - worked properly
    • 1st record: did not match or update
    • 2nd record: matched and updated
  • Update3: Match on 035 but only if it begins with (OCoLC), numerics only
    • 1st record: Matched and updated
    • 2nd record: did not match or update
  • Update4: Match on 035 with qualifier (AMB), exact match
    • 1st record: did not match or update
    • 2nd record: matched and updated

Thus:

  • Confirmed that same value but different identifier type is matching properly now, so that there is only a single match (and corresponding update) or no match (and thus no update)
  • Qualifier is working
  • Numeric only is working
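
For readers unfamiliar with these match options, the following Python sketch shows one plausible reading of "qualifier" and "numerics only". It is an illustration under stated assumptions, not Data Import's implementation: the function names and sample values are hypothetical, the qualifier is modeled as a "begins with" test on the incoming value, and "numerics only" as stripping every non-digit before comparing.

```python
import re


def passes_qualifier(value, begins_with):
    """Qualifier modeled here as a simple 'begins with' test."""
    return value.startswith(begins_with)


def numerics_only(value):
    """Keep only the digits, e.g. '(OCoLC)ocm12345' -> '12345'."""
    return re.sub(r"\D", "", value)


def match_035(incoming, existing, qualifier, compare_numerics):
    """True when the incoming 035 value passes the qualifier and equals
    the existing identifier value under the chosen comparison."""
    if not passes_qualifier(incoming, qualifier):
        return False
    if compare_numerics:
        return numerics_only(incoming) == numerics_only(existing)
    return incoming == existing


# Update3-style: begins with (OCoLC), compare numerics only.
print(match_035("(OCoLC)ocm12345", "(OCoLC)12345", "(OCoLC)", True))  # True
# Update4-style: begins with (AMB), exact match.
print(match_035("(AMB)67890", "(AMB)67890", "(AMB)", False))          # True
# A value that fails the qualifier is never compared.
print(match_035("(AMB)67890", "(AMB)67890", "(OCoLC)", False))        # False
```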

Closing this Jira

Comment by Oleksandr Bashtynskyi [ 22/Dec/21 ]

Test cases were created:

https://foliotest.testrail.io/index.php?/cases/view/347828
https://foliotest.testrail.io/index.php?/cases/view/347829
https://foliotest.testrail.io/index.php?/cases/view/347830
https://foliotest.testrail.io/index.php?/cases/view/347831

Ann-Marie Breaux

Comment by Ann-Marie Breaux (Inactive) [ 22/Dec/21 ]

Thanks very much, Former user I'll take a look at the test cases and update if necessary.

Khamidulla Abdulkhakimov Tested on kiwi-bugfest, and all worked exactly as expected. I'll ask the other SMEs to test when they have a chance, but all is looking good! Thank you very much.

Comment by Ann-Marie Breaux (Inactive) [ 22/Dec/21 ]

Hi Khamidulla Abdulkhakimov and Kateryna Senchenko I'm cleaning up this Jira - could you assign points to it? Thank you!

Comment by Ann-Marie Breaux (Inactive) [ 05/Jan/22 ]

Reviewed with SMEs at DI Subgroup meeting; some testing done before and after the holidays. No issues detected. Consider this closed and MODDICORE-233 (Closed) not needed. Will add separate bug if any additional problem discovered.

  • Any feedback on testing, either in Kiwi Bugfest or local envs?
  • Jenn: tested before holidays; working well enough
  • Lisa: tested on Kiwi BF: 035 identifier as OCLC number, with numerics only; worked fine (see also MODDICORE-158, Closed)
  • Jennifer: tested before holidays: 020 and OCLC number

Comment by Ann-Marie Breaux (Inactive) [ 09/Dec/22 ]

Hi Kateryna Senchenko I'm cleaning out some old labels on Jiras. This one has "needs-karate" in it. Could you review? If it still needs Karate, please create a new task to cover that. If it doesn't need Karate, please remove the label. Thank you!

Comment by Kateryna Senchenko [ 09/Dec/22 ]

Hi Ann-Marie Breaux, a Karate test was added in scope of FAT-1474 (Closed). Removing the label. Thank you!

Generated at Thu Feb 08 22:22:10 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.