[MODINVSTOR-925] Remove redundant inventory-storage GIN and full-text indexes Created: 13/Jan/22  Updated: 06/Jul/22  Resolved: 05/Jul/22

Status: Closed
Project: mod-inventory-storage
Components: None
Affects versions: None
Fix versions: 24.0.0

Type: Story Priority: P3
Reporter: Mikhail Fokanov Assignee: Mikhail Fokanov
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: File Testing Inv Sing Rec Import.mp4    
Issue links:
Relates
relates to ZF-62 Perform principal search in mod-searc... Closed
relates to MODINVSTOR-927 InstallUpgradeIT Closed
relates to MODDATAIMP-697 Test performance of update scenarios ... Closed
relates to PERF-238 Test speed of migration with and with... Closed
relates to MODDATAIMP-679 Check possibility to remove redundant... Closed
relates to MODORDERS-680 Switch inventory full text search que... Closed
Sprint: CP: sprint 143, CP: sprint 142
Story Points: 2
Development Team: Core: Platform
Release: Morning Glory (R2 2022)

 Description   

Purpose/Overview:

The migration process (updating all records in instances table) takes 8+ hours. Without indexes and triggers it takes ~ 30 mins.

Requirements/Scope:

  1. Investigation should be done in order to find out what modules uses this indexes.
  2. The redundant indexes should be removed from redundant schema json.

The following indexes are supposed to be removed:

For Instance:

  • title
  • indexTitle
  • contributors
  • subjects
  • classifications
  • keyword
  • allTitles
  • contributorsNames
  • languages
  • dateOfPublication

NB: The identifiers full-text index is retained. It means that ISBN and normalized ISBN search in inventory will work.
These exact match indexes to be used with == CQL operator are retained: title, indexTitle.

For holdings:

  • electronicAccess
  • callNumberAndSuffixNormalized
  • fullCallNumberNormalized

For item:

  • status.name
  • callNumberAndSuffixNormalized
  • fullCallNumberNormalized


 Comments   
Comment by Marc Johnson [ 07/Mar/22 ]

Mikhail Fokanov Given this work involves changes in mod-inventory-storage, should this issue be moved to the mod-inventory-storage project?

Comment by Mikhail Fokanov [ 09/Mar/22 ]

Actually, it is related to the whole Folio project, as this change affects other modules (e.g. mod-orders). Could we retain it in Folio project?

Comment by Marc Johnson [ 09/Mar/22 ]

Mikhail Fokanov

Actually, it is related to the whole Folio project, as this change affects other modules (e.g. mod-orders). Could we retain it in Folio project?

Sure, keeping this issue is totally ok.

Officially, work related to individually modules is intended to be related to JIRA issues in the specific projects for that module. I think that is especially important in this case due to how significant the impact of these changes could be (and that traceability could be important).

Comment by Charlotte Whitt [ 16/Mar/22 ]

Mikhail Fokanov, has there been a thorough analysis of what this change will cause of problems for existing apps and module? E.g. Z39.50, MARC Authority app, mod-inventory-update and more? Just asking, while all of this is absolutely new to me

CC: Jakub Skoczen Adam Dickmeiss Mike Taylor Niels Erik Nielsen Wayne Schneider Mike Gorrell

Comment by Mike Taylor [ 16/Mar/22 ]

Removing all indexes absolutely will break the Z39.50 server. At minimum it needs the holdings-by-instance-ID and items-by-holding-ID indexes, plus whatever we need for searching in instances – author, title and ISBN are the obvious ones.

Comment by Marc Johnson [ 16/Mar/22 ]

Charlotte Whitt

has there been a thorough analysis of what this change will cause of problems for existing apps and module?

I asked a similar question in the pull request for this work. Here is Mikhail Fokanov's answer.

Comment by Mikhail Fokanov [ 16/Mar/22 ]

Removing all indexes absolutely will break the Z39.50 server.

The removal applies only for full-text indexes in inventory-storage. All other indexes won't be affected, Full-text search in modules is supposed to be migrated to mod-search.

Now there is a duplication of full-text indexes in Postgres and Elasticsearch. According to performance tests the removal of redundant full-text indexes will speed-up the release process 2-4x times (depending on the involved migration).

Mike Taylor Could you please provide more information, how the Z39.50 server is used? Will it be ok to switch Z39.50 server from mod-inventory-storage to mod-search?

Comment by Mikhail Fokanov [ 16/Mar/22 ]

The following indexes are supposed to be removed:

For Instance:

  • title
  • indexTitle
  • contributors
  • subjects
  • classifications
  • keyword
  • allTitles
  • contributorsNames
  • languages
  • dateOfPublication

NB: The identifiers full-text index is retained. It means that ISBN and normalized ISBN search in inventory will work.

For holdings_record:

  • electronicAccess
  • callNumberAndSuffixNormalized
  • fullCallNumberNormalized

For item:

  • status.name
  • callNumberAndSuffixNormalized
  • fullCallNumberNormalized

The changes has been discussed with Core-Platform (Jakub Skoczen, Julian Ladisch ).

All these indexes were used by inventory UI, which now uses mod-search.  I ran karate tests for all modules on rancher environment and got RMB warnings that GIN indexes are absent only for mod-orders tests. We are working on mod-orders in order to get rid of this usage.

 

Comment by Mikhail Fokanov [ 16/Mar/22 ]

Charlotte Whitt This changes are for Morning Glory.

I cannot find usage of these indexes in mod-inventory-update. Who can I reach out to about it?

As far as I know MARC Authority search is implemented using mod-search in Lotus. Magda Zacharska could you please clarify if it is the same Authority search, which Falcon team has implemented?

Comment by Mikhail Fokanov [ 16/Mar/22 ]

Charlotte Whitt Most of my concerns are related to circulation flow. Who can provide information whether these indexes are used in it?

Comment by Jenn Colt [ 16/Mar/22 ]

Have data import concerns also been covered? For instance, both electronic access and item status are available as match points in DI. Just wondering Ann-Marie Breaux

Comment by Marc Johnson [ 16/Mar/22 ]

The removal applies only for full-text indexes in inventory-storage. All other indexes won't be affected, Full-text search in modules is supposed to be migrated to mod-search.

The pull request connected to this issue includes the removal of gin indexes as well as full text indexes.

Does that mean you consider gin indexes to be full text indexes?

Now there is a duplication of full-text indexes in Postgres and Elasticsearch.

Unfortunately, due to the design choice made to define the ES based search external to inventory itself, there remains two separate ways to search inventory data, that both need to be supported.

I cannot find usage of these indexes in mod-inventory-update. Who can I reach out to about it?

In the original proposal document for introducing Elastic Search it stated that Elastic Search, and therefore mod-search queries should not be used for transactional logic that changes state.

No transactional calls should be performed against search microservice as Elasticsearch is not ACID and have eventual consistency.

mod-inventory-update is module concerned with processing changes to inventory records. My interpretation of your design document is that this would mean it is inappropriate to use mod-search.

How wrong is my interpretation?

cc: Mike Taylor Charlotte Whitt

Comment by Mikhail Fokanov [ 16/Mar/22 ]

Jenn Colt, thanks for highlighting it. Does data-import have karate tests? I will reach out to Folijet team and clarify the use case to find out whether the GIN indexes can be replaced with btree indexes.

Comment by Mikhail Fokanov [ 16/Mar/22 ]

Marc Johnson you are right regarding the data source for matching.

The purpose of the Jira is to remove only redundant GIN indexes, that were used only from inventory UI.

I do understand that no one can say what indexes are used in what modules. But at the same time GIN indexes are considered as slow for insertion and quick for querying (GiST is vice-versa). B-tree indexes are standard postgres indexes for exact matching and left anchor search, which should be used in most of the cases.

So if we retained GIN (full-text) indexes "just in case some module could use it" it would be inefficient use of database resources. Moreover it causes essential problems for all releases, that implies data migration in inventory. So my goal is to find out what GIN indexes are really used by modules and cannot be replaced with b-tree indexes. If I find out that all indexes are used in modules, I will close my PR.

Comment by Marc Johnson [ 17/Mar/22 ]

Mikhail Fokanov

you are right regarding the data source for matching.

Ok.

The purpose of the Jira is to remove only redundant GIN indexes, that were used only from inventory UI.

I understand the scope of the issue. I wasn't challenging it.

My question was purely about the types of indexes. Your comment above stated full-text, the title states GIN (which I didn't think were full text) and the pull request includes removing both.

I do understand that no one can say what indexes are used in what modules.

Indeed, this is the fundamental challenge with this work. The index not being used by queries previously made by UI inventory says nothing about whether they are used by queries elsewhere.

FOLIO's design choices (open query language) and philosophy (community extension) makes this question basically impossible to answer.

But at the same time GIN indexes are considered as slow for insertion and quick for querying (GiST is vice-versa). B-tree indexes are standard postgres indexes for exact matching and left anchor search, which should be used in most of the cases. So if we retained GIN (full-text) indexes "just in case some module could use it" it would be inefficient use of database resources. Moreover it causes essential problems for all releases, that implies data migration in inventory. So my goal is to find out what GIN indexes are really used by modules and cannot be replaced with b-tree indexes. If I find out that all indexes are used in modules, I will close my PR.

I'm not arguing against that in the slightest.

I'm merely trying to raise awareness of these changes amongst those who could well be affected and that the Karate tests likely won't surface.

I do think that the Core Platform should take responsibility for bugs reported following these changes if folks encounter decreased performance in queries (that were using those indexes).

Comment by Charlotte Whitt [ 17/Mar/22 ]

Mikhail Fokanov - re. dependencies with

  1. mod-inventory-update, then please reach out to Niels Erik Nielsen
  2. mod-circulation, then please reach out to Stephanie Buck and Alexander Kurash
    (https://folio-org.atlassian.net/wiki/display/REL/Team+vs+module+responsibility+matrix)
Comment by Charlotte Whitt [ 17/Mar/22 ]

Mikhail Fokanov re.:


As far as I know MARC Authority search is implemented using mod-search in Lotus. Magda Zacharska could you please clarify if it is the same Authority search, which Falcon team has implemented?

Then that's correct. So MARC Authority is probably not affected by this change.

Comment by Charlotte Whitt [ 17/Mar/22 ]

Mikhail Fokanov - what about the Request app? That should be investigated as well. Please reach out to Brooks Travis

Comment by Jakub Skoczen [ 17/Mar/22 ]

Mikhail Fokanov Marc Johnson It might be a good idea to mark this as a breaking change in the inventory interface versions even though it strictly isn't breaking. Many queries will have a severely limited performance to the point where they might be considered "broken".

Once this code is merged the plan was for Mikhail Fokanov to perform extensive integration testing. Given low coverage in our Karate tests, it might be a good idea to for you to reach out to Anton Emelianov and try to arrange for a smoke testing session that includes community members, if possible.

The Core Platform will certainly work to help individual teams to help resolve specific module issues that may arise from this change.

Comment by Marc Johnson [ 17/Mar/22 ]

Jakub Skoczen

It might be a good idea to mark this as a breaking change in the inventory interface versions even though it strictly isn't breaking. Many queries will have a severely limited performance to the point where they might be considered "broken".

That would provide an indicator to folks, which could raise awareness.

I don't think FOLIO has decided on a better mechanism for communicating these kinds of changes.

I think it definitely should be included in the Morning Glory release notes.

(I personally disagree with interface versions being used in this way, but FOLIO has a vague definition of what an interface is, so it's not valuable to discuss this)

Once this code is merged the plan was for Mikhail Fokanov to perform extensive integration testing.

I think that is a good idea.

For that to be effective, there needs to be a sense of what parts of the system could be impacted. My intention for bringing up these questions was for folks to talk about that.

Where is that testing going to be performed (given that index use and observable performance impacts are heavily dependent upon the data in the system)?

Given low coverage in our Karate tests, it might be a good idea to for you to reach out to Anton Emelianov and try to arrange for a smoke testing session that includes community members, if possible.

I think that's a good idea.

The Core Platform will certainly work to help individual teams to help resolve specific module issues that may arise from this change.

Thank you, I appreciate the Core Platform team owning the impact of this.

Comment by Ann-Marie Breaux (Inactive) [ 17/Mar/22 ]

Pinging Kateryna Senchenko so that she can review in relation to any possible effects on the calls that Data Import makes to Inventory. (Also do we need to ping anyone from Spitfire for possible impact on quickMARC?) cc: Khalilah Gambrell

Comment by Mike Taylor [ 17/Mar/22 ]

Mike Taylor Could you please provide more information, how the Z39.50 server is used? Will it be ok to switch Z39.50 server from mod-inventory-storage to mod-search?

NB: The identifiers full-text index is retained. It means that ISBN and normalized ISBN search in inventory will work.

It's complicated.

In response to each client search, Net::Z3950::FOLIO does two different backend searches, but one of those itself involves two kinds of search behind the scenes. Here is the sequence:

  1. It accepts the client's query, which is typically a Z39.50 Type-1 ("RPN") query such as @and attr 1=4 programming @attr 1=1 kernighan
  2. It translates this into the CQL format that FOLIO modules understand, such as title=programming and contributors/@name=kerninghan
  3. That query is sent to mod-graphql with instructions to forward it to mod-inventory-storage – specifically, to /instance-storage/instances – and to expand the results to include holdings and items.
  4. mod-inventory-storage finds the relevant set of instances, obviously using the relevant indexes.
  5. For each instance in the set, mod-graphql searches for holdings that have their instanceId set to the instance's ID. (That's what the definition of the holdingsRecords2 virtual-field is for, and DO NOT ask why there is a "2" in the field name.)
  6. For each holdings record in the set, mod-graphql searches for items that have their holdingsRecordId set to the holdings record's ID. (That's what the definition of the holdingsItems virtual-field is for.)
  7. Net::Z3950::FOLIO gets back this list of enriched instance-plus-holdings-and-items records, and uses them to assemble special MARC fields describing the holdings and items. Ah, but what does it do with those fields? It has no MARC record to add them to! Hence:
  8. Net::Z3950::FOLIO posts a list of instance IDs to mod-source-record-storage – specifically to /source-storage/source-records?idType=INSTANCE to obtain the underlying MARC records.
  9. And then of course it inserts the holdings-and-items fields that it generated in Step 7 into the MARC records that it got in step 8, and returns the resulting MARC records to the client.

Searches happen in steps 4, 5, 6 and 8.

  • I would like to think that whatever is being done to mod-inventory-storage will allow steps 5 and 6 to continue working, since they only require indexes on identifiers. Can someone please verify this?
  • I assume that step 8 will not be affected, since the present work doesn't touch mod-source-record-storage – correct?
  • Step 4 is the difficult part, and I will address what might be done in my next comment.
Comment by Mike Taylor [ 17/Mar/22 ]

All right, then, so: step 4: mod-inventory-storage finds the relevant set of instances, obviously using the relevant indexes. That is what the present issue plans to break (and if my interpretation above is correct, the only thing it plans to break).

At present, installations of the FOLIO Z39.50 server are typically configured to be able to search on author, title, all-titles, subject, publication date, HRID, ISBN, ISSN, OCLC number, source, item-barcode, and "keyword" (which I think combines author, title and subject). I assume all these got into the srandard configuration because customers wanted them, so we will need to continue supporting all these searches.

So the question becomes: can mod-search do all these searches, and combine them with boolean operators, to return a list of (at least) instance IDs? If so, then shifting across to use mod-search instead of mod-inventory-storage should be relatively straightforward. It's not impossible that it could be literally a matter of changing a single line of code.

But if mod-search is unable to support some of those searches than we have a problem. Can someone who is familiar with mod-search comment?

Comment by Jakub Skoczen [ 18/Mar/22 ]

Mike Taylor Mikhail Fokanov AFAIK, steps 5 and 6 should be good as long as you are using `==` in the CQL for lookups since this uses the BTREE index behind the scenes. Mike Taylor please confirm this in the mod-z3950 source and Mikhail Fokanov  please confirm that the indexes are specified in `schema.json` in mod-inventory-storage. Step 8 is not affected.

Regarding step 4: mod-search supports CQL but this CQL is not one-to-one replacement for mod-inventory-storage one so Mikhail Fokanov needs to provide the details.

Also, regarding steps 5 and 6: while you could leave them as is (after confirming what I wrote above), mod-search is already doing the work of combining instance, holdings and items records for indexing. So it would still make sense to rewire those steps against mod-search, if possible – this would alleviate the need for GraphQL and should be orders of magnitude faster as it doesn't require any DB joins. Mikhail Fokanov would need to provide details.

Comment by Mike Taylor [ 18/Mar/22 ]

Thanks, Jakub Skoczen. Yes, I can confirm that
mod-graphql uses the == operator for links.

I will await Mikhail Fokanov's description of the CQL support in mod-search – or, better, a link to existing documentation of what searches are supported.

And interesting that the records returned from mod-search may already contain the necessary holdings and items. We should look into this potential optimisation as its own issue.

Comment by Charlotte Whitt [ 18/Mar/22 ]

Hi Mike Taylor: here

  1. GitHub documentation for searches supported in mod-search: https://github.com/folio-org/mod-search
  2. Wiki information on what search options being implemented: https://folio-org.atlassian.net/wiki/display/MM/Elasticsearch+Supported+Search+Options%2C+Filters%2C+Facets+and+Sorting

Is that maybe the documentation you are looking for?

Comment by Mike Taylor [ 18/Mar/22 ]

One observation: there is a new parameter expandAll, defaulting to false, meaning that by default only "basic instance properties returned" – id, title, contributors, publication. That means we will need mod-graphql to include expandAll=true (or perhaps just expandAll) in its request to mod-search, which in turn means we will need to invent a way for the mod-graphql client to ask for this.

Comment by Mike Taylor [ 18/Mar/22 ]

In mod-search's CQL implementation, title="semantic web" means to find records that contain both words "semantic" and "web" – you need to use title=="semantic web" to do a phrase search. This is incorrect CQL. The former means phrase search, and the latter means that the entire value of the field is as given. Is it too late to fix mod-search to conform to standard CQL semantics?

Comment by Mike Taylor [ 18/Mar/22 ]

According to https://github.com/folio-org/mod-search#instance-search-options, support for the instance searches that the Z39.50 server uses is as follows:

  • author – maybe OK, if this is what contributors.primary means.
  • title – oddly, does not seem to exist. There is a CQL index called title, but it searches against many title-related fields, not just actual title.
  • all-titles – OK, more or less: the CQL index title searches for title, alternativeTitle, indexTitle and series field, but not (according to the documentation) uniformTitle.
  • subject – OK (though note that it is named subjects, plural)
  • publication date – does not seem to be supported, surprisingly
  • HRID – OK
  • ISBN – OK
  • ISSN – OK
  • OCLC number – does not seem to be supported
  • source – OK
  • item-barcode – maybe OK, if this is what items.barcode means
  • keyword – OK

All in all, this looks much better than I had feared. The issues are:

  • Does contributors.primary mean what people expect by an author search?
  • No ways to search for just the title, but I wonder if people ever really want that?
  • All-titles searching omits uniformTItle, but I don't know if that matters much.
  • No way to search for publication date!
  • No way to search for OCLC number
    And the critical one:
  • Does a search for items.barcode find (as we hope) instances that have holdings that have items with that barcode? (If it finds items instead of instances, we have a problem).

Finally, may I say what a please it was to read such comprehensive documentation.

Comment by Mikhail Fokanov [ 20/Mar/22 ]

Mike Taylor I appreciate your review of mod-search documentation. In addition to good documentation, mod-search has 100% unit test code coverage and great code style (personally, I enjoy reading the code of it).

In mod-seach the similar approach to RMB's schema.json is implemented for storing and using of metadata of the Instance and other searchable entities. However it has significant differences in semantics, because Elasticsearch indexes are different from Postgres ones.

For Instance entity the metadata is stored in instance.json. Both publication date and call number exist in it and therefore are supported. See: https://github.com/folio-org/mod-search/blob/master/src/main/resources/model/instance.json#L141 

 

Pavel Filippov, mod-search documentation could be even more comprehensive , please check it to find out if there are other supported fields that are missing in the documentation.

Does contributors.primary mean what people expect by an author search?

For this type of search you should use CQL like: contributors all "John" and contributors.primary==true . My gut feeling is that it is a search for authorMagda Zacharska please confirm if my understanding is correct.

No ways to search for just the title, but I wonder if people ever really want that?

As you can see in https://github.com/folio-org/mod-search/blob/master/src/main/resources/model/instance.json#L33 there are different types of titles in schema.json. You can search by them separately or use so-called "search alias" "title" to search over all of them.

Does a search for items.barcode find (as we hope) instances that have holdings that have items with that barcode? (If it finds items instead of instances, we have a problem).

Searching by barcode returns instances, as all other types of search for instance related data.

 

Comment by Charlotte Whitt [ 21/Mar/22 ]

Hi Mike Taylor
Contributors primary, is what is in MARC tag 100/110/111. So only one contributor - the primary one - and only one for a given record.
Contributors, are all contributors, so the ones in MARC tag 100/110/130 and 700/710/711.

Comment by Jakub Skoczen [ 21/Mar/22 ]

Mikhail Fokanov discussed deploying his mod-inventory-storage branch on the Core Platform Rancher environment and doing more testing before merging it.

Mike Taylor would it make sense for you to create a branch of mod-z3950 compatible with mod-search and have Mikhail install it on the test environment and confirm it works?

Comment by Mike Taylor [ 21/Mar/22 ]

I enjoy Charlotte Whitt's endearing assumption that explaining it in MARC codes is going to make it clearer for me

Thanks, Mikhail Fokanov, for clarifying some bits of mod-search that I'd missed. Since `instance.json` is the Source Of Truth here, it would be helpful to link to it from the top-level README.

On title-searching: it looks like my options are title, which encompasses several kinds of title, alternativeTitles, indexTitle, sort_title and uniformTitle. I'm still not clear on which of these is the actual title, such as one would expect to find with a title search.

Good to know that publication date is indeed supported.

It does look as though OCLC Number is not, however. What would it take to add that? It's just one more indentifier, like ISBN and ISSN.

Most importantly, it's great news that searching of instances by item barcode is supported.

That leaves author searching. I have theological objections to using and in the way that's prescribed here – contributors all "John" and contributors.primary==true means to find all records that have a contributor called "John" and that have a primary contributor, so that it ought to find records where Anne was the author and John only the illustrator. It also presents a practical problem, as it means that the CQL generation in the Z39.50 server would have to do something rather peculiar here, which would mean writing new code to understand a new configuration directive. It can be done, but it's unappealing. Would it be difficult to instead enhance mod-search with a new author index that does the right thing?

Comment by Mike Taylor [ 21/Mar/22 ]

Jakub Skoczen

would it make sense for you to create a branch of mod-z3950 compatible with mod-search and have Mikhail install it on the test environment and confirm it works?

It can be done, but it's not a trivial undertaking.

Comment by Charlotte Whitt [ 21/Mar/22 ]

Hi Mike Taylor Khalilah Gambrell - re.


It does look as though OCLC Number is not, however. What would it take to add that? It's just one more indentifier, like ISBN and ISSN.

Following ticket MSEARCH-282 Closed Implement normalized search option for OCLC identifiers, is in Code review.
I believe that would solve the dedicated index for OCLC numbers.

Comment by Mikhail Fokanov [ 21/Mar/22 ]

It does look as though OCLC Number is not, however. 

You are right. It hasn't been merged yet. It is in progress: https://folio-org.atlassian.net/browse/MSEARCH-282

Comment by Magda Zacharska [ 21/Mar/22 ]

Mikhail Fokanov and Mike Taylor the search by primary contributor addresses use case when the user is looking only for instances were a given contributor is a primary contributor. For example, contributors =" Steinbeck, John, 1902-1968" AND contributors.primary = true will not match a publication where Steinbeck wrote only introduction (and thus is not a primary contributor)

Regarding the search by title only - it was never requested as far as I know.

Comment by Mike Taylor [ 22/Mar/22 ]

Magda Zacharska Yes, I understand that is how the AND operator works in this search; what I am saying is that it's not what AND means in CQL.

Comment by Aivar Iusupov [ 16/May/22 ]

Hi Mikhail Fokanov!

During MODDATAIMP-679 Closed investigation, we've found that these Indexes are in use and cannot be deleted:

Instance:

  • title - data import matching (match profile)
  • indexTitle - data import matching (match profile)
  • contributors - data import matching (match profile)
  • classifications - data import matching (match profile)

Items:

  • status.name - data import matching (match profile)
Comment by Ann-Marie Breaux (Inactive) [ 16/May/22 ]

Hi Mikhail Fokanov Could you confirm that the indexes that Aivar Iusupov added in the above comment will NOT be removed? Thank you!

Comment by Julian Ladisch [ 25/May/22 ]

Mikhail Fokanov: Please adjust this Jira's description regarding the indexes Aivar Iusupov requested to be kept.

Comment by Mikhail Fokanov [ 25/May/22 ]

Aivar Iusupov You mentioned that these indexes are used for "data import matching (match profile)". Are they used for exact matching of the strings?

Comment by Aivar Iusupov [ 27/May/22 ]

Hi Mikhail Fokanov , SQL queries contain "LIKE" even for exact matching because CQL->SQL transformation works like this.

Comment by Mike Taylor [ 27/May/22 ]

Aivar Iusupov You should get an exact-match query in SQL if your CQL query uses the == relation instead of =.

Comment by Marc Johnson [ 27/May/22 ]

You should get an exact-match query in SQL if your CQL query uses the == relation instead of =

IIRC, PostgreSQL treats a LIKE with no wildcard the same as a = (in SQL) and so at one point RMB would generate LIKE statements from == CQL clauses.

Adam Dickmeiss Julian Ladisch maybe you can check my memory?

Comment by Jakub Skoczen [ 29/May/22 ]

Marc Johnson Aivar Iusupov What Marc says above is correct – as long as there is no wildcards in the query LIKE and = are the same in Postgres and they don't need a GIN index, regular BTREE index is sufficient. But I would still recommend what Mike Taylor proposes – use == in CQL to make your intent for exact string matching clear.

Comment by Marc Johnson [ 30/May/22 ]

I would still recommend what Mike Taylor proposes – use == in CQL to make your intent for exact string matching clear.

I also support making a query as specific and intention revealing as possible.

Comment by Julian Ladisch [ 30/May/22 ]

Yes, RMB converts CQL == to SQL LIKE.

From PostgreSQL's LIKE documentation:

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator.

Comment by Mikhail Fokanov [ 13/Jun/22 ]

Aivar Iusupov, please follow up regarding the usage of "==" or "=" for CQL. As discussed in this issue, you will be able to do "LIKE", the removal will affect SQL which contains tsvector and other full text staff.

Comment by Ann-Marie Breaux (Inactive) [ 15/Jun/22 ]

Hi Mikhail Fokanov and Jakub Skoczen Will this Jira be finished and released as part of Morning Glory? The release field is not filled in.

Once these changes are merged, it seems like we should have some pretty significant smoke testing for apps that interact with Inventory, to ensure that unexpected negative effects do not surface. cc: Anton Emelianov Oleksandr Bashtynskyi Khalilah Gambrell Charlotte Whitt Magda Zacharska Stephanie Buck

Kateryna Senchenko and Aivar Iusupov Please see Mikhael's comment above this one - does this change DI's need for the following indexes to be retained?

Instance:

  • title - data import matching (match profile)
  • indexTitle - data import matching (match profile)
  • contributors - data import matching (match profile)
  • classifications - data import matching (match profile)

Items:

  • status.name - data import matching (match profile)
Comment by Jakub Skoczen [ 21/Jun/22 ]

Ann-Marie Breaux please check with Oleksii Petrenko and Mikhail Fokanov regarding the status of this change and the decision on whether this will be included in MG. I've not been coordinating the rollout of this change myself.

Comment by Charlotte Whitt [ 21/Jun/22 ]

Just for the sake of 'history' here link to the Slack conversation (6/14) re. MODINVSTOR-925 Closed - https://folio-project.slack.com/archives/C031HF7RW01/p1655233225590479

Comment by Ann-Marie Breaux (Inactive) [ 21/Jun/22 ]

Somebody PLEASE decide about whether this change is rolling out in MG or not. I am not qualified to decide that.

I am changing it to a MODINVSTOR issue, so that it can include the proper fix version.

Kateryna Senchenko Aliaksandr Fedasiuk see long thread above. Any concerns about rolling this out at this point? Changes were merged into master last week.

Jakub Skoczen If you're not making the decision, then I recommend that Cap Planning Team weigh in, or else determine who makes the decision. Let's please get this figured out today or tomorrow. Mark Veksler Hkaplanian Khalilah Gambrell Mike Gorrell

Comment by Charlotte Whitt [ 21/Jun/22 ]

Ann-Marie Breaux - you were one of them whom were the most reluctant about doing this roll out for MG without more testing.
But if we (the POs) are all aligned that this work, which has been ready since February, and has already been postponed with one release, then I think that's the decision - and now is the time to get this out to our customers!

Comment by Ann-Marie Breaux (Inactive) [ 21/Jun/22 ]

Hi Charlotte Whitt I understand, and Folijet has a task to test the performance of updates with this change in place. We'll just roll with whatever decision is made, and if performance is degraded, we'll figure out a way to work on it in Nolana.

Comment by Marc Johnson [ 22/Jun/22 ]

Charlotte Whitt

if we (the POs) are all aligned that this work, which has been ready since February, and has already been postponed with one release, then I think that's the decision - and now is the time to get this out to our customers!

Are you going to work with Mike Taylor to organise the remedial work for Z39.50 before the Morning Glory release deadline?

Comment by Mike Taylor [ 22/Jun/22 ]

Marc Johnson Since I can't be confident that MG is not going to break the Z39.50 server that our customers rely on, I am already working on it. See ZF-62 Closed (linked above).

Comment by Marc Johnson [ 22/Jun/22 ]

Mike Taylor

Since I can't be confident that MG is not going to break the Z39.50 server that our customers rely on, I am already working on it.

Oh ok.

Oleksii Petrenko Charlotte Whitt Ann-Marie Breaux

Does that mean this is a done deal and the decision is made?

If so, I'll unblock the release of mod-inventory-storage for Morning Glory.

Mikhail Fokanov Please make sure that the impact of these changes are described in the Morning Glory overall release notes so that hosting providers, systems operators and implementors are made aware of this.

Comment by Mike Taylor [ 22/Jun/22 ]

Does that mean this is a done deal and the decision is made?

No, it means that I live in terror of this decision being made and then suddenly being deluged with complains from customers that Z39.50 has stopped working when they upgrade to MG. I'm having to do work on a highly inconvenient schedule due to this.

Comment by Ann-Marie Breaux (Inactive) [ 22/Jun/22 ]

Hi Marc Johnson Folijet is still working on testing DI matching performance without indexes, but we'll live if the decision is to move forward. If this breaks Z39.50 in terms of Inventory Single Record Import though, there will be huge blowback.

This code is merged into snapshot currently, isn't it? If Inventory Single Record Import is still working in Snapshot, then it seems not to have broken it. I'm not sure what the other bits are that Mike Taylor is working on, and how they affect the functioning of particular apps.

And I changed the issue from FOLIO to MODINVSTOR, so hopefully you can add the correct fix version to it and keep track of it during the release.

cc: Kateryna Senchenko

Comment by Marc Johnson [ 22/Jun/22 ]

Ann-Marie Breaux

If this breaks Z39.50 in terms of Inventory Single Record Import though, there will be huge blowback.

As I understand it, these changes fundamentally break the Z39.50 implementation. However, I very naive in this area, Mike Taylor is the better person to answer this.

This code is merged into snapshot currently, isn't it?

Yes, it is.

I changed the issue from FOLIO to MODINVSTOR

Thank you.

hopefully you can add the correct fix version to it and keep track of it during the release.

I'm not going to do that until folks make the call one way or the other which release it is in.

Until then, I consider mod-inventory-storage release (and thus the whole release) blocked.

Comment by Ann-Marie Breaux (Inactive) [ 22/Jun/22 ]

Hi all,

I just tested all 3 scenarios for Inventory Single Record Import on Snapshot. See attached video Testing Inv Sing Rec Import.mp4

  • Scenario 1: No instance exists; create a new instance and SRS MARC Bib by importing: WORKS
  • Scenario 2: Instance exists, with source = MARC; update instance and SRS MARC Bib by matching on Instance UUID in 999 ff $i of the SRS MARC Bib record: importing: WORKS
  • Scenario 3: Instance exists, with source = FOLIO; update instance and create SRS MARC Bib by matching on Instance UUID in the Instance record: DOES NOT WORK I don't know if this is because there is a temporary glitch in Snapshot, or because it's related to the work that Mike Taylor is trying to sort out. I've tried it a couple times today, and so far it's not working. To me, that's a very significant regression, and I think Charlotte Whitt will agree.
    • If it's determined that the problem is related to this Jira, then I think we roll back these changes, unless it can be fixed and demonstrated to work properly.
    • If it's determined that the problem is NOT related to this Jira, then we need to file a separate bug and figure out what the problem is ASAP

cc: Kateryna Senchenko

Comment by Marc Johnson [ 22/Jun/22 ]

Mikhail Fokanov Oleksii Petrenko Was single record import part of the scope of pre-merge testing done on the Core Platform scratch environment or by the AQAs?

Comment by Jenn Colt [ 22/Jun/22 ]

Just pointing out that I believe the last few comments are conflating the z39.50 server (Mike Taylor's project) and the single record import z39.50 retrieval from OCLC. 

Comment by Ann-Marie Breaux (Inactive) [ 22/Jun/22 ]

Apologies Jenn Colt if I have conflated them, but then it sounds like we may need to troubleshoot scenario 3 of Inventory Single Record Import as a separate bug, if it's not been caused by this. Can someone from Core Platform or from Prokopovych confirm for me whether this Jira's merge could be responsible for that scenario not working, or if it's a different bug.

Comment by Jenn Colt [ 22/Jun/22 ]

Yeah, I can't tell if it is related to this, but either way work to fix the z39.50 server won't help with scenario 3.

Comment by Ann-Marie Breaux (Inactive) [ 22/Jun/22 ]

I've just had confirmation that these changes should not have any effect (in terms of Z39.50) on Inventory Single Record Import. So I'll doublecheck, and probably write up a separate bug for ISRI scenario 3.

Aside from that, I think the only index that might have some impact on Data Import would be the Item status. Folijet is doing some performance testing on that. MODDATAIMP-697 Closed

Comment by Marc Johnson [ 22/Jun/22 ]

Ann-Marie Breaux

Apologies Jenn Colt if I have conflated them

It was me that was conflating them.

Jenn Colt thank you for the clarification.

Comment by Marc Johnson [ 22/Jun/22 ]

Ann-Marie Breaux

Can someone from Core Platform or from Prokopovych confirm for me whether this Jira's merge could be responsible for that scenario not working, or if it's a different bug.

I don't know enough about either the single record import (which was built by Thor?) or these changes to determine this.

Comment by Mikhail Fokanov [ 22/Jun/22 ]

Teams, that are responsible for modules that uses CQL queries to mod-invenory-storage should watch the logs of mod-invenory-storage for warnings like: "Doing FT search without index for"( and then there is a field name).

Hongwei Ji, if it is an option for bugfest team, we could have some alerting rule for this warning.

CC: Ann-Marie Breaux, Julian Ladisch

Comment by Mike Taylor [ 22/Jun/22 ]

To reiterate (the relevant part of) what I said on today's call:

  • The Net-Z3950-FOLIO module is FOLIO's Z39.50 server. The single-record import facility makes use of a Z39.50 client, which shares no relevant code with the server.
  • The server is gravely affected by the removal of the mod-inventory-storage indexes, which is why I am working on changing how it works – see ZF-62 Closed , MODGQL-152 Closed and MODGQL-153 Closed if interested in the gritty details.
  • The Z39.50 client is not affected at all by the removal of the indexes ...
  • ... but that doesn't necessarily mean that single-record import is not affected. Everything depends on what the match criteria are. If they are defined to include author or title matching, then that probably uses indexes that are no longer in place, which will make the matching process very slow: slow enough to fail completely sometimes.
  • But if we are confident that the match algorithms currently in use don't do author or title matching, then this should be a non-issue.
Comment by Marc Johnson [ 23/Jun/22 ]

Mike Taylor

To reiterate (the relevant part of) what I said on today's call

What call are you referring to?

Comment by Mike Taylor [ 23/Jun/22 ]

The one that Oleksii Petrenko called yesterday, at 6pm UK time.

Comment by Marc Johnson [ 23/Jun/22 ]

Mike Taylor

The one that Oleksii Petrenko called yesterday, at 6pm UK time.

Thanks. Damn, I didn't have that in my diary, as the invite came whilst I was in the TC and so I forgot about it and so missed it :-/

Comment by Jakub Skoczen [ 04/Jul/22 ]

Mikhail Fokanov Oleksii Petrenko It looks like the code for this change has been merged to master but the issues status says IN PROGRESS.

Also, has this code been release for Morning Glory? If so, this should be indicated in the module release notes and this ticket should be marked for Morning Glory.

Can you please clarify the status of this change?

Comment by Marc Johnson [ 05/Jul/22 ]

Also, has this code been release for Morning Glory? If so, this should be indicated in the module release notes and this ticket should be marked for Morning Glory.

I can confirm it has been included in the release for for Morning Glory.

It looks like the code for this change has been merged to master but the issues status says IN PROGRESS.

It has been. And given it is included in the release, I will close this issue.

Comment by Jakub Skoczen [ 06/Jul/22 ]

Marc Johnson Thank you for updating the ticket. Unfortunately, there's no mention of this change in the release notes for v24: https://github.com/folio-org/mod-inventory-storage/releases/tag/v24.0.0. Do you know why?

Comment by Marc Johnson [ 06/Jul/22 ]

Jakub Skoczen

Unfortunately, there's no mention of this change in the release notes for v24: https://github.com/folio-org/mod-inventory-storage/releases/tag/v24.0.0. Do you know why?

Likely because they weren't added during the pull request.

Informally, I ask that developers making releases check the issues involved in the release and amend the news during the release process.

However this isn't a reliable activity and things can be missed, especially when folks don't understand the changes made by others.

Generated at Thu Feb 08 23:27:44 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.