[FOLIO-3630] "f_unaccent(text)" function not found in reference environments Created: 07/Nov/22  Updated: 28/Nov/22  Resolved: 28/Nov/22

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P2
Reporter: Wayne Schneider Assignee: Julian Ladisch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: File Snapshot Inventory Ref Data.mp4     PNG File image-2022-11-16-13-14-17-495.png     File mod-inventory-storage-snapshot-2.log.gz     File mod-inventory-storage-snapshot.log.gz    
Issue links:
Blocks
blocks MODORDERS-790 Duplicate Holding and Item when order... Closed
is blocked by RMB-955 Set search_path when doing "SET ROLE"... Closed
is blocked by MODINVSTOR-996 RMB 35.0.4, Vert.x Vert.x 4.3.5, Jack... Closed
Sprint: CP: Sprint 152, CP: Sprint 154
Development Team: Core: Platform
RCA Group: TBD

 Description   

Occasionally we have seen an error from the UI in the reference environments which reports:

operation GET on resource XXX failed, saying: ERROR: function f_unaccent(text) does not exist (42883)

Superficially, it appears to be coming primarily from endpoints that pull data from mod-inventory-storage, though more investigation is needed.

This function should be created in the module schema for RMB modules when the module is initialized for the tenant (so it may be a build problem). Strangely, the problem doesn't always occur, and different builds using the same module versions may not exhibit the same symptoms.



 Comments   
Comment by Jakub Skoczen [ 07/Nov/22 ]

Julian Ladisch any ideas what's going on here?

Comment by Julian Ladisch [ 07/Nov/22 ]

Wayne Schneider  Please provide more details, at least the mod-inventory-storage version and the actual request (don't XXX in "on resource XXX").

Does the problem persist for a given installation and a complete redeploy is needed?

Comment by Wayne Schneider [ 07/Nov/22 ]

Thanks, Julian Ladisch, we will provide more details as we investigate. My apologies, the initial version of this ticket was meant to be just a placeholder, we will let you know when we have more information and (hopefully) a way to reproduce.

Comment by Zak Burke [ 09/Nov/22 ]

From the UI side, I get a status of 400 with a response body of

ERROR: function f_unaccent(text) does not exist (42883)

from the following requests:

/alternative-title-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/call-number-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/classification-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/contributor-name-types?limit=1000&query=cql.allRecords=1%20sortby%20ordering
/contributor-types?limit=400&query=cql.allRecords=1%20sortby%20name
/electronic-access-relationships?limit=1000&query=cql.allRecords=1%20sortby%20name
/holdings-note-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/holdings-sources?limit=1000&query=cql.allRecords%3D1%20sortby%20name
/holdings-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/identifier-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/ill-policies?limit=1000&query=cql.allRecords=1%20sortby%20name
/instance-formats?limit=1000&query=cql.allRecords=1%20sortby%20name
/instance-note-types?limit=1000&query=cql.allRecords%3D1%20sortby%20name
/instance-relationship-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/instance-statuses?limit=1000&query=cql.allRecords=1%20sortby%20name
/instance-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/item-damaged-statuses?limit=1000&query=cql.allRecords=1%20sortby%20name
/item-note-types?limit=1000&query=cql.allRecords%3D1%20sortby%20name
/loan-types?limit=1000&query=cql.allRecords%3D1%20sortby%20name
/locations?limit=2000&query=cql.allRecords%3D1%20sortby%20name
/material-types?limit=1000&query=cql.allRecords%3D1%20sortby%20name
/modes-of-issuance?limit=1000&query=cql.allRecords=1%20sortby%20name
/nature-of-content-terms?limit=1000&query=cql.allRecords=1%20sortby%20name
/statistical-code-types?limit=1000&query=cql.allRecords=1%20sortby%20name
/statistical-codes?limit=2000&query=cql.allRecords=1%20sortby%20name

 

Comment by Julian Ladisch [ 09/Nov/22 ]

These are all mod-inventory-storage APIs.

The database function f_unaccent(text) is created when mod-inventgory-storage is enabled for the tenant:
https://github.com/folio-org/raml-module-builder/blob/v35.0.0/domain-models-runtime/src/main/resources/templates/db_scripts/general_functions.ftl#L89-L129

Was there an error message in the log when the tenant was enabled?

Comment by Marc Johnson [ 09/Nov/22 ]

Julian Ladisch

at least the mod-inventory-storage version and the actual request (don't XXX in "on resource XXX").

It has occurred with multiple mainline snapshot versions of mod-inventory-storage since at least the 24th October.

As far I can tell, it is still occurring with the latest release from the head of mainline (based upon Ann-Marie's most recent report)

Does the problem persist for a given installation and a complete redeploy is needed?

We don't know. We only have anecdotal evidence of it happening.

Comment by Ann-Marie Breaux (Inactive) [ 09/Nov/22 ]

I've attached the video that I put in Slack earlier today. My experience was that it happened intermittently 2 or 3 times when I was using Snapshot between 12 am-4 am US ET (so not when Snapshot was being updated). I didn't have the settings at first and couldn't get to Inventory.I wrote the Slack message, but didn't post it.  I switched over to Snapshot-2 and used it for a while. 

Then I accidentally got back onto Snapshot, and was able to see the Settings and Inventory. A couple hours later, they were messed up again. I cleared my cache and tried again, but still messed up. That's when I made the video and posted on Slack. 

Comment by Wayne Schneider [ 11/Nov/22 ]

Today (10 November) the logs for mod-inventory-storage for both folio-snapshot and folio-snapshot-2 (attached to this issue) show evidence of this error condition. I am likely not the best person to analyze these logs for smoking guns, but I do observe that in both cases the errors appear to start occurring when test tenants are being created and destroyed, and do not persist for long after that activity is done, suggesting some possible issues with connection pooling. mod-inventory-storage in these systems is configured with the default maxPoolSize (4, according to the documentation for RMB).

Comment by Julian Ladisch [ 11/Nov/22 ]

Thanks for the logs.

Time when DELETE /_/tenant/{id} is called:
09:23
09:26
09:28
12:46
12:50
12:57
13:06
14:18
15:35

Start time of time ranges with "f_unaccent(text) does not exist":
09:23
12:46
14:19
15:36

I'm investigating why the DELETE causes these errors.

Comment by Marc Johnson [ 15/Nov/22 ]

in both cases the errors appear to start occurring when test tenants are being created and destroyed

What tests are using the current snapshot environment for testing?

Comment by Zak Burke [ 15/Nov/22 ]

On the one hand, that's a good question, Marc Johnson, since it may help us understand what's causing this problem. But I also wonder if it's an irrelevant question because I would expect a module to continue to function normally for Tenant A regardless of whether Tenant B is being created/destroyed. 

Comment by Marc Johnson [ 15/Nov/22 ]

Zak Burke

On the one hand, that's a good question, Marc Johnson, since it may help us understand what's causing this problem. But I also wonder if it's an irrelevant question because I would expect a module to continue to function normally for Tenant A regardless of whether Tenant B is being created/destroyed.

I agree that it's a tertiary question. And I'm glad that it's use of a shared environment has potentially revealed this issue.

I am an advocate of running a single pre-production environment with multiple tenants over multiple environments in order to reveal multi-tenancy related challenges.

That said, I thought it had been decided that the various suites of automated tests would run against dedicated environments. And so, I figured it might be important to find out which ones don't (we know that the Karate tests aren't entirely isolated).

Comment by Zak Burke [ 15/Nov/22 ]

All good points, Marc Johnson; I think we're on the same page.

Comment by Julian Ladisch [ 15/Nov/22 ]

Developers working on integration test might run them locally against snapshot or snapshot-2 as suggested in the README.md:

https://github.com/folio-org/folio-integration-tests#running-integration-tests

Comment by Erin Nettifee [ 16/Nov/22 ]

I am seeing this issue right now on snapshot, and I am also seeing that there is no reference data loading in Settings > Inventory. Just noting in case the timing/evidence is helpful in troubleshooting.

 

Comment by Erin Nettifee [ 16/Nov/22 ]

Now checking again at 1:33 and snapshot looks normal again.

Comment by Ann-Marie Breaux (Inactive) [ 16/Nov/22 ]

As Yul Brynner/the King of Siam would sorta sing, "It is a puzzlement"

Comment by Erin Nettifee [ 16/Nov/22 ]

And now at 2:30 I'm getting consistent timeouts on GET /loan-storage/loans/..... and I'm wondering if whatever this is broke the loan record that I had created for test purposes...

Comment by Julian Ladisch [ 24/Nov/22 ]

The root cause of the issue is the PostgreSQL search_path that was set to the schema of the deleted tenant.
Most likely a developer locally ran folio-integration-tests and ran it against the snapshot environment as suggested in the readme: https://github.com/folio-org/folio-integration-tests#running-integration-tests

The search_path is used to call FOLIO database functions like f_unaccent. Usually all tenants that are served by the same module version have exactly the same function in their schema so calling the same function in a wrong schema still gives the same result.

This bug only happens if DB_MAXSHAREDPOOLSIZE is enabled.

This bug only happens if a database connection that had been used by the deleted tenant is re-used by a different tenant before the connection is closed. The connection is closed if idle for DB_CONNECTIONRELEASEDELAY that defaults to 60 seconds.

Workaround: Restart the container.

Solution: RMB-955 Closed has the fix and is shipped with RMB 35.0.4 that has been released today.

Comment by Julian Ladisch [ 25/Nov/22 ]

The fix has been merged to mod-inventory-storage master and will be deployed to https://folio-snapshot.dev.folio.org in 10 hours and to https://folio-snapshot-2.dev.folio.org in 22 hours.

Comment by Erin Nettifee [ 28/Nov/22 ]

I'm impressed Julian Ladisch - thank you for doing the work to track this down!

Comment by Julian Ladisch [ 28/Nov/22 ]

MODINVSTOR-996 Closed has fixed it, neither snapshot nor snapshot-2 have this issue any longer.

Generated at Thu Feb 08 23:29:32 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.