[FOLIO-1955] Create databases using und-x-icu collation Created: 09/Apr/19 Updated: 24/Sep/23 |
|
| Status: | Open |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P2 |
| Reporter: | Julian Ladisch | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | platform-backlog | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Story Points: | 2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Development Team: | Core: Platform | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
These lines need to be added to the postgresql_db database creation task to enable und-x-icu collation (sorting): - postgresql_db:
encoding: UTF-8
lc_collate: und-x-icu
lc_ctype: und-x-icu
template: template0
ICU collation is operating system independent. und-x-icu (for “undefined”) is the ICU “root” collation. Use this to get a reasonable language-agnostic sort order: https://www.postgresql.org/docs/current/static/collation.html This fixes the problem that spaces are ignored when sorting. See
|
| Comments |
| Comment by John Malconian [ 10/Apr/19 ] |
|
Updating the ansible task using the example above results in the following error: fatal: [10.36.1.200]: FAILED! => {"changed": false, "msg": "Database query failed: invalid locale name: \"und-x-icu\"\n"} |
| Comment by John Malconian [ 11/Apr/19 ] |
|
Julian Ladisch - Do you have any ideas about what might be missing? Package: postgresql-10 |
| Comment by John Malconian [ 11/Apr/19 ] |
|
From the postgresql server log: 2019-04-10 17:48:54.693 UTC [2714] folio_admin@postgres ERROR: invalid locale name: "und-x-icu" |
| Comment by Marc Johnson [ 11/Apr/19 ] |
|
John Malconian Could be a red-herring, this stack overflow answer suggests the locale cannot be set on a database this way |
| Comment by John Malconian [ 11/Apr/19 ] |
|
I think you're right, Marc Johnson. Might be a limitation of pg10. I've seen a couple of stack overflow answers allude to this. |
| Comment by John Malconian [ 11/Apr/19 ] |
|
tried a couple of variations based on Stack Overflow and still no luck. CREATE DATABASE "okapi_modules" WITH OWNER = "folio_admin" TEMPLATE = "template0" ENCODING = 'UTF-8' LC_COLLATE = 'und-x-icu' LC_CTYPE = 'und-x-icu' TABLESPACE = pg_default CONNECTION LIMIT = -1; ERROR: invalid locale name: "und-x-icu" Should I remove the LC_COLLATE and LC_TYPE config and include 'ENCODING = 'UTF-8' or should I mark this issue as 'won't fix'. |
| Comment by Julian Ladisch [ 23/Apr/19 ] |
|
I've tried und-x-icu with postgres:10-alpine docker container where that CREATE DATABASE works. That fixes the sorting problem, but this restriction how postgres must be run (Alpine) can be a temporary solution only. A fix for PostgreSQL to make ICU work in Debian/Ubuntu has been proposed but needs some fundamental design work: |
| Comment by Ann-Marie Breaux (Inactive) [ 09/Aug/19 ] |
|
Hi Julian Ladisch and John Malconian and Marc Johnson Looks like this story is blocking several others, and the blocker for this one has been resolved. Might you be able to work this into an upcoming Core-platform sprint? Thank you for considering. |
| Comment by Julian Ladisch [ 12/Aug/19 ] |
|
This requires to replace the Debian/Ubuntu APT PostgreSQL package by the Docker Alpine PostgreSQL package: https://github.com/folio-org/folio-ansible/blob/master/roles/postgresql/tasks/main.yml |
| Comment by Wayne Schneider [ 12/Aug/19 ] |
|
Julian Ladisch are you proposing running PostgreSQL in a container, using the postgres:10-alpine image? |
| Comment by Wayne Schneider [ 12/Aug/19 ] |
|
John Malconian Ian Hardy have we tested this in RDS? |
| Comment by John Malconian [ 12/Aug/19 ] |
|
I don't understand why we would even entertain the idea of adopting something that is only supported in a small subset of postgresql distributions. |
| Comment by Julian Ladisch [ 13/Aug/19 ] |
|
Yes, I propose to run PostgreSQL in a Docker container, using the postgres:10-alpine image. PostgreSQL sorts using collations provided by the operating system. The collation that Ubuntu/Debian provides ignores spaces and therefore has been reported as a bug, see the linked issues
I also don't like to restrict the operating system PostgreSQL runs under but we should try this as a temporary solution until a fix for PostgreSQL to make ICU work in Debian/Ubuntu is available: https://www.postgresql.org/message-id/flat/0951afb4-a634-e047-4cd8-a9e98f7feed9%402ndquadrant.com#d1bd94338d12d55441690651d2d27b4e Side note:
|
| Comment by Marc Johnson [ 13/Aug/19 ] |
|
My understanding is that there is a need to be able to search / sort using different collation for individual properties, e.g. the English title vs. the Swedish title of an instance, or where some instances have German titles and others have Dutch titles? Given that, does this approach satisfy those needs, or would it only offer searching / sorting by a single collation across all records? |
| Comment by Charlotte Whitt [ 14/Aug/19 ] |
|
Hi Marc Johnson - correct we need to be able to handle e.g. Swedish characters with diacritics, German characters with diacritics, Hungarian letters with diacritics etc, etc, special Danish and Norwegian letters like : æøå, ÆØÅ, and many more. Theodor Tolstoy (One-Group.se) added for a long time ago this bug:
|
| Comment by Marc Johnson [ 14/Aug/19 ] |
|
@charlotte stated
I think if we need to support records for the same tenant that can contain a mixture of multiple collations, then whilst PostgreSQL ICU support might be useful or necessary, it is unlikely to be sufficient to fulfil all of those needs. |
| Comment by Julian Ladisch [ 14/Aug/19 ] |
|
This issue is to fix the space problem only. und-x-icu (for “undefined”) is the ICU “root” collation to get a reasonable language-agnostic sort order: https://www.postgresql.org/docs/current/static/collation.html For the other topic see
|