Codex
(UXPROD-833)
|
|
| Status: | Closed |
| Project: | ui-search |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None | Parent: | Codex |
| Type: | Bug | Priority: | P4 |
| Reporter: | Theodor Tolstoy (One-Group.se) | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | bug-search, chalmers, front-end, keep-bug, triaged, ui-only |
| Remaining Estimate: | Not Specified |
| Time Spent: | 1 hour |
| Original estimate: | Not Specified |
| Attachments: |
| Issue links: |
| Sprint: | malconia Sprint 1 |
| Development Team: | Prokopovych |
| Epic Link: | Codex |
| Description |
|
Overview: When conducting title-level searches in Codex for titles containing Swedish diacritics (å, ä, ö), the search behaves as if those characters were reduced to their ASCII equivalents (a, o).
Steps to Reproduce:
Expected Results: (Another form of expected result is that "Den åländska skärgården" is also shown, since "åländska" is a form of "åland" that Swedish stemming algorithms might be able to catch.)
Actual Results:
Additional Information: Will add these in separate issues. This particular issue might be solved by changing the collation on the relevant tables in Postgres to Swedish (see https://www.postgresql.org/docs/9.1/static/collation.html), but I believe that this issue is related to a bigger discussion on search technology. |
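The behaviour reported above is what a generic accent-folding step produces when it runs before terms are compared. The following is a minimal Python sketch of such a step, assuming folding by Unicode decomposition; the function name `fold_diacritics` is hypothetical and is not taken from any FOLIO module, and the actual back-end mechanism may differ.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks after canonical decomposition, then case-fold.

    Illustrative only: this is an assumption about how folding might happen,
    not the Codex implementation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Under this folding, a query for "åland" cannot be told apart from "aland".
print(fold_diacritics("Åland") == fold_diacritics("aland"))
print(fold_diacritics("skärgården"))
```

With this kind of folding, å and ä both collapse to a and ö collapses to o, which is the reduction described in the overview.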
| Comments |
| Comment by Cate Boerema (Inactive) [ 21/Feb/18 ] |
|
Tagging Charlotte Whitt for awareness. |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
Two things.
First, the behaviour you describe here – treating letters with and without diacritics as equivalent for the purpose of searching – is near-universally considered desirable: many users will type aland when they want to find the Åland archipelago. Of course it's possible that librarians are different, and really do want to make a distinction – if the SIGs have come to that conclusion, then fine.
Second, this is nothing to do with ui-search. All it does is submit the user's query to the back-end module mod-codex-mux, which in turn passes it on to all the back-end modules that provide Codex sources: mod-codex-inventory, mod-codex-ekb, and others in future. In all cases, the records that get displayed to the user are those that the individual Codex-source modules considered correct. So depending on whether you're getting local-inventory records or EBSCO KB records with (And I have no idea whether mod-codex-ekb has the flexibility to control this kind of detail in how search works, but based on its inability to deal with many other aspects of searching, my guess would be not. But we can ask.)
So the way forward is:
1. Determine whether, for searching, we really want to distinguish accented characters and their unaccented equivalents; and
Then we'll be able to go ahead and file issues on the relevant back-end module or modules. |
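The fan-out described in this comment can be sketched in Python as follows. Every name here (`mux_search`, `exact_source`, `folding_source`, `TITLES`) is hypothetical and stands in for the real modules; the sketch only illustrates that the multiplexer forwards the query unchanged, so diacritic handling is decided by each source on its own.

```python
from typing import Callable, Dict, List

# A source takes a query string and returns the titles it considers matches.
Source = Callable[[str], List[str]]


def mux_search(query: str, sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Forward the query unchanged to every source; collect results per source."""
    return {name: search(query) for name, search in sources.items()}


TITLES = ["Åland", "Aland", "Den åländska skärgården"]


def exact_source(query: str) -> List[str]:
    """Hypothetical source that ignores case but keeps diacritics distinct."""
    return [t for t in TITLES if t.casefold() == query.casefold()]


# Hypothetical source that folds the Swedish letters to ASCII before comparing.
FOLD = str.maketrans("åäöÅÄÖ", "aaoAAO")


def folding_source(query: str) -> List[str]:
    q = query.translate(FOLD).casefold()
    return [t for t in TITLES if t.translate(FOLD).casefold() == q]


results = mux_search("åland", {"exact": exact_source, "folding": folding_source})
print(results)
```

The same query yields different hit lists per source, which is why a fix would have to be filed against the individual back-end modules rather than the UI.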
| Comment by Cate Boerema (Inactive) [ 21/Feb/18 ] |
|
Thanks Mike Taylor. I actually moved this to UISE. In retrospect, I could have left it in FOLIO until Charlotte Whitt returned. Anyway, you raise some good questions. Charlotte can weigh in on whether we want to do this or not when she's back from vacation. I'll assign this to her and mark it DRAFT. |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
No problem – I'm glad you did move it into UISE, otherwise I would probably never have seen it! |
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
Regarding your first point Mike Taylor, I might have expressed myself in a way that leads to misinterpretation. On a side note, Swedes would not care for diacritics in other languages (like the French é's and so on), so this is something i18n efforts must take into consideration going forward. |
| Comment by Mike Taylor [ 21/Feb/18 ] |
Thanks for this clarification. I fear it makes the problem even more intractable, then: we will need to behave differently for Swedish (where "å" is a different letter from "a") and, say, French (where "é" is a modified form of "e"). So we will run into a similar set of locale-related issues to those discussed in
Haha, I wish I shared your confidence. If we controlled the whole stack, I would agree with you: for example, it should not be too difficult to make the Inventory UI module work correctly along these lines. The problem is that the Codex is by design compounded from components contributed by multiple vendors, on multiple technical substrates. Please calibrate your optimism accordingly |
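To make the locale dependence concrete, here is a minimal Python sketch of folding that keeps a per-locale set of letters intact. The table `DISTINCT_LETTERS` and the function `fold_for_locale` are hypothetical; only the Swedish entry reflects what was said in this thread, and a real system would more likely rely on locale data from a library than on a hand-written table.

```python
import unicodedata

# Letters treated as distinct letters of the alphabet, per locale.
# Assumption for illustration: only the "sv" entry comes from the discussion.
DISTINCT_LETTERS = {
    "sv": set("åäö"),
    "fr": set(),
}


def fold_for_locale(text: str, locale: str) -> str:
    """Fold diacritics except on letters the locale regards as distinct."""
    keep = DISTINCT_LETTERS.get(locale, set())
    out = []
    # Compose first so that "å" is a single code point we can look up.
    for ch in unicodedata.normalize("NFC", text).casefold():
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(fold_for_locale("Åland", "sv"))
print(fold_for_locale("Café", "sv"))
print(fold_for_locale("Åland", "fr"))
```

Under the Swedish setting "å" survives while "é" is still folded to "e"; under a setting with no protected letters, everything is folded.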
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
Mike Taylor, my comments may reveal my limited knowledge of the inner workings of Codex. I would welcome some investigation into what the actual needs are. |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
I'd say you've done a really helpful job of clarifying the needs – especially the explanation that Swedish "å" is its own letter in a way that French "é" is not. I don't think we'd have been able to arrive at such a good understanding of what's required without that. Anyway, let's see what Adam has to say on the sorting issue (
BTW, sorry if I've come across as patronising at any stage of this – I know I can fall into that; it's not my intention. |
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
Thank you. No problem, I've enjoyed this discussion |
| Comment by Jakub Skoczen [ 21/Aug/18 ] |
|
Theodor Tolstoy (One-Group.se) on
|
| Comment by Mike Taylor [ 21/Aug/18 ] |
|
(Side-issue: I worry that a lot of issues are cropping up here in the UISE project that really pertain to much lower-level or more general aspects of FOLIO – such as the feature Jakub just mentioned where exact matching including accents would boost a hit's relevance score. In general, when tempted to file an issue in UISE, would POs please check first whether the same issue pertains in the Inventory app? If so, then better to file it in UIIN, so we can fix it more simply without having to think about so many different layers of software at once.) |
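The relevance-boost idea mentioned in this comment can be sketched in Python as follows. This is an assumption-laden illustration: the function names and the score values 0, 1 and 2 are hypothetical, chosen only to show accent-exact hits ranking above folded hits while both are still returned.

```python
import unicodedata


def fold(text: str) -> str:
    """Remove combining marks and case-fold (illustrative folding step)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def score(query: str, title: str) -> int:
    """Return 0 for no match, 1 for a folded match, 2 for an accent-exact match."""
    if fold(query) != fold(title):
        return 0
    q = unicodedata.normalize("NFC", query).casefold()
    t = unicodedata.normalize("NFC", title).casefold()
    return 2 if q == t else 1


def rank(query: str, titles: list) -> list:
    """Keep every matching title, with accent-exact matches first."""
    hits = [(score(query, t), t) for t in titles]
    return [t for s, t in sorted(hits, key=lambda h: -h[0]) if s > 0]


print(rank("åland", ["Aland", "Åland", "Oland"]))
```

A user typing "aland" would still find "Åland", while a user typing "åland" would see the accent-exact title first.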
| Comment by Holly Mistlebauer [ 21/Dec/21 ] |
|
This ticket has been closed because it is over 3 years old and has a very low priority. |
| Comment by Theodor Tolstoy (One-Group.se) [ 14/Jan/22 ] |
|
Magda Zacharska Same here. How is this handled within the ES implementation? Do you want me to file a ticket? |
| Comment by Magda Zacharska [ 14/Jan/22 ] |
|
Theodor Tolstoy (One-Group.se) In addition to the default English analyzer, the Kiwi bugfest environment has Russian, Hebrew and Arabic analyzers installed, but it does not have a Swedish one. You might want to create a request for devops to add this analyzer and to rebuild the index afterwards, so that you can verify Swedish-specific behavior. |