[UISE-70] Codex search results are taking Nonfiling characters into account when sorting Created: 20/Feb/18 Updated: 21/Dec/21 Resolved: 21/Dec/21 |
|
| Status: | Closed |
| Project: | ui-search |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Bug | Priority: | P4 |
| Reporter: | Theodor Tolstoy (One-Group.se) | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | chalmers, front-end, keep-bug, triaged, ui-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | 1 hour | ||
| Original estimate: | Not Specified | ||
| Attachments: |
|
| Issue links: |
|
| Sprint: | |
| Development Team: | Prokopovych |
| Description |
|
Overview: When conducting title-level searches in Codex, the sort algorithm does seem to take the definite article and other nonfiling characters into consideration. This seems to be true for both Swedish and English. Steps to Reproduce:
Expected Results:
Note: Not all of these results (the result items themselves) are expected to appear. Disregarding that, the point is that the nonfiling characters have been taken into account in the sort. Actual Results: |
| Comments |
| Comment by Cate Boerema (Inactive) [ 21/Feb/18 ] |
|
Tagging Charlotte Whitt for awareness. |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
I can't come up with a rationale for why we might expect the "Expected" sort order. Surely "Northern Territories, Asia-Pacific Regional Conflicts and the" should not all be discarded so that the record sorts by "Åland"? What am I missing? |
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
Mike Taylor I added an explanation to the results in the expected example. I am not sure that this is the way we want FOLIO to deal with nonfiling characters, but I am pretty sure we need to discuss it, since it came up in the initial impressions I received when visiting Chalmers and since current systems handle it. |
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
I added a screenshot from Sierra of a list of search results sorted alphabetically, showing how it ignores the definite article "den". |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
Thanks for that explanation – of course, it makes perfect sense. So just to clarify: the term "nonfiling characters" actually refers to words, such as "the" and "den", rather than to characters? I guess this is just one of those things where the standard term for the concept is wrong, but we're stuck with it. Here's another multilingual problem. We can't just strip "den" from the start of titles for sorting purposes, because then English titles like "Den of Thieves" will be sorted wrongly. So what is the desired functionality? (Once we figure that out, we can start to think about whether it can actually be implemented.) |
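To make the "Den of Thieves" problem concrete, here is a minimal Python sketch of naive, language-blind article stripping. The article list `NAIVE_ARTICLES`, the function `naive_sort_key`, and the sample titles are hypothetical illustrations, not code from any FOLIO module.

```python
# Hypothetical leading words treated as nonfiling, mixing English and Swedish
# articles with no regard for the language of each title.
NAIVE_ARTICLES = {"the", "a", "an", "den", "det", "en", "ett"}


def naive_sort_key(title: str) -> str:
    """Drop the first word if it looks like an article in *any* language."""
    words = title.split()
    if len(words) > 1 and words[0].lower() in NAIVE_ARTICLES:
        words = words[1:]
    return " ".join(words).casefold()


titles = ["Den of Thieves", "Den långa resan", "The Hobbit", "Of Mice and Men"]
for title in sorted(titles, key=naive_sort_key):
    print(title)
```

The Swedish title correctly files under "långa", but the English "Den of Thieves" is stripped too and ends up filed under "of", after "Of Mice and Men".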
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
For MARC21, this is already handled (as far as I know). Someone with more up-to-date knowledge should look into this. |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
Yes, that's a good approach – the 2nd indicator on the 245 field explicitly states how many leading characters to skip. But the Codex sources will not in general have that information. |
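The MARC 21 approach can be sketched as follows: the cataloguer records the number of nonfiling characters (0-9) in the second indicator of the 245 field, and the sort key simply skips that many leading characters. This is a minimal Python sketch; the `TitleField` class is a hypothetical simplification of a 245 field (title only, no subfield parsing, no MARC-8 diacritic handling).

```python
from dataclasses import dataclass


@dataclass
class TitleField:
    """Hypothetical, simplified stand-in for a MARC 21 245 field."""
    title: str      # title proper (subfield $a)
    nonfiling: int  # 2nd indicator: number of leading characters to skip (0-9)


def filing_key(field: TitleField) -> str:
    """Skip the cataloguer-declared nonfiling characters, then compare case-insensitively."""
    if not 0 <= field.nonfiling <= 9:
        raise ValueError("245 second indicator must be between 0 and 9")
    return field.title[field.nonfiling:].casefold()


records = [
    TitleField("The Hobbit", 4),       # "The " is 4 nonfiling characters
    TitleField("Den of Thieves", 0),   # English noun "Den": nothing to skip
    TitleField("Den långa resan", 4),  # Swedish article "Den ": 4 characters
    TitleField("Of Mice and Men", 0),
]
for record in sorted(records, key=filing_key):
    print(record.title)
```

Because the skip count is recorded per record by a human, the "Den of Thieves" ambiguity never arises; the catch is that sources without this indicator give the sort nothing to work with.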
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
That is true, but I think there are more automatic approaches available today that could be more efficient. For example, I think Solr and Elasticsearch could be taught to handle this. |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
I hope you're right – but (A) mod-codex-ekb is not using either of these; (B) neither is mod-codex-inventory, which uses RMB-mediated access to PostgreSQL; and (C) in any case, this can't be done correctly without knowing the language of each record – otherwise we get the "Den of Thieves" problem I mentioned above. |
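Point (C) can be illustrated with a minimal Python sketch that strips a leading article only when the record's language is known. The `LEADING_ARTICLES` table, keyed by ISO 639-2 codes such as `eng` and `swe`, is a hypothetical and deliberately incomplete example; a real implementation would need fuller per-language lists and a reliable source for each record's language.

```python
from typing import Optional

# Hypothetical, deliberately incomplete article lists keyed by ISO 639-2 code.
LEADING_ARTICLES = {
    "eng": {"the", "a", "an"},
    "swe": {"den", "det", "en", "ett"},
}


def language_aware_key(title: str, lang: Optional[str]) -> str:
    """Strip a leading article only when the record's language says it is one."""
    words = title.split()
    articles = LEADING_ARTICLES.get(lang, set()) if lang else set()
    if len(words) > 1 and words[0].lower() in articles:
        words = words[1:]
    return " ".join(words).casefold()


print(language_aware_key("Den of Thieves", "eng"))   # English: "Den" is a noun, kept
print(language_aware_key("Den långa resan", "swe"))  # Swedish: "Den" is an article, dropped
print(language_aware_key("The Hobbit", None))        # unknown language: nothing stripped
```

Falling back to no stripping when the language is unknown keeps the sort predictable, at the cost of filing some titles under their article.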
| Comment by Theodor Tolstoy (One-Group.se) [ 21/Feb/18 ] |
|
That is true, but I think you can get a long way using automated approaches. Maybe this is not the best place to ask, but why is there no search engine in Codex? |
| Comment by Mike Taylor [ 21/Feb/18 ] |
|
There are basically two approaches to searching multiple sources at once: #1, harvest the records from all sources into a single central index; and #2, pass each query to the sources at search time and merge their results. There are advantages and disadvantages to each approach. #1 needs more up-front effort and more sysadmin, but yields faster and more consistent results. This is what Summon does. #2 is more lightweight, but slower and dependent on the capability of the sources. The Codex is a type-2 solution. We would perhaps like to do a type-1 solution, but the fundamental problem is that we can't in general harvest all the things we want. For example, the EBSCO KB is proprietary and not available for harvesting. So for now at least, this is a non-starter. |
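For this issue, one consequence of the type-2 design is that the aggregator can only merge result lists that each source has already sorted, so a consistent overall order depends on every source using the same sort key. A minimal Python sketch using the standard library's `heapq.merge`; the function `federated_merge` and both sources with their orderings are hypothetical.

```python
import heapq
from typing import Callable, List


def federated_merge(sources: List[List[str]], key: Callable[[str], str]) -> List[str]:
    """Merge per-source result lists; correct only if each list is already sorted by `key`."""
    return list(heapq.merge(*sources, key=key))


def plain_key(title: str) -> str:
    return title.casefold()


# Hypothetical source A sorts by the full title.
source_a = ["Apples", "The Hobbit", "Zebras"]
# Hypothetical source B sorts while ignoring a leading "The ", so its order
# is not consistent with plain_key.
source_b = ["The Birds", "Moby Dick", "Tigers"]

merged = federated_merge([source_a, source_b], key=plain_key)
print(merged)
print(merged == sorted(merged, key=plain_key))  # the merged list is not globally sorted
```

Here "The Birds" lands before "Moby Dick" even though `plain_key` orders them the other way; short of fetching and re-sorting every result, the aggregator inherits whatever the sources do.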
| Comment by Jakub Skoczen [ 21/Aug/18 ] |
|
Theodor Tolstoy (One-Group.se), Mike Taylor: guys, I'd like to make sure we are clear about the scope of what can (and will) be done vs. what is outside of Core Team control. I suggest we tackle this particular issue in two stages:
1. Stage 1: address sort and search issues in Inventory (and other modules that index data locally in FOLIO); relevant issues here are
2. Stage 2: address sort and search issues in the Codex Search app. Here we are generally limited by the quality of results from the upstream sources, one of which we control directly (Inventory), while for the other (EBSCO KB) we can request certain tuning. |
| Comment by Mike Taylor [ 21/Aug/18 ] |
|
Strongly agree. These conversations got into a lot of unnecessary complexity by trying to solve the difficult case of the Codex before having solved the (relatively!) easy case of the local inventory. |
| Comment by Holly Mistlebauer [ 21/Dec/21 ] |
|
This ticket has been closed because it is over 3 years old and has a very low priority. |