[FOLIO-2573] Paused due to POC of Elastic Search. All Search Options Should be Supported by a Database Index Created: 24/Apr/20 Updated: 31/Jan/22 Resolved: 31/Jan/22 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P3 |
| Reporter: | Marc Johnson | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | PTF-Review, elastic-search, potential-decision | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Core: F - Sprint 87, Core: F - Sprint 88 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Development Team: | Prokopovych | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
In order to (hopefully) pre-empt inadequate performance when searching inventory using the defined options in the reference UI, database indexes should be created for each CQL criteria that is generated by the reference UI Considerations
|
| Comments |
| Comment by Marc Johnson [ 24/Apr/20 ] |
|
Cate Boerema Charlotte Whitt Jakub Skoczen Zak Burke Michal Kuklis There have been a number of ad-hoc conversations about this, I want to confirm that we are all in agreement that this is what we are going to do going forward? Please raise any concerns you might have |
| Comment by Charlotte Whitt [ 24/Apr/20 ] |
|
Hi Marc Johnson - this looks really good CC: lew235 |
| Comment by Bohdan Suprun (Inactive) [ 24/Apr/20 ] |
|
Hi Marc Johnson, I think it is important to make an agreement what CQL query is going to be used for a particular story, because required indexes may depend on operators that are used in the CQL. |
| Comment by Marc Johnson [ 24/Apr/20 ] |
I totally agree, we are going to need to think about how we sequence the work to do that |
| Comment by Cate Boerema (Inactive) [ 27/Apr/20 ] |
These considerations concern me. Rather than making this a blanket policy, is there any way to only add indexes where they would improve search performance and not negatively impact write performance? I wonder if the folks on the performance testing task force have thoughts on this. Tagging Martin Tran for his thoughts. |
| Comment by Marc Johnson [ 27/Apr/20 ] |
As I understand it, the addition of an index is likely to have a negative impact on write performance, as the index has to be updated when the write happens (in order for them to be consistent). To what degree that impact is, I don't know, it would require further investigation. It might be that after a while, the additional impact of each new index becomes close to zero. It's difficult to know for certain whether an index will increase performance for all searches. It is likely that each index will help significantly with a search that only contains the term it is designed to help. Once the search includes more than one term, it might even reduce the performance of a query because the database needs to decide which index to use. It might not always make the right decision. I believe this could be especially the case with the JSONB structures that FOLIO uses, as there aren't detailed statistics to help it make that decision, unlike regular columns. I think we would likely need performance tests for a wide variety of searches (and create / update scenarios) in order to learn more about the impact of these indexes. Does that make sense? Jakub Skoczen Mikhail Fokanov Julian Ladisch Adam Dickmeiss Please correct my understanding of any of the above. |
| Comment by Julian Ladisch [ 06/May/20 ] |
|
Indexes reduce write performance, but this is very small and users cannot notice this write performance difference in the front-end. |
| Comment by Marc Johnson [ 06/May/20 ] |
|
Julian Ladisch Thank you for your thoughts.
Do we know what impact each index has?
I'm thinking as much about upgrades that change many records. As I understand it, any change to a JSONB column involves a re-evaluation of all indexes based upon parts of that column. Is my understanding incorrect?
Does that mean you are advocating for indexes that support every search in the reference inventory UI? Would you go further, and index all properties in inventory, as they can be used in the CQL search in the UI? |
| Comment by Julian Ladisch [ 06/May/20 ] |
It depends on average size of the field to be indexed and the number of records of the table.
That is correct. Some initial PostgreSQL code for a recheck_on_update optimization exists, but it is not production ready. It calculates the index expression before and after the update and skips index update when they are the same. I advocate for creating indexes that give a noticeable speedup for all searches where we know that they will likely be executed. There will always be slow searches, for example when joining tables (instance, holding, item) and using indexed fields from instance and item. |
| Comment by Marc Johnson [ 08/May/20 ] |
Thanks, I wondered if we had a sense of the impact for a table with indexes from properties in a JSONB column with approximately 10 million records. I'm not asking you to investigate this, I was only curious.
How would you suggest we discover these indexes? Do you think it is reasonable to do this for all of the explicit search options in the reference UI (compound searches might still be slow)? A naive approximation, based upon 1 index per search option would mean the current reference UI would need a minimum of 15 indexes (some of which are custom like keyword). More will be coming soon, including 4-9 (I think) for expansions to call number number searching. |
| Comment by Julian Ladisch [ 08/May/20 ] |
|
Yes, it is reasonable to index all explicit search options in the UI. We should ask implementers to report slow queries. We may even create a slow queries page on wiki.folio.org where people can easily report them. |
| Comment by Charlotte Whitt [ 08/May/20 ] |
|
Hi Marc Johnson - it's pretty common to maintain search indexes for large bibliographic data bases with - e.g. I found this 57 p. document with spec for search codes and indexies in in DanBib. A Danish National Union Catalogue with all records formated as danMARC2 (a Danish variant of MARC21) - https://slks.dk/fileadmin/user_upload/0_SLKS/Dokumenter/Biblioteker/Standarder/Praksisregler_for_soegeveje.pdf. Have we an idea about, how we plan to list/document a list of implemented indexed searches? |
| Comment by Marc Johnson [ 08/May/20 ] |
Does that mean that that you are an advocate of creating (at least one, unless already present) index for each search option we add to the reference UI?
I think that is a good idea. I don't know if the FOLIO community has a process for accepting feedback from the implementers. |
| Comment by Marc Johnson [ 08/May/20 ] |
Ok. Does that mean you are advocating for FOLIO to create database indexes for each search option we add to the reference UI, based upon the fact that other systems do this? Unfortunately, that document does not help me understand the needs or the technical choices that organisation made because it is in a language I cannot read.
Not that I am aware of, to begin to answer that I'd need to understand the need. What need / expectation would that documentation serve? |
| Comment by Charlotte Whitt [ 08/May/20 ] |
|
Sorry Marc Johnson - I didn't intend that you should read that report in Danish. Just get the sense of how other systems do documentation of the search codes and indexes. Yes, I'm advocating for that we in FOLIO do indexes for each of the search options and defined filters in the UI in Search and filter (first pane). And when adding a new search option or a filter, then we automatically should create the equivalent search index. These search indexes should be documented, e.g. as a table. This documentation could be transformed to a tips & trick document, for the staff user to know, what search indexes to use, when wanting to perform a given search. The MM-SIG has started following wiki page - https://folio-org.atlassian.net/wiki/display/FOLIOtips/Searching |
| Comment by Marc Johnson [ 08/May/20 ] |
Thanks. It is challenging for me to get a sense of how other systems do this when I am unable to read the documentation itself, sorry :-/
Ok. Alas, it isn't automatic and requires diligence and development work each time.
This might sound that an impertinent question. Why do they need to be documented? What will this documentation be used for?
Does that mean this would be user facing documentation for them constructing their own queries outside of the ones provided by the reference UI? If so, I don't know what the process is for the creation of user facing documentation for FOLIO. |
| Comment by Charlotte Whitt [ 12/May/20 ] |
|
Hi Theodor Tolstoy (One-Group.se) and Lisa Sjögren - I have added you as watcher to this ticket, while I think the conversation in the #Support channel are addressing from a user perspective some of the challenges we're dealing with re. searches. Link to the original slack chat, thread 4/16/2020: in the #Support channel - https://folio-project.slack.com/archives/C2BHMHCFJ/p1587026417102300 |
| Comment by Charlotte Whitt [ 12/May/20 ] |
|
Comments/findings from Wayne Schneider [5/12/2020, 10:45 PM]
|
| Comment by Theodor Tolstoy (One-Group.se) [ 15/May/20 ] |
|
Charlotte Whitt Thank you for pinging! . |
| Comment by Theodor Tolstoy (One-Group.se) [ 15/May/20 ] |
I cannot help to think about how this list would have looked, had we had a proper search engine in place. |
| Comment by Lisa Sjögren [ 15/May/20 ] |
|
Thanks Charlotte Whitt! I am following these discussions around optimizing search performance (speed, usability, precision, internationalization), using either further indexing of fields or by adding a search engine, with great interest. And thank you Theodor Tolstoy (One-Group.se) for bringing up the search engine angle. I think it's important that the discussions around different approaches – while they might both have their place, short and long term – converge and inform each other. |
| Comment by Charlotte Whitt [ 15/May/20 ] |
|
Just a FYI: Link to a proposal (work in progress) published on the FOLIO wiki - https://folio-org.atlassian.net/wiki/pages/viewpage.action?pageId=1776170 |
| Comment by Marc Johnson [ 22/Jun/20 ] |
|
I think that the presence of issues reported (mostly by the EBSCO FSE or Performance Task Force teams) for searches that do not use a database index (e.g.
Given that, I've expanded the scope of this decision to be broader than inventory, and will add it to the topics for the Technical Leads to discuss. I think we need need to decide:
What do folks think? Do we need more information (a proposal) before this can be discussed further? |
| Comment by Julian Ladisch [ 23/Jun/20 ] |
|
Small tables don't need database indexes because all queries are always fast, even when using a full table scan. |
| Comment by Charlotte Whitt [ 25/Jun/20 ] |
|
Hi Julian Ladisch - could you give an example on, what you define as small tables? |
| Comment by Julian Ladisch [ 25/Jun/20 ] |
|
10000 records. Most of these tables probably are small: |
| Comment by Marc Johnson [ 03/Jul/20 ] |
|
Jakub Skoczen Craig McNally Julian Ladisch Cate Boerema Charlotte Whitt Martin Tran Svitlana Zmiivska Jeremy Huff Bohdan Suprun Zak Burke Here is my current understanding of the situation
The top two of these effectively make a database index for many properties required. If we are to accommodate the fourth option then we either cannot allow the third to go ahead or all possible queries have to be catered for by an index. This would, in effect mean multiple indexes on every property in instances, holdings and items. Even if the impact of each index is relatively small, the cumulative effect of either of these could be significant. Have I understood the situation correctly? Is this the intended outcome of this combination of decisions? Should there be exceptions? As Julian Ladisch correctly states, some record collections are small and so may not require indexes be sufficiently performant (I'm deliberately not defining what that is in this post) Does a topic need raising with the Technical Leads or the Technical Council to turn this emerging defacto policy into actual policy? |
| Comment by Marc Johnson [ 03/Feb/21 ] |
|
Blocking as is dependent upon the decisions made following the elastic search based search PoC |
| Comment by Charlotte Whitt [ 31/Jan/22 ] |
|
Marc Johnson - can I close this ticket as 'Won't do', or do you want to keep it as an 'umbrealla' ticket and keeping track of the mod-search work? |
| Comment by Marc Johnson [ 31/Jan/22 ] |
I have no interest in this being kept as an umbrella for ES based search work. |
| Comment by Charlotte Whitt [ 31/Jan/22 ] |
|
This work is now tracked in MSEARCH |