[UXPROD-2623] Wait for POC of Elastic Search - Round estimated search result hit count (totalRecords) Created: 17/Jun/20 Updated: 11/Aug/21 Resolved: 11/Aug/21 |
|
| Status: | Closed |
| Project: | UX Product |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | New Feature | Priority: | TBD |
| Reporter: | Julian Ladisch | Assignee: | Jakub Skoczen |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | NFR, elastic-search, q3-2020-spillover, result-count, search, searching | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Attachments: |
|
| Issue links: |
|
| Development Team: | Core: Platform |
| PO Ranking Note: | CW: This feature is in an undecided state, due to awaiting the outcome of the POC of using Elastic Search in Inventory. |
| Rank: Chicago (MVP Sum 2020): | R2 |
| Rank: Cornell (Full Sum 2021): | R1 |
| Rank: Duke (Full Sum 2021): | R1 |
| Rank: TAMU (MVP Jan 2021): | R2 |
| Description |
|
CW: This feature is in an undecided state, due to awaiting the outcome of the POC of using Elastic Search in Inventory.
Purpose: The totalRecords search result hit count number returned by RMB is precise if totalRecords is below 1000 and it is only an estimation if totalRecords >= 1000 (details). "about" should be prepended for >= 1000. The number should be rounded for >= 1000. Scenarios:
Algorithm for the front-end:
  if totalRecordsEstimated is true
    print "about " + totalRecordsRounded
  else if totalRecordsEstimated is false
    print totalRecords
  // totalRecordsEstimated is undefined until all back-ends have upgraded RMB,
  // handle this gracefully for folio-testing:
  else if totalRecords < 1000
    print totalRecords
  else
    print "about " + totalRecords + " (upgrade back-end for rounding)"
Background Discussion:
Details on Rounding Options Considered: 3 rounding proposals are in the comments of MODINVSTOR-468: A) Round to magnitude:
B) Round to first digit:
C) Round first digit to 1, 2 or 5:
This issue is about deciding whether there should be a FOLIO standard for rounding, and if yes, which to choose, and whether the front-end or the back-end should round the number.
Proposed API as of July 28, 2020
If the back-end rounds, it should also provide the original (non-rounded) estimate, for example
totalRecords: 1431
totalRecordsRounded: 1000
totalRecordsEstimated: true
No rounding is needed for an exact number:
totalRecords: 1509
totalRecordsEstimated: false
https://github.com/folio-org/raml/blob/raml1.0/schemas/resultInfo.schema needs to be extended accordingly. |
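The proposed response fields and the front-end algorithm from the description could be sketched in TypeScript roughly as follows. This is only an illustration under assumptions: the `ResultInfo` interface and the `formatHitCount` function are hypothetical names, not part of the FOLIO schema or Stripes. Only the three property names and the threshold of 1000 come from this issue.

```typescript
// Hypothetical shape of the extended result info; only the three property
// names are taken from the proposal in this issue.
interface ResultInfo {
  totalRecords: number;
  totalRecordsRounded?: number;    // only present when the back-end rounds
  totalRecordsEstimated?: boolean; // undefined until the back-end has upgraded RMB
}

// Hypothetical helper implementing the front-end algorithm from the description.
function formatHitCount(info: ResultInfo): string {
  if (info.totalRecordsEstimated === true) {
    // Assumption: fall back to the unrounded estimate if no rounded value was sent.
    return "about " + (info.totalRecordsRounded ?? info.totalRecords);
  }
  if (info.totalRecordsEstimated === false) {
    return String(info.totalRecords);
  }
  // totalRecordsEstimated is undefined: the back-end has not upgraded RMB yet.
  if (info.totalRecords < 1000) {
    return String(info.totalRecords);
  }
  return "about " + info.totalRecords + " (upgrade back-end for rounding)";
}

// The two example payloads from the proposed API:
console.log(formatHitCount({ totalRecords: 1431, totalRecordsRounded: 1000, totalRecordsEstimated: true })); // about 1000
console.log(formatHitCount({ totalRecords: 1509, totalRecordsEstimated: false })); // 1509
```

Treating `totalRecordsEstimated` as optional is what lets the same helper work against back-ends that have not yet upgraded RMB.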
| Comments |
| Comment by Cate Boerema (Inactive) [ 18/Jun/20 ] | ||||||||||||
|
Thanks for filing this, Julian Ladisch. I added columns O, P and Q to the result count testing spreadsheet. Could you add a formula to those columns showing what the output of each strategy would result in? It would be helpful in evaluating which is the best. https://docs.google.com/spreadsheets/d/1ergJ7jDHdLbD_noWsL-ZdtVzB2NiiIPaCcUeO4ZFLZw/edit#gid=0 + Charlotte Whitt and Marc Johnson | ||||||||||||
| Comment by Marc Johnson [ 18/Jun/20 ] | ||||||||||||
|
Has it been decided that we are going to round these estimated counts? And we are now trying to decide what rounding approach to take? | ||||||||||||
| Comment by Julian Ladisch [ 18/Jun/20 ] | ||||||||||||
|
Quote from this issue's description:
| ||||||||||||
| Comment by Julian Ladisch [ 18/Jun/20 ] | ||||||||||||
|
Cate Boerema I've added number ranges showing the rounding rules to the issue description. Programming a formula takes longer than manually filling the columns. Could you fill the columns? | ||||||||||||
| Comment by Charlotte Whitt [ 18/Jun/20 ] | ||||||||||||
|
Of the three suggested solutions, I definitely prefer solution B) Round to first digit. | ||||||||||||
| Comment by Charlotte Whitt [ 18/Jun/20 ] | ||||||||||||
I can fill in the columns - np | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 18/Jun/20 ] | ||||||||||||
|
Thank you Julian Ladisch and Charlotte Whitt. Charlotte Whitt, when assessing the methods, please refer to column W which represents the actual count in the DB. Thanks! | ||||||||||||
| Comment by Charlotte Whitt [ 18/Jun/20 ] | ||||||||||||
|
Okay - now done with filling in estimated counts in columns N, O, and P for the rows where we have the actual count in the DB (see column W) - https://docs.google.com/spreadsheets/d/1ergJ7jDHdLbD_noWsL-ZdtVzB2NiiIPaCcUeO4ZFLZw/edit#gid=0 | ||||||||||||
| Comment by Marc Johnson [ 18/Jun/20 ] | ||||||||||||
|
I had a follow up conversation with Cate Boerema to try to understand what the current situation is. Based upon that I have some follow-up questions.
To round or not?
My understanding is that it has been decided to round some of the estimated record counts. Does that fit with everyone else's understanding? If so, shall I change the description of this issue to reflect that decision? (I will also need to document this in a decision log, where is yet to be decided) The rest of my comment will assume that this initial decision has been made.
Which record counts should be rounded?
Does this decision only apply to instance searches within inventory? Does it apply to other records in inventory, e.g. items, loan types or resource types? Or does this apply to any searches where the record counts are estimated? Given that the estimation technique and limitations are specific to RAML Module Builder and may not apply to other parts of FOLIO, if we intend for this to be a general decision, we may need a way to distinguish between estimated and non-estimated counts.
How / where is the rounding going to be applied?
This may depend upon the scope of this decision. If we intend to be selective about which record counts we want to round, this is likely going to need to be design/compile time configurable, e.g. switched on for mod-inventory-storage / UI inventory, off elsewhere. There are likely trade-offs to where we do this rounding, I shall defer sharing my understanding of those until we have progressed this conversation a little further.
What governance is needed for this decision?
Jakub Skoczen Craig McNally Zak Burke Is this a significant enough decision that this needs review by the Technical Leads or the Technical Council? (this might depend upon how far reaching this decision is) | ||||||||||||
| Comment by Julian Ladisch [ 18/Jun/20 ] | ||||||||||||
|
Using column W (actual count in DB) for examples might be misleading. When implemented the rounding is applied to the estimated count, not the actual count in DB. Half of the cases are rounded towards the actual count in DB and the other half are rounded away. | ||||||||||||
| Comment by Charlotte Whitt [ 18/Jun/20 ] | ||||||||||||
|
Oh ... (sigh) I can do it all one more time ... when I'm out of meetings. | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 19/Jun/20 ] | ||||||||||||
Yeah, but we want to see how the displayed result counts compare to the actuals. That's what matters. I know that, in some cases, it will make the displayed count less accurate. I think it is very important that we look at how often that happens and how big of a problem it is. Hence, we should compare what will be displayed to the actuals. | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 19/Jun/20 ] | ||||||||||||
|
Oh, I see what Charlotte Whitt did. Yeah, I didn't mean to apply the rounding rules to the actual count. I meant to compare the results of the various rounding rules as applied to the raw result count (columns O,P and Q) to the actual counts (now in column Z (was column W)). | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 27/Jul/20 ] | ||||||||||||
|
We want to do this in Q3, so we need to decide: Charlotte Whitt and I can work on an answer to the first question (e.g. actuals aren't estimated, estimates are and they should be estimated as follows...). Tech folks (Marc Johnson, Julian Ladisch, Zak Burke, Craig McNally) should decide where. Also need a decision on what the intended scope is for Q3 (I'm currently thinking it's okay if we only manage to do this in Inventory even if it means inconsistency with rest of FOLIO). | ||||||||||||
| Comment by Zak Burke [ 28/Jul/20 ] | ||||||||||||
|
It doesn't much matter to me where rounding or estimation happens, i.e. if the backend or frontend is responsible for implementing the algorithm, or how to round the values, which seems like the domain of a SIG. What matters to me as a developer, and I suspect to anybody using the API directly, is knowing when a value reflects an exact count, when a value reflects an estimated count, and when a value has been rounded. Confusion about the meaning of totalRecords has led to a lot of heartache (
Personally, I would like to see totalRecords deprecated, or at least supplemented, by a set of values such as exactCount and/or estimatedCount and/or roundedCount so the consumer of that value (whether a person using Postman or a UI like stripes) can make an intelligent decision about what to do based on what kind of value is available. | ||||||||||||
| Comment by Julian Ladisch [ 28/Jul/20 ] | ||||||||||||
|
I agree, this is out of scope of this Jira and should be discussed in other Jiras:
| ||||||||||||
| Comment by Cate Boerema (Inactive) [ 28/Jul/20 ] | ||||||||||||
|
Julian Ladisch where should we have a conversation about whether the rounding is done on the frontend or the backend? Neither of the two tickets you linked seem like they concern that question. | ||||||||||||
| Comment by Zak Burke [ 28/Jul/20 ] | ||||||||||||
|
I don't have strong feelings about where rounding is implemented. RMB seems like an easy place to implement rounding on the backend, but that means that every module that wants rounding needs to update its RMB version, and we'll have to make some front-end updates to find that new property. Given that, maybe rounding is best handled in the front-end, because all UI modules share a common stripes-connect version and since we have to handle it there anyway, we might as well only handle it there. | ||||||||||||
| Comment by Marc Johnson [ 28/Jul/20 ] | ||||||||||||
|
The questions that Cate Boerema is asking came about from a conversation she and I had. Julian Ladisch I agree that the details of the API response representation for indicating whether the count is an estimate or not are best discussed on
I think this issue is a good place (unless someone wants to write a design proposal, which might be what we should do) to discuss the overall decisions about rounding. I think those questions can be separated into organisational policy (these might influence the technical design) and technical design. Organisational policy
Technical Design
Cate Boerema Charlotte Whitt The question about which counts should be rounded is especially important and might influence technical decisions. For example, if we decide that some clients, e.g. discovery, want to apply their own rounding rules (or not round at all), then that either means the back-end should not round at all, or whether it does or not should be controlled by the client. (I think the questions I asked in my previous comment mostly overlap with this, however they might be useful for reference) cc: Craig McNally | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 28/Jul/20 ] | ||||||||||||
|
Thanks for your thoughts, Zak Burke. I've done a bit of analysis on the relative accuracy of the different rounding methods. See columns R, S and T here: https://docs.google.com/spreadsheets/d/1ergJ7jDHdLbD_noWsL-ZdtVzB2NiiIPaCcUeO4ZFLZw/edit#gid=0 I compared the estimates that would come from each rounding method (Julian Ladisch provided these in columns O, P and Q ) with the actuals from the DB (column Z). The "Rounded to first digit" method was the closest to actual more often than the other methods (17 times vs 14 for "Rounded first digit to 1, 2 or 5" and 9 for "Rounded to magnitude"). I know this is a relatively small sample size, but I think it's probably good enough. Does anyone have any concerns with this methodology? Assuming no concerns, I think the business requirements should be something like the following:
Thoughts Charlotte Whitt, Marc Johnson, Julian Ladisch? | ||||||||||||
| Comment by Marc Johnson [ 28/Jul/20 ] | ||||||||||||
|
In general I think this is ok. I believe some of the outstanding issues around this area are about folks' expectations around specific searches. How does this fit in with expectations around those? For example, when searching for suppressed by discovery yes or no? If we assume that the estimated matching records before applying those filters is less than 10 000, then the margin for cumulative error on the matching records is nearly 1000 (998 I think), in the sense that we might get counts from the two searches using the suppress from discovery filter whose sum is maybe 1000 different from the total given without the filter. Julian Ladisch please do correct my logic if I've misunderstood or over simplified this example (I'm mostly ignoring the error in the underlying estimates). | ||||||||||||
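To make the arithmetic concrete, here is a small sketch. It assumes that "Round to first digit" means rounding to the nearest value with one significant digit, which matches the example values used elsewhere in this issue (1431 becomes 1000, 32351 becomes 30000). The function name `roundToFirstDigit` and the sample counts are made up for illustration.

```typescript
// Assumed interpretation of option B: round to the nearest number that has a
// single significant digit. Counts below 1000 are exact and left untouched.
function roundToFirstDigit(n: number): number {
  if (n < 1000) {
    return n;
  }
  // 10^(number of digits - 1), e.g. 1000 for 1431 and 10000 for 32351.
  const magnitude = Math.pow(10, String(Math.trunc(n)).length - 1);
  return Math.round(n / magnitude) * magnitude;
}

// Made-up estimates: two filtered searches that partition one unfiltered search.
const suppressedYes = 1499;
const suppressedNo = 1499;
const unfiltered = suppressedYes + suppressedNo; // 2998

const sumOfRounded = roundToFirstDigit(suppressedYes) + roundToFirstDigit(suppressedNo);
const roundedTotal = roundToFirstDigit(unfiltered);

console.log(sumOfRounded);              // 2000
console.log(roundedTotal);              // 3000
console.log(unfiltered - sumOfRounded); // 998
```

With these sample values the two displayed filtered counts add up to 2000, while the unfiltered search would display 3000, even though the underlying estimates are consistent with each other.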
| Comment by Cate Boerema (Inactive) [ 28/Jul/20 ] | ||||||||||||
|
Given the result counts are estimates, I would not expect searching for things like suppressed for discovery y/n to add up. SMEs would, of course, love for all result counts to be exact and for everything to add up precisely, but my understanding is that simply isn't possible until we have Elastic search (and, even then, it's not clear it will be as precise as people would like). Still, given what we have, I believe that appending the "about" and rounding the estimates will help users to better understand the situation. I wish the estimates could be more accurate (some are still quite a bit off, percentage-wise) but we need to work with what we have. Unless you think there is something additional we can do now to help with this? | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 28/Jul/20 ] | ||||||||||||
Marc Johnson I wouldn't think discovery systems would rely on FOLIO for result counts at all. But Craig McNally may have some insight into whether this kind of data would be used by discovery or other integrations. | ||||||||||||
| Comment by Marc Johnson [ 28/Jul/20 ] | ||||||||||||
Where would you expect discovery systems to get the information from? As I understand it, the OAI-PMH and RTAC modules are being used by discovery systems, is that not the case? cc: Matt Reno | ||||||||||||
| Comment by Charlotte Whitt [ 28/Jul/20 ] | ||||||||||||
|
Hi Marc Johnson, Cate Boerema is right, discovery systems do not rely on FOLIO search result count. E.g. Ian Hardy, when setting up VuFind for the Simmons library, elegantly implemented faceted search - see: | ||||||||||||
| Comment by Charlotte Whitt [ 28/Jul/20 ] | ||||||||||||
I agree with your comments above Cate Boerema. When would we know if Elastic Search would be a much better option, and be much more precise than what we have today? CC: lew235 Felix Hemme | ||||||||||||
| Comment by Charlotte Whitt [ 28/Jul/20 ] | ||||||||||||
|
Sorry - I'm working my way backwards on today's comments in this thread. But to your suggestion above Cate Boerema - https://folio-org.atlassian.net/browse/FOLIO-2648?focusedCommentId=14940&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel - then I agree. Using result count to be rounded using the "Rounded to first digit" method - is aligned with my comment on 6/18/2020 - https://folio-org.atlassian.net/browse/FOLIO-2648?focusedCommentId=14890&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel CC: lew235 | ||||||||||||
| Comment by Marc Johnson [ 28/Jul/20 ] | ||||||||||||
Does that mean that we are comfortable assuming that no external systems are going to use FOLIO's record counts and so the only audience for this change is the internal, staff oriented reference UI? | ||||||||||||
| Comment by Matt Reno [ 28/Jul/20 ] | ||||||||||||
|
Marc Johnson the current RTAC implementation in FOLIO, which is used by EDS, does not use or report record counts. The API returns RTAC info for a specified instance ID and we retrieve with limit set to Integer.MAX_VALUE instead of paging. However, we are using counts for the EBSCO Analytics project in order to extract data from FOLIO and possibly for stats. If the counts are not accurate, then we may not be retrieving everything as expected. I guess we'll need to be aware of this when paging and maybe come up with another approach to know when we are done retrieving all pages. | ||||||||||||
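One possible approach to the paging concern, sketched here under assumptions: keep requesting pages until a page comes back with fewer records than the requested limit, instead of comparing the offset against totalRecords. The `fetchPage` callback and the `fetchAll` function are hypothetical names and do not refer to an existing FOLIO client API. The sketch is synchronous for brevity, whereas a real client would issue asynchronous HTTP requests.

```typescript
// Hypothetical page fetcher: returns up to `limit` records starting at `offset`.
type FetchPage<T> = (offset: number, limit: number) => T[];

// Retrieve all records without relying on a possibly estimated totalRecords.
function fetchAll<T>(fetchPage: FetchPage<T>, limit: number): T[] {
  const all: T[] = [];
  let offset = 0;
  for (;;) {
    const page = fetchPage(offset, limit);
    all.push(...page);
    if (page.length < limit) {
      return all; // a short (or empty) page means there is nothing left
    }
    offset += limit;
  }
}

// Usage with an in-memory stand-in for a back-end holding 25 records.
const stored: number[] = [];
for (let i = 0; i < 25; i++) {
  stored.push(i);
}
const everything = fetchAll<number>((offset, limit) => stored.slice(offset, offset + limit), 10);
console.log(everything.length); // 25
```

The cost of this design is one extra request when the number of records is an exact multiple of the limit, because only an empty page then signals the end.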
| Comment by Zak Burke [ 28/Jul/20 ] | ||||||||||||
Discussion at SUP-8; we face the same dilemma in the UI WRT how to handle paging. | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 28/Jul/20 ] | ||||||||||||
Charlotte Whitt, my understanding is that the Shanghai folks are working on a proof of concept for Elastic search in Q3. I hope we will know more about result count precision when that is available. FYI twliu | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 28/Jul/20 ] | ||||||||||||
|
I just restructured the description of this issue to include the scenarios. I'm thinking we can just use this as our development story now (unless this actually spins off several stories). Thoughts? BTW, when I was reorganizing the content, I noticed that, in the last sentence of the description, Julian wrote "This issue is about deciding whether there should be a FOLIO standard for rounding, and if yes, which to choose. If the decision has been made Stripes should provide a round function to be used by all front-end modules." Another vote for doing this on the frontend, I guess. | ||||||||||||
| Comment by Marc Johnson [ 28/Jul/20 ] | ||||||||||||
If it has been decided that we should do the rounding on the front end, then I think Zak Burke is probably best placed to outline what would be involved in achieving that. I think it is likely that any work would be blocked on both
I think we should be aware that this means that any client that is not the reference UI might present different information from the same searches. (I'd like someone like Craig McNally VBar Zak Burke Mike Gorrell to maybe comment here as to whether we think this kind of decision needs any technical governance from the Technical Council or Leads) | ||||||||||||
| Comment by Julian Ladisch [ 28/Jul/20 ] | ||||||||||||
|
I no longer advocate for front-end rounding (and removed my sentence from the issue description that the front-end should round).
This is not a strong advantage of back-end rounding so I will also be happy with front-end rounding. The back-end APIs are designed to be used by external software and will be used by external software. If the back-end rounds, it should also provide the original (non-rounded) estimate, for example
totalRecords: 1431
totalRecordsRounded: 1000
totalRecordsEstimated: true
No rounding is needed for an exact number:
totalRecords: 1509
totalRecordsEstimated: false
(Q3 2020 will have "totalRecordsEstimated" flag
| ||||||||||||
| Comment by Marc Johnson [ 28/Jul/20 ] | ||||||||||||
I agree that this is the most important aspect of the back end vs. front end technical decision. My questions about external systems like discovery were to try to explore whether this consistency was desirable, or if it might even be problematic.
This is interesting, because it would allow and require a client to make the choice. If we chose this design, there would need to be front end work as well. I'll admit I was premature in assuming the client would be ignorant of the rounding and they would only receive an already rounded estimate. | ||||||||||||
| Comment by Mike Gorrell [ 28/Jul/20 ] | ||||||||||||
|
My take would be to do it on the backend for the reasons Julian stated (consistency internally or externally). And it feels like a core team decision to me. | ||||||||||||
| Comment by Julian Ladisch [ 29/Jul/20 ] | ||||||||||||
|
Marc Johnson We cannot avoid front-end work. We need at least this logic:
  if totalRecordsEstimated is true
    print "about " + totalRecordsRounded
  else if totalRecordsEstimated is false
    print totalRecords
  // totalRecordsEstimated is undefined until all back-ends have upgraded RMB,
  // handle this gracefully for folio-testing:
  else if totalRecords < 1000
    print totalRecords
  else
    print "about " + totalRecords + " (upgrade back-end for rounding)" | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 29/Jul/20 ] | ||||||||||||
|
Thanks guys! It sounds like there will be several stories needed, so this one should probably be converted to a UXPROD feature, rather than a dev story. Are these the stories that would define the feature? Who can tell me if it's possible to get the "totalRecordsRounded" done in Q3?
| ||||||||||||
| Comment by Julian Ladisch [ 29/Jul/20 ] | ||||||||||||
|
Yes, Q3 2020 is possible,
| ||||||||||||
| Comment by Marc Johnson [ 29/Jul/20 ] | ||||||||||||
Does that mean that it has been decided that the rounding will be done by the back end? | ||||||||||||
| Comment by Julian Ladisch [ 29/Jul/20 ] | ||||||||||||
|
No, but
| ||||||||||||
| Comment by Marc Johnson [ 29/Jul/20 ] | ||||||||||||
If they need to be worked on at the same time, does that mean that work on informing the client whether the total record count is an estimate or not is blocked upon a decision as to where to do the rounding? | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 29/Jul/20 ] | ||||||||||||
Marc Johnson do you have an objection to doing it on the back end? If not, let's just do that since Julian Ladisch and Mike Gorrell prefer that approach. Zak Burke has said it doesn't matter to him either way. | ||||||||||||
| Comment by Marc Johnson [ 29/Jul/20 ] | ||||||||||||
I don't have any objections to this. I think it is important that we are confident about the need for consistency. If we choose to make this opaque to clients by using the same property (totalRecords) then clients have no choice and so it will be consistent. If we choose to have separate properties for rounded and non-rounded values then the client may choose and so could be inconsistent. Does that make sense? | ||||||||||||
| Comment by Julian Ladisch [ 29/Jul/20 ] | ||||||||||||
|
This cannot be opaque to the client because the client needs to check the totalRecordsEstimated property to decide whether to add "about " or the translation of "about " to the number. | ||||||||||||
| Comment by Marc Johnson [ 29/Jul/20 ] | ||||||||||||
Apologies for being vague with my scope, I was referring only to the rounding aspect of this, not whether the record count is estimated or not (which I believe is already being worked on and I consider somewhat separate) | ||||||||||||
| Comment by Marc Johnson [ 10/Aug/20 ] | ||||||||||||
|
Has a decision been made about where FOLIO is going to round the record counts and what impact that has on clients? | ||||||||||||
| Comment by Marc Johnson [ 11/Aug/20 ] | ||||||||||||
It has been expressed by folks that an important aspect of doing this work on the back end is consistency. I think we need to decide what we mean by consistency. The proposed API design for this rounding distinguishes between rounded and not rounded totals (by including two properties totalRecords and totalRecordsRounded). This means that it is transparent to the client, and which value is used is at the discretion of the individual client. Some clients may choose to present the rounded number, others the unrounded and others may choose to apply their own rounding to the unrounded number. I had thought of consistency to mean that we expected the same value to be presented for the same question irrespective of the client. It is entirely possible I misunderstood the use in this context. What degree of consistency are folks wanting to achieve? | ||||||||||||
| Comment by Julian Ladisch [ 11/Aug/20 ] | ||||||||||||
|
If the clients should do the rounding we need to duplicate the rounding code for each client. Some clients may implement a different rounding algorithm (see A, B, and C in this issue description), or may incorrectly implement the correct rounding algorithm; this results in different numbers for the same query = inconsistency. Some clients may implement a minimal viable product without any rounding to avoid the rounding code. If the rounded value is provided by the back-end it is more likely that the developer uses it. Improving the rounding algorithm requires updating all clients if rounding is client-side, but does not require any client change if the rounded value is provided by the back-end. | ||||||||||||
| Comment by Marc Johnson [ 12/Aug/20 ] | ||||||||||||
That may well be the case.
An inherent part of the proposed design is that the clients can choose between the existing totalRecords and the new totalRecordsRounded, which means that there is work for any existing client to support rounding. It also means that clients can choose to implement their own rounding algorithm at any point in the future. | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 17/Aug/20 ] | ||||||||||||
|
Hi guys. Based on Julian Ladisch's comments, I converted this to a UXPROD feature and assigned it to Core Platform for Q3 2020. I linked 3 RMB issues as defining this feature (this feature won't be done until all 3 of those issues are complete). Are there additional stories needed? Do we need something for the front end to make use of this new data? Is Core Platform still tracking to have this done for Q3? Thanks much! FYI Jakub Skoczen and Charlotte Whitt | ||||||||||||
| Comment by Marc Johnson [ 17/Aug/20 ] | ||||||||||||
Assuming we are going with the approach where the back-end introduces a new property (totalRecordsRounded) then yes, we will need UI issues for using this property (at their discretion) rather than totalRecords. As the rollout of this will likely be incremental across the various modules and API endpoints, I think we likely need input from Zak Burke or Michal Kuklis to advise on whether this can be done in a centralised way in stripes (based upon the presence of properties) or whether it needs to be done for each record type in turn. It's worth keeping in mind that some endpoints may never provide totalRecordsRounded when we think about how to do this. Depending upon how the change is done in RAML Module Builder, then we might also need issues in the various back end modules to expose this in the APIs. | ||||||||||||
| Comment by Zak Burke [ 17/Aug/20 ] | ||||||||||||
|
Cate Boerema, Marc Johnson is correct that we will need stories across most front-end app modules, as well as stripes-components and stripes-smart-components. There are ~70 occurrences that will need to be investigated in total-core.txt | ||||||||||||
| Comment by Charlotte Whitt [ 17/Aug/20 ] | ||||||||||||
|
As I understand it, at the PC meeting last week (https://docs.google.com/document/d/1MpDJGX5wPzJZtqf4bSOge1D9clLLhKIqhIqqTDD87gA/edit) Lucy Liu reported that the Shanghai Library has come far with its investigation of Elastic Search, and they expect to be able to present the results in Q3 2020 - See more in: https://folio-org.atlassian.net/browse/UXPROD-2592 | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 17/Aug/20 ] | ||||||||||||
Charlotte Whitt, I still think it's prudent for us to progress on this, as ES will take a while. Shanghai is starting by just implementing infrastructure in Q3. It will not be wired into actual FOLIO apps until Q4 at the soonest.
Zak Burke, that's a lot. I hope we can take a phased approach to this development, starting with Inventory. What is the best way to plan this work? Should we discuss in a Core Functional grooming meeting or schedule a discussion with a smaller group? | ||||||||||||
| Comment by Zak Burke [ 17/Aug/20 ] | ||||||||||||
|
Cate Boerema, I think we could start by grooming stories in stripes-connect and stripes-smart-components. That will address the search-and-sort component family, and thus take care of a big slice of the most important pieces of the UI across many apps. Looking at those lists in more detail, there are many "if totalRecords > 0"-like statements, which means there is no work to do there. So, my estimate may have been a vast overestimate, which would be nice. In any case, yes, let's do the remainder one app at a time and start with inventory. | ||||||||||||
| Comment by Adam Dickmeiss [ 19/Aug/20 ] | ||||||||||||
|
I know what totalRecordsRounded is supposed to do.. Give some comfort! But really.. This is something the UI should do.. Not to mention that some rounded value gives some false sense of upper and lower bound.. Eg that something is within < 100 of the real hit count or within < 1000 of some hit count.. Eg if the value returned is 30000 it could be 47000 or if the value is 1700 it could be 17000. | ||||||||||||
| Comment by Julian Ladisch [ 19/Aug/20 ] | ||||||||||||
|
Note that in the current proposal the back-end passes both the original estimation and the rounded estimation to the clients. The clients can decide whether to show the original estimation or the rounded estimation (they may even do some rounding of their own based on the original estimation). If the exact value is 47000 and the estimated value is 25088 or 34088 then showing the rounded value of 30000 is better than showing 25088 or 34088 because it doesn't give the false sense of precision. | ||||||||||||
| Comment by Adam Dickmeiss [ 19/Aug/20 ] | ||||||||||||
|
I don't think it's the backend's business to return a rounded number when it is not based on a lower and upper bound. Whether a UI wants to show "thousands of hits" for an estimated hit count > 1000 or to show 34000 is up to the UI team. We will continue to get bug reports about this even with a rounded number because it will give the impression that there's some reality to the value.. There's not at the moment. | ||||||||||||
| Comment by Julian Ladisch [ 20/Aug/20 ] | ||||||||||||
|
Assume these numbers are returned from the back-end:
totalRecords: 32351
totalRecordsRounded: 30000
totalRecordsEstimated: true
This issue proposes to display it in FOLIO as "about 30,000 records found". | ||||||||||||
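A minimal sketch of how a client might produce that display string from the rounded value. The helper name `estimatedCountMessage` is hypothetical, and the hard-coded English wording is an assumption for illustration; a real UI would take "about" and "records found" from its translations. The grouping separator comes from the standard `Number.prototype.toLocaleString`.

```typescript
// Hypothetical helper: builds the message proposed in this issue for an estimated count.
function estimatedCountMessage(totalRecordsRounded: number, locale: string = "en-US"): string {
  // toLocaleString inserts the locale's grouping separators (30000 -> "30,000" for en-US).
  return "about " + totalRecordsRounded.toLocaleString(locale) + " records found";
}

console.log(estimatedCountMessage(30000)); // about 30,000 records found
```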
| Comment by Adam Dickmeiss [ 20/Aug/20 ] | ||||||||||||
|
I don't think the backend should return a totalRecordsRounded entity. I think that with the totalRecords and totalRecordsEstimated, the UI can do whatever it pleases. | ||||||||||||
| Comment by Julian Ladisch [ 20/Aug/20 ] | ||||||||||||
|
Adam Dickmeiss What should the UI display for this value?
totalRecords: 32351
totalRecordsEstimated: true
| ||||||||||||
| Comment by Adam Dickmeiss [ 20/Aug/20 ] | ||||||||||||
|
The UI could show 30000. Another UI could leave it out.. Clients that are not UIs won't care. Things like this should not be coupled between modules. I don't care so much what the UI does. What I think is wrong is to leave it to the backend to do UI work. | ||||||||||||
| Comment by Julian Ladisch [ 20/Aug/20 ] | ||||||||||||
|
If we change the sample size of a PostgreSQL table the accuracy of the estimation changes. The back-end can then switch to a more appropriate rounding type (A, B or C as presented in this issue's description, or some other). Do you have a proposal for how to indicate to the UI that the accuracy has changed? | ||||||||||||
| Comment by Adam Dickmeiss [ 20/Aug/20 ] | ||||||||||||
|
What if we see that we can actually have a lower and upper bound later, what then? Bottom line is that at THIS stage the estimated number is the closest to reality that the backend can provide. | ||||||||||||
| Comment by Adam Dickmeiss [ 20/Aug/20 ] | ||||||||||||
|
https://lucene.apache.org/solr/guide/8_6/common-query-parameters.html | ||||||||||||
| Comment by Julian Ladisch [ 31/Aug/20 ] | ||||||||||||
|
Thank you for the link.
shows that our current FOLIO proposal works the same as Solr:
| ||||||||||||
| Comment by Debra Howell [ 21/Sep/20 ] | ||||||||||||
|
If elastic search is implemented, this will not be as important.
| ||||||||||||
| Comment by Julian Ladisch [ 02/Oct/20 ] | ||||||||||||
|
Cate Boerema This is the algorithm for the front-end:
  if totalRecordsEstimated is true
    print "about " + totalRecordsRounded
  else if totalRecordsEstimated is false
    print totalRecords
  // totalRecordsEstimated is undefined until all back-ends have upgraded RMB,
  // handle this gracefully for folio-testing:
  else if totalRecords < 1000
    print totalRecords
  else
    print "about " + totalRecords + " (upgrade back-end for rounding)"
Can you update the issue description of
| ||||||||||||
| Comment by Cate Boerema (Inactive) [ 05/Oct/20 ] | ||||||||||||
|
Yes, I will. Thanks Julian Ladisch | ||||||||||||
| Comment by Cate Boerema (Inactive) [ 05/Nov/20 ] | ||||||||||||
|
Jakub Skoczen I removed the R1 2021 fix version from this issue per our discussion in the platform roadmap meetings. We haven't made it that far with this and it's best at this point to hold off and wait for Elastic search. | ||||||||||||
| Comment by Charlotte Whitt [ 26/Feb/21 ] | ||||||||||||
|
Hi Jakub Skoczen - The libraries are asked to do another round of feature ranking using 100 points. This feature is in an undecided state while awaiting the outcome of the POC of using Elastic Search in Inventory. I have therefore changed this feature to Draft, which I hope is okay with you? | ||||||||||||
| Comment by Charlotte Whitt [ 11/Aug/21 ] | ||||||||||||
|
Hi Jakub Skoczen - With the decision to move ahead and implement Elastic search (
| ||||||||||||
| Comment by Julian Ladisch [ 11/Aug/21 ] | ||||||||||||
|
I don't agree because there are quite a few back-end processes that need totalRecords and cannot use Elastic Search because of the delay with which changes are propagated from PostgreSQL to Elastic Search. |