[UXPROD-2623] Wait for POC of Elastic Search - Round estimated search result hit count (totalRecords) Created: 17/Jun/20  Updated: 11/Aug/21  Resolved: 11/Aug/21

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: None

Type: New Feature Priority: TBD
Reporter: Julian Ladisch Assignee: Jakub Skoczen
Resolution: Won't Do Votes: 0
Labels: NFR, elastic-search, q3-2020-spillover, result-count, search, searching
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File Skärmavbild 2020-08-17 kl. 15.55.00.png     Text File total-complete-only.txt     Text File total-core.txt    
Issue links:
Blocks
blocks UXPROD-2695 Display Rounded Result Counts in Apps... Closed
blocks UXPROD-2702 Display Rounded Result Counts in Apps... Closed
is blocked by UXPROD-2806 Create Elasticsearch indexes for Inve... Closed
Defines
is defined by RMB-695 Add totalRecordsEstimated and totalRe... Closed
is defined by RMB-578 Inform whether totalRecords is exact ... Blocked
is defined by RMB-685 totalRecordsRounded Blocked
Duplicate
is duplicated by UIU-1741 Changing patron group for one user do... Closed
Relates
relates to MODINVSTOR-468 hitcount related issues in Q1 Closed
relates to UXPROD-2369 Wait for POC of Elastic Search - Impl... Closed
Development Team: Core: Platform
PO Ranking Note: CW: This feature is in an undecided state, due to awaiting the outcome of the POC of using Elastic Search in Inventory.
Rank: Chicago (MVP Sum 2020): R2
Rank: Cornell (Full Sum 2021): R1
Rank: Duke (Full Sum 2021): R1
Rank: TAMU (MVP Jan 2021): R2

 Description   

CW: This feature is in an undecided state, due to awaiting the outcome of the POC of using Elastic Search in Inventory. 

  • - - -

Purpose:

The totalRecords search result hit count number returned by RMB is precise if totalRecords is below 1000 and it is only an estimation if totalRecords >= 1000 (details).
The front-end currently displays "8081 records found" for estimations.
This is misleading (false precision).

"about" should be prepended for >= 1000.

The number should be rounded for >= 1000.

Scenarios:

  1. Scenario
    • Given a result count is estimated (current understanding is that result counts above 1,000 are estimates)
    • When displayed in FOLIO
    • Then:
      • Result count should be preceded by "about"
      • Result count should be rounded using the "Rounded to first digit" method
      • For example:
        • Estimated result count from Postgres = 32351
        • Displays in FOLIO as "about 30,000 records found"
  2. Scenario
    • Given a result count is NOT estimated (current understanding is that result counts under 1,000 are exact)
    • When displayed in FOLIO
    • Then:
      • Result count should NOT be preceded by "about"
      • Result count should NOT be rounded
      • For example:
        • Exact result count from Postgres = 845
        • Displays in FOLIO as "845 records found"

Algorithm for the front-end:

if totalRecordsEstimated is true
  print "about " + totalRecordsRounded
else if totalRecordsEstimated is false
  print totalRecords
// totalRecordsEstimated is undefined until all back-ends have upgraded RMB,
// handle this gracefully for folio-testing:
else if totalRecords < 1000
  print totalRecords
else
  print "about " + totalRecords + " (upgrade back-end for rounding)"
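The branching above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Stripes implementation: the function name `format_hit_count`, the " records found" suffix on the fallback branch, and the behaviour when the flag is true but no rounded value was sent are all hypothetical.

```python
from typing import Optional


def format_hit_count(total_records: int,
                     estimated: Optional[bool] = None,
                     rounded: Optional[int] = None) -> str:
    """Return the result-count label for one search response.

    `estimated` and `rounded` stand for totalRecordsEstimated and
    totalRecordsRounded; None means the back-end did not send them.
    """
    if estimated is True:
        # Assumption: fall back to the raw estimate if no rounded value came.
        shown = rounded if rounded is not None else total_records
        return f"about {shown:,} records found"
    if estimated is False:
        return f"{total_records:,} records found"
    # Flag missing: the back-end has not upgraded RMB yet.
    if total_records < 1000:
        return f"{total_records:,} records found"
    return (f"about {total_records:,} records found "
            "(upgrade back-end for rounding)")


print(format_hit_count(32351, True, 30000))
print(format_hit_count(845, False))
```

The two calls reproduce the scenario examples: an estimated 32351 rounded to 30000 by the back-end, and an exact 845.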

Background Discussion:

  • Three rounding proposals were considered
  • Option B, "Round to first digit" was selected

Details on Rounding Options Considered:

3 rounding proposals are in the comments of MODINVSTOR-468:

A) Round to magnitude:

about 10,000 (3,000-29,999)
about 100,000 (30,000-299,999)
about 1,000,000 (300,000-2,999,999)
about 10,000,000 (3,000,000-29,999,999)

B) Round to first digit:

about 2,000 (1,500-2,499)
about 3,000 (2,500-3,499)
about 4,000 (3,500-4,499)
about 5,000 (4,500-5,499)
about 6,000 (5,500-6,499)
about 7,000 (6,500-7,499)
about 8,000 (7,500-8,499)
about 9,000 (8,500-9,499)
about 10,000 (9,500-14,999)
about 20,000 (15,000-24,999)
about 30,000 (25,000-34,999)

C) Round first digit to 1, 2 or 5:

about 2,000 (1,500-3,999)
about 5,000 (4,000-7,999)
about 10,000 (8,000-14,999)
about 20,000 (15,000-39,999)
about 50,000 (40,000-79,999)
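
For comparison, the three proposals can be written as small Python functions. This is a sketch derived only from the ranges listed above; the function names are hypothetical, inputs are assumed to be integers of at least 1, and the behaviour outside the listed ranges is extrapolated from the same pattern.

```python
def _unit(n: int) -> int:
    """Largest power of ten not exceeding n (n >= 1), e.g. 32351 -> 10000."""
    return 10 ** (len(str(n)) - 1)


def round_to_magnitude(n: int) -> int:
    """Option A: snap to a power of ten, switching at 3 x 10^k."""
    unit = _unit(n)
    return unit * 10 if n >= 3 * unit else unit


def round_to_first_digit(n: int) -> int:
    """Option B: keep one significant digit, rounding half up."""
    unit = _unit(n)
    return (n + unit // 2) // unit * unit


def round_first_digit_to_1_2_5(n: int) -> int:
    """Option C: snap the leading digit to 1, 2 or 5 (or the next 1)."""
    unit = _unit(n)
    if 2 * n < 3 * unit:  # below 1.5 x unit
        return unit
    if n < 4 * unit:
        return 2 * unit
    if n < 8 * unit:
        return 5 * unit
    return 10 * unit


for estimate in (1431, 8081, 32351):
    print(estimate,
          round_to_magnitude(estimate),
          round_to_first_digit(estimate),
          round_first_digit_to_1_2_5(estimate))
```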

This issue is about deciding whether there should be a FOLIO standard for rounding, and if yes, which to choose, and whether the front-end or the back-end should round the number.

Proposed API as of July 28, 2020

If the back-end rounds, it should also provide the original (non-rounded) estimate, for example:

totalRecords: 1431
totalRecordsRounded: 1000
totalRecordsEstimated: true

No rounding is needed for an exact number:

totalRecords: 1509
totalRecordsEstimated: false
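
A minimal Python sketch of how a back-end could assemble these properties, assuming the "Round to first digit" method. The helper names `round_to_first_digit` and `result_info` are hypothetical and this is not RMB code; only the three property names come from the proposal.

```python
def round_to_first_digit(n: int) -> int:
    """Keep one significant digit, rounding half up (n >= 1)."""
    unit = 10 ** (len(str(n)) - 1)
    return (n + unit // 2) // unit * unit


def result_info(total_records: int, estimated: bool) -> dict:
    """Build the count properties for one search response."""
    info = {
        "totalRecords": total_records,
        "totalRecordsEstimated": estimated,
    }
    if estimated:
        # Only estimates get a rounded companion value.
        info["totalRecordsRounded"] = round_to_first_digit(total_records)
    return info


print(result_info(1431, True))
print(result_info(1509, False))
```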

https://github.com/folio-org/raml/blob/raml1.0/schemas/resultInfo.schema needs to be extended accordingly.



 Comments   
Comment by Cate Boerema (Inactive) [ 18/Jun/20 ]

Thanks for filing this, Julian Ladisch. I added columns O, P and Q to the result count testing spreadsheet. Could you add a formula to those columns showing what the output of each strategy would result in? It would be helpful in evaluating which is the best.

https://docs.google.com/spreadsheets/d/1ergJ7jDHdLbD_noWsL-ZdtVzB2NiiIPaCcUeO4ZFLZw/edit#gid=0

+ Charlotte Whitt and Marc Johnson

Comment by Marc Johnson [ 18/Jun/20 ]

Cate Boerema Jakub Skoczen

Has it been decided that we are going to round these estimated counts? And we are now trying to decide what rounding approach to take?

Comment by Julian Ladisch [ 18/Jun/20 ]

Quote from this issue's description:

This issue is about deciding whether there should be a FOLIO standard for rounding, and if yes, which to choose.

Comment by Julian Ladisch [ 18/Jun/20 ]

Cate Boerema I've added number ranges showing the rounding rules to the issue description. Programming a formula takes longer than manually filling the columns. Could you fill the columns?

Comment by Charlotte Whitt [ 18/Jun/20 ]

Of the three suggested solutions, I definitely prefer solution B) Round to first digit.
I can ask the MM-SIG at today's meeting - if that brings us forward in making a decision.

CC: Cate Boerema Julian Ladisch Marc Johnson

Comment by Charlotte Whitt [ 18/Jun/20 ]

Julian Ladisch

Could you fill the columns?

I can fill in the columns - np

Comment by Cate Boerema (Inactive) [ 18/Jun/20 ]

Thank you Julian Ladisch and Charlotte Whitt.

Charlotte Whitt, when assessing the methods, please refer to column W which represents the actual count in the DB.

Thanks!

Comment by Charlotte Whitt [ 18/Jun/20 ]

Okay - now done with filling in estimated counts in columns N, O, and P for the rows where we have the actual count in the DB (see column W) - https://docs.google.com/spreadsheets/d/1ergJ7jDHdLbD_noWsL-ZdtVzB2NiiIPaCcUeO4ZFLZw/edit#gid=0

Comment by Marc Johnson [ 18/Jun/20 ]

I had a follow up conversation with Cate Boerema to try to understand what the current situation is. Based upon that I have some follow-up questions.

To round or not?

My understanding is that it has been decided to round some of the estimated record counts. Does that fit with everyone else's understanding?

If so, shall I change the description of this issue to reflect that decision? (I will also need to document this in a decision log, where is yet to be decided)

The rest of my comment will assume that this initial decision has been made.

Which record counts should be rounded?

Does this decision only apply to instance searches within inventory?

Does it apply to other records in inventory, e.g. items, loan types or resource types?

Or does this apply to any searches where the record counts are estimated?

Given that the estimation technique and limitations are specific to RAML Module Builder and may not apply to other parts of FOLIO, if we intend for this to be a general decision, we may need a way to distinguish between estimated and non-estimated counts.

How / where is the rounding going to be applied?

This may depend upon the scope of this decision. If we intend to be selective about which record counts we want to round, this is likely going to need to be design/compile time configurable, e.g. switched on for mod-inventory-storage / UI inventory, off elsewhere.

There are likely trade-offs to where we do this rounding, I shall defer sharing my understanding of those until we have progressed this conversation a little further.

What governance is needed for this decision?

Jakub Skoczen Craig McNally Zak Burke Is this a significant enough decision that this needs review by the Technical Leads or the Technical Council? (this might depend upon how far reaching this decision is)

Comment by Julian Ladisch [ 18/Jun/20 ]

Using column W (actual count in DB) for examples might be misleading. When implemented the rounding is applied to the estimated count, not the actual count in DB. Half of the cases are rounded towards the actual count in DB and the other half are rounded away.

Comment by Charlotte Whitt [ 18/Jun/20 ]

Oh ... (sigh)

I can do it all one more time ... when I'm out of meetings.
Thanks for clarifying this Julian Ladisch

Comment by Cate Boerema (Inactive) [ 19/Jun/20 ]

Using column W (actual count in DB) for examples might be misleading. When implemented the rounding is applied to the estimated count, not the actual count in DB. Half of the cases are rounded towards the actual count in DB and the other half are rounded away.

Yeah, but we want to see how the displayed result counts compare to the actuals. That's what matters. I know that, in some cases, it will make the displayed count less accurate. I think it is very important that we look at how often that happens and how big of a problem it is. Hence, we should compare what will be displayed to the actuals.

Comment by Cate Boerema (Inactive) [ 19/Jun/20 ]

Oh, I see what Charlotte Whitt did. Yeah, I didn't mean to apply the rounding rules to the actual count. I meant to compare the results of the various rounding rules as applied to the raw result count (columns O,P and Q) to the actual counts (now in column Z (was column W)).

Comment by Cate Boerema (Inactive) [ 27/Jul/20 ]

We want to do this in Q3, so we need to decide:
1. What do we want to round and how
2. Where do we want to do it (front end vs back end)

Charlotte Whitt and I can work on an answer to the first question (e.g. actuals aren't rounded, estimates are and they should be rounded as follows...). Tech folks (Marc Johnson, Julian Ladisch, Zak Burke, Craig McNally) should decide where.

Also need a decision on what the intended scope is for Q3 (I'm currently thinking it's okay if we only manage to do this in Inventory even if it means inconsistency with rest of FOLIO).

Comment by Zak Burke [ 28/Jul/20 ]

It doesn't much matter to me where rounding or estimation happens, i.e. if the backend or frontend is responsible for implementing the algorithm, or how to round the values, which seems like the domain of a SIG. What matters to me as a developer, and I suspect to anybody using the API directly, is knowing when a value reflects an exact count, when a value reflects an estimated count, and when a value has been rounded. Confusion about the meaning of totalRecords has led to a lot of heartache ( RMB-684 Closed , MODINVSTOR-321 Closed , STSMACOM-259 Closed , UIIN-1055 Closed , UIIN-1071 Closed , etc).

Personally, I would like to see totalRecords deprecated, or at least supplemented, by a set of values such as exactCount and/or estimatedCount and/or roundedCount so the consumer of that value (whether a person using Postman or a UI like stripes) can make an intelligent decision about what to do based on what kind of value is available.

Comment by Julian Ladisch [ 28/Jul/20 ]

I agree, this is out of scope of this Jira and should be discussed in other Jiras:

  • RMB-578 Blocked Inform whether totalRecords is exact or an estimate
  • UXPROD-2369/RMB-673 Implement exact hit count
Comment by Cate Boerema (Inactive) [ 28/Jul/20 ]

Julian Ladisch where should we have a conversation about whether the rounding is done on the frontend or the backend? Neither of the two tickets you linked seem like they concern that question.

Comment by Zak Burke [ 28/Jul/20 ]

I don't have strong feelings about where rounding is implemented.

RMB seems like an easy place to implement rounding on the backend, but that means that every module that wants rounding needs to update its RMB version, and we'll have to make some front-end updates to find that new property. Given that, maybe rounding is best handled in the front-end, because all UI modules share a common stripes-connect version and since we have to handle it there anyway, we might as well only handle it there.

Comment by Marc Johnson [ 28/Jul/20 ]

The questions that Cate Boerema is asking came about from a conversation she and I had.

Julian Ladisch I agree that the details of the API response representation for indicating whether the count is an estimate or not is best discussed on RMB-578 Blocked

I think this issue is a good place (unless someone wants to write a design proposal, which might be what we should do) to discuss the overall decisions about rounding.

I think those questions can be separated into organisational policy (these might influence the technical design) and technical design.

Organisational policy

  • Should all estimated counts be rounded, e.g. should only the counts in the staff reference UI be rounded, or should the rounding also apply to discovery systems or other clients?
  • What rules should be used to do the rounding? Are these the same everywhere, e.g. inventory vs. acquisitions, staff vs. discovery views, instances vs. locations?

Technical Design

  • Should the rounding be done in the reference back end or reference front end modules?

Cate Boerema Charlotte Whitt The question about which counts should be rounded is especially important and might influence technical decisions.

For example, if we decide that some clients e.g. discovery want to apply their own rounding rules (or not round at all), then that either means the back-end should not round at all, or whether it does or not should be controlled by the client.

(I think the questions I asked from my previous comment mostly overlap with this, however they might be useful for reference)

cc: Craig McNally

Comment by Cate Boerema (Inactive) [ 28/Jul/20 ]

Thanks for your thoughts, Zak Burke.

I've done a bit of analysis on the relative accuracy of the different rounding methods. See columns R, S and T here: https://docs.google.com/spreadsheets/d/1ergJ7jDHdLbD_noWsL-ZdtVzB2NiiIPaCcUeO4ZFLZw/edit#gid=0 I compared the estimates that would come from each rounding method (Julian Ladisch provided these in columns O, P and Q ) with the actuals from the DB (column Z). The "Rounded to first digit" method was the closest to actual more often than the other methods (17 times vs 14 for "Rounded first digit to 1, 2 or 5" and 9 for "Rounded to magnitude"). I know this is a relatively small sample size, but I think it's probably good enough.

Does anyone have any concerns with this methodology?

Assuming no concerns, I think the business requirements should be something like the following:

  1. Scenario
    • Given a result count is estimated (current understanding is that result counts above 1,000 are estimates)
    • When displayed in FOLIO
    • Then:
      • Result count should be preceded by "about"
      • Result count should be rounded using the "Rounded to first digit" method
      • For example:
        • Estimated result count from Postgres = 32351
        • Displays in FOLIO as "about 30,000 records found"
  2. Scenario
    • Given a result count is NOT estimated (current understanding is that result counts under 1,000 are exact)
    • When displayed in FOLIO
    • Then:
      • Result count should NOT be preceded by "about"
      • Result count should NOT be rounded
      • For example:
        • Exact result count from Postgres = 845
        • Displays in FOLIO as "845 records found"

Thoughts Charlotte Whitt, Marc Johnson, Julian Ladisch?

Comment by Marc Johnson [ 28/Jul/20 ]

Cate Boerema

In general I think this is ok.

I believe some of the outstanding issues around this area are about folks' expectations around specific searches. How does this fit in with expectations around those?

For example, consider searching with the suppress from discovery filter set to yes or no. If we assume that the estimated number of matching records before applying those filters is less than 10,000, then the margin for cumulative error on the matching records is nearly 1,000 (998, I think): the counts from the two filtered searches might sum to a number that differs by maybe 1,000 from the total given without the filter.
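
A worked instance of that arithmetic in Python, assuming the "Round to first digit" method and, as in the comment, ignoring any error in the underlying estimates. The two filtered counts are hypothetical values picked to sit just below a rounding boundary.

```python
def round_to_first_digit(n: int) -> int:
    """Keep one significant digit, rounding half up (n >= 1)."""
    unit = 10 ** (len(str(n)) - 1)
    return (n + unit // 2) // unit * unit


# Hypothetical estimates for the two filtered searches.
suppressed = 2499       # displayed as about 2,000
not_suppressed = 3499   # displayed as about 3,000
total = suppressed + not_suppressed  # unfiltered estimate

sum_of_rounded_parts = (round_to_first_digit(suppressed)
                        + round_to_first_digit(not_suppressed))
rounded_total = round_to_first_digit(total)

# Gap between the unrounded total and the two displayed counts added up.
gap_to_unrounded_total = total - sum_of_rounded_parts
# Gap between the displayed total and the two displayed counts added up.
gap_to_rounded_total = rounded_total - sum_of_rounded_parts

print(total, rounded_total, sum_of_rounded_parts)
print(gap_to_unrounded_total, gap_to_rounded_total)
```

With these values each filtered count is rounded down by 499, which is where a figure of 998 comes from.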

Julian Ladisch please do correct my logic if I've misunderstood or over simplified this example (I'm mostly ignoring the error in the underlying estimates).

Comment by Cate Boerema (Inactive) [ 28/Jul/20 ]

Given the result counts are estimates, I would not expect searching for things like suppressed for discovery y/n to add up. SMEs would, of course, love for all result counts to be exact and for everything to add up precisely, but my understanding is that simply isn't possible until we have Elastic search (and, even then, it's not clear it will be as precise as people would like). Still, given what we have, I believe that appending the "about" and rounding the estimates will help users to better understand the situation. I wish the estimates could be more accurate (some are still quite a bit off, percentage-wise) but we need to work with what we have. Unless you think there is something additional we can do now to help with this?

Comment by Cate Boerema (Inactive) [ 28/Jul/20 ]

For example, if we decide that some clients e.g. discovery want to apply their own rounding rules (or not round at all), then that either means the back-end should not round at all, or whether it does or not should be controlled by the client.

Marc Johnson I wouldn't think discovery systems would rely on FOLIO for result counts at all. But Craig McNally may have some insight into whether this kind of data would be used by discovery or other integrations.

Comment by Marc Johnson [ 28/Jul/20 ]

Cate Boerema

I wouldn't think discovery systems would rely on FOLIO for result counts at all

Where would you expect discovery systems to get the information from?

As I understand it, the OAI-PMH and RTAC modules are being used by discovery systems, is that not the case?

cc: Matt Reno

Comment by Charlotte Whitt [ 28/Jul/20 ]

Hi Marc Johnson, Cate Boerema is right, discovery systems do not rely on the FOLIO search result count. For example, Ian Hardy elegantly implemented faceted search when setting up VuFind for the Simmons library - see:
https://libcat.simmons.edu

Comment by Charlotte Whitt [ 28/Jul/20 ]


Given the result counts are estimates, I would not expect searching for things like suppressed for discovery y/n to add up. SMEs would, of course, love for all result counts to be exact and for everything to add up precisely, but my understanding is that simply isn't possible until we have Elastic search (and, even then, it's not clear it will be as precise as people would like). Still, given what we have, I believe that appending the "about" and rounding the estimates will help users to better understand the situation. I wish the estimates could be more accurate (some are still quite a bit off, percentage-wise) but we need to work with what we have. Unless you think there is something additional we can do now to help with this?

I agree with your comments above Cate Boerema.

When would we know if Elastic Search would be a much better option, and be much more precise than what we have today?

CC: lew235 Felix Hemme

Comment by Charlotte Whitt [ 28/Jul/20 ]

Sorry - I'm working my way backwards on today's comments in this thread. But to your suggestion above Cate Boerema - https://folio-org.atlassian.net/browse/FOLIO-2648?focusedCommentId=14940&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel - then I agree.

Using result count to be rounded using the "Rounded to first digit" method - is aligned with my comment on 6/18/2020 - https://folio-org.atlassian.net/browse/FOLIO-2648?focusedCommentId=14890&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel

CC: lew235

Comment by Marc Johnson [ 28/Jul/20 ]

Cate Boerema Charlotte Whitt

Cate Boerema is right, discovery systems do not rely on FOLIO search result count.

Does that mean that we are comfortable assuming that no external systems are going to use FOLIO's record counts and so the only audience for this change is the internal, staff oriented reference UI?

Comment by Matt Reno [ 28/Jul/20 ]

Marc Johnson the current RTAC implementation in FOLIO, which is used by EDS, does not use or report record counts. The API returns RTAC info for a specified instance ID and we retrieve with limit set to Integer.MAX_VALUE instead of paging.

However, we are using counts for the EBSCO Analytics project in order to extract data from FOLIO and possibly for stats. If the counts are not accurate, then we may not be retrieving everything as expected. I guess we'll need to be aware of this when paging and maybe come up with another approach to know when we are done retrieving all pages.
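
One possible approach, sketched in Python, is to stop on the first short page instead of comparing against totalRecords. This assumes an endpoint paged by offset and limit; `fetch_page` is a hypothetical callable standing in for the real request, and this is not how RTAC or the EBSCO Analytics project is known to work.

```python
from typing import Callable, Iterator, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              limit: int = 100) -> Iterator[dict]:
    """Yield every record, stopping on the first page shorter than limit."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            return  # short (or empty) page: nothing left to retrieve
        offset += limit


# Stand-in for a back-end holding 250 records.
records = [{"id": i} for i in range(250)]


def fake_fetch_page(offset: int, limit: int) -> List[dict]:
    return records[offset:offset + limit]


print(len(list(fetch_all(fake_fetch_page, limit=100))))
```

The trade-off is one extra request whenever the number of records is an exact multiple of the page size.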

Comment by Zak Burke [ 28/Jul/20 ]

Matt Reno:

If the counts are not accurate, then we may not be retrieving everything as expected. I guess we'll need to be aware of this when paging and maybe come up with another approach to know when we are done retrieving all pages.

Discussion at SUP-8; we face the same dilemma in the UI WRT how to handle paging.

Comment by Cate Boerema (Inactive) [ 28/Jul/20 ]

When would we know if Elastic Search would be a much better option, and be much more precise than what we have today?

Charlotte Whitt, my understanding is that the Shanghai folks are working on a proof of concept for Elastic search in Q3. I hope we will know more about result count precision when that is available.

FYI twliu

Comment by Cate Boerema (Inactive) [ 28/Jul/20 ]

I just restructured the description of this issue to include the scenarios. I'm thinking we can just use this as our development story now (unless this actually spins off several stories). Thoughts?

BTW, when I was reorganizing the content, I noticed that, in the last sentence of the description, Julian wrote "This issue is about deciding whether there should be a FOLIO standard for rounding, and if yes, which to choose. If the decision has been made Stripes should provide a round function to be used by all front-end modules." Another vote for doing this on the frontend, I guess.

Comment by Marc Johnson [ 28/Jul/20 ]

Cate Boerema

I'm thinking we can just use this as our development story now (unless this actually spins off several stories). Thoughts?

I think if it has been decided that we should do the rounding on the front end, then I think Zak Burke is probably best placed to outline what would be involved in achieving that.

I think it is likely that any work will be blocked on both RMB-578 Blocked and on it being implemented by at least one module (probably mod-inventory-storage).

I think we should be aware that this means that any client that is not the reference UI might present different information from the same searches.

(I'd like someone like Craig McNally VBar Zak Burke Mike Gorrell to maybe comment here as to whether we think this kind of decision needs any technical governance from the Technical Council or Leads)

Comment by Julian Ladisch [ 28/Jul/20 ]

I no longer advocate for front-end rounding (and removed my sentence from the issue description that the front-end should round).
If the back-end calculates and sends the rounded estimates it has these advantages:

  • Consistency: The same query shows the same number everywhere (in internal and external software).
  • The back-end may adjust the rounding. PostgreSQL allows setting the sample size separately for each database index, resulting in more accurate estimates; this may allow for rounding to the first 2 digits or some other rounding scheme.

This is not a strong advantage of back-end rounding so I will also be happy with front-end rounding.

The back-end APIs are designed to be used by external software and will be used by external software.

If the back-end rounds, it should also provide the original (non-rounded) estimate, for example:

totalRecords: 1431
totalRecordsRounded: 1000
totalRecordsEstimated: true

No rounding is needed for an exact number:

totalRecords: 1509
totalRecordsEstimated: false

(Q3 2020 will have "totalRecordsEstimated" flag RMB-578 Blocked .)
If I have collected the history of the estimated count of some query I'd rather have the original estimated numbers (1263, 1362, 1473, 1594) than the rounded numbers (1000, 1000, 1000, 2000).
If a table is sorted by estimates the front-end may sort by the original (non-rounded) estimate even if only the rounded estimate is shown.
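
The sorting point can be illustrated with a short Python sketch using the four estimates above. The query labels "a" to "d" and the row layout are hypothetical.

```python
rows = [
    {"query": "a", "totalRecords": 1594, "totalRecordsRounded": 2000},
    {"query": "b", "totalRecords": 1263, "totalRecordsRounded": 1000},
    {"query": "c", "totalRecords": 1473, "totalRecordsRounded": 1000},
    {"query": "d", "totalRecords": 1362, "totalRecordsRounded": 1000},
]

# Sort by the original estimate, but show only the rounded value.
ordered = sorted(rows, key=lambda r: r["totalRecords"])
labels = [f'{r["query"]}: about {r["totalRecordsRounded"]:,}' for r in ordered]

for label in labels:
    print(label)
```

Sorting by the rounded value alone could not order the three rows that all display as about 1,000.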

Comment by Marc Johnson [ 28/Jul/20 ]

Julian Ladisch

Consistency: The same query shows the same number everywhere (in internal and external software).

I agree that this is the most important aspect of the back end vs. front end technical decision.

My questions about external systems like discovery were to try to explore whether this consistency was desirable, or if it might even be problematic.

If the back-end rounds, it should also provide the original (non-rounded) estimate

This is interesting, because it would allow and require a client to make the choice. If we chose this design, there would need to be front end work as well.

I'll admit I was premature in assuming the client would be ignorant of the rounding and they would only receive an already rounded estimate.

Comment by Mike Gorrell [ 28/Jul/20 ]

My take would be to do it on the backend for the reasons Julian stated (consistency internally or externally). And it feels like a core team decision to me.

Comment by Julian Ladisch [ 29/Jul/20 ]

Marc Johnson We cannot avoid front-end work. We need at least this logic:

if totalRecordsEstimated is true
  print "about " + totalRecordsRounded
else if totalRecordsEstimated is false
  print totalRecords
// totalRecordsEstimated is undefined until all back-ends have upgraded RMB,
// handle this gracefully for folio-testing:
else if totalRecords < 1000
  print totalRecords
else
  print "about " + totalRecords + " (upgrade back-end for rounding)"
Comment by Cate Boerema (Inactive) [ 29/Jul/20 ]

Thanks guys! It sounds like there will be several stories needed, so this one should probably be converted to a UXPROD feature, rather than a dev story.

Are these the stories that would define the feature? Who can tell me if it's possible to get the "totalRecordsRounded" done in Q3?

  • "totalRecordsEstimated" flag RMB-578 Blocked , Core Platform, Q3 2020
  • "totalRecordsRounded" value RMB-???, Core Platform, Q? 2020
  • Round estimated search result hit count, UIIN-???, Core Functional, Q? 2020 (dependent on previous story)
Comment by Julian Ladisch [ 29/Jul/20 ]

Yes, Q3 2020 is possible, RMB-578 Blocked (totalRecordsEstimated) and RMB-685 Blocked (totalRecordsRounded) are scheduled for this or next core platform sprint.

Comment by Marc Johnson [ 29/Jul/20 ]

Yes, Q3 2020 is possible, RMB-578 Blocked (totalRecordsEstimated) and RMB-685 Blocked (totalRecordsRounded) are scheduled for this or next core platform sprint.

Does that mean that it has been decided that the rounding will be done by the back end?

Comment by Julian Ladisch [ 29/Jul/20 ]

No, but RMB-578 Blocked and RMB-685 Blocked should be worked on at the same time and we need the RMB-685 Blocked Jira for capacity planning.

Comment by Marc Johnson [ 29/Jul/20 ]

Julian Ladisch

No, but RMB-578 Blocked and RMB-685 Blocked should be worked on at the same time and we need the RMB-685 Blocked Jira for capacity planning.

If they need to be worked on at the same time, does that mean that work on informing the client whether the total record count is an estimate or not is blocked upon a decision as to where to do the rounding?

Comment by Cate Boerema (Inactive) [ 29/Jul/20 ]

Does that mean that it has been decided that the rounding will be done by the back end?

Marc Johnson do you have an objection to doing it on the back end? If not, let's just do that since Julian Ladisch and Mike Gorrell prefer that approach. Zak Burke has said it doesn't matter to him either way.

Comment by Marc Johnson [ 29/Jul/20 ]

Cate Boerema

do you have an objection to doing it on the back end? If not, let's just do that since Julian Ladisch and Mike Gorrell prefer that approach. Zak Burke has said it doesn't matter to him either way.

I don't have any objections to this. I think it is important that we are confident about the need for consistency. If we choose to make this opaque to clients by using the same property (totalRecords) then clients have no choice and so it will be consistent. If we choose to have separate properties for rounded and non-rounded values then the client may choose and so could be inconsistent.

Does that make sense?

Comment by Julian Ladisch [ 29/Jul/20 ]

This cannot be opaque to the client because the client needs to check the totalRecordsEstimated property to decide whether to add "about " or the translation of "about " to the number.

Comment by Marc Johnson [ 29/Jul/20 ]

Julian Ladisch

This cannot be opaque to the client because the client needs to check the totalRecordsEstimated property to decide whether to add "about " or the translation of "about " to the number.

Apologies for being vague with my scope, I was referring only to the rounding aspect of this, not whether the record count is estimated or not (which I believe is already being worked on and I consider somewhat separate)

Comment by Marc Johnson [ 10/Aug/20 ]

Has a decision been made about where FOLIO is going to round the record counts and what impact that has on clients?

Comment by Marc Johnson [ 11/Aug/20 ]

Consistency: The same query shows the same number everywhere (in internal and external software).

It has been expressed by folks that an important aspect of doing this work on the back end is consistency. I think we need to decide what we mean by consistency.

The proposed API design for this rounding distinguishes between rounded and not rounded totals (by including two properties totalRecords and totalRecordsRounded).

This means that it is transparent to the client, and which value is used is at the discretion of the individual client. Some clients may choose to present the rounded number, others the unrounded, and others may choose to apply their own rounding to the unrounded number.

I had thought of consistency to mean that we expected the same value to be presented for the same question irrespective of the client. It is entirely possible I misunderstood the use in this context.

What degree of consistency are folks wanting to achieve?

Comment by Julian Ladisch [ 11/Aug/20 ]

If the clients should do the rounding, we need to duplicate the rounding code for each client. Some clients may implement a different rounding algorithm (see A, B, and C in this issue description), or may incorrectly implement the correct rounding algorithm; this results in different numbers for the same query = inconsistency.

Some clients may implement a minimal viable product without any rounding to avoid the rounding code. If the rounded value is provided by the back-end it is more likely that the developer uses it.

Improving the rounding algorithm requires updating all clients if rounding is client-side, but does not require any client change if the rounded value is provided by the back-end.

Comment by Marc Johnson [ 12/Aug/20 ]

Julian Ladisch

If the rounded value is provided by the back-end it is more likely that the developer uses it

That may well be the case.

Improving the rounding algorithm requires updating all clients if rounding is client-side, but does not require any client change if the rounded value is provided by the back-end.

An inherent part of the proposed design is that clients can choose between the existing totalRecords and the new totalRecordsRounded, which means that there is work for any existing client to support rounding. It also means that clients can choose to implement their own rounding algorithm at any point in the future.

Comment by Cate Boerema (Inactive) [ 17/Aug/20 ]

Hi guys. Based on Julian Ladisch's comments, I converted this to a UXPROD feature and assigned it to Core Platform for Q3 2020. I linked 3 RMB issues as defining this feature (this feature won't be done until all 3 of those issues are complete).

Are there additional stories needed? Do we need something for the front end to make use of this new data?

Is Core Platform still tracking to have this done for Q3?

Thanks much!

FYI Jakub Skoczen and Charlotte Whitt

Comment by Marc Johnson [ 17/Aug/20 ]

Cate Boerema

Are there additional stories needed? Do we need something for the front end to make use of this new data?

Assuming we are going with the approach where the back-end introduces a new property (totalRecordsRounded) then yes, we will need UI issues for using this property (at their discretion) rather than totalRecords.

As the rollout of this will likely be incremental across the various modules and API endpoints, I think we likely need input from Zak Burke or Michal Kuklis to advise on whether this can be done in a centralised way in stripes (based upon the presence of properties) or whether it needs to be done for each record type in turn. It's worth keeping in mind that some endpoints may never provide totalRecordsRounded when we think about how to do this.

Depending upon how the change is done in RAML Module Builder, then we might also need issues in the various back end modules to expose this in the APIs.

Comment by Zak Burke [ 17/Aug/20 ]

Cate Boerema, Marc Johnson is correct that we will need stories across most front-end app modules, as well as stripes-components and stripes-smart-components.

There are ~70 occurrences that will need to be investigated in total-core.txt and an additional ~40 in total-complete-only.txt .

Comment by Charlotte Whitt [ 17/Aug/20 ]

As I understand it, at the PC meeting last week (https://docs.google.com/document/d/1MpDJGX5wPzJZtqf4bSOge1D9clLLhKIqhIqqTDD87gA/edit) Lucy Liu said that the Shanghai Library has come far with its investigation of Elastic Search and expects to be able to present the results in Q3 2020.

See more in: https://folio-org.atlassian.net/browse/UXPROD-2592

Comment by Cate Boerema (Inactive) [ 17/Aug/20 ]

As I understand it, at the PC meeting last week (https://docs.google.com/document/d/1MpDJGX5wPzJZtqf4bSOge1D9clLLhKIqhIqqTDD87gA/edit) Lucy Liu said that the Shanghai Library has come far with its investigation of Elastic Search and expects to be able to present the results in Q3 2020.

See more in: https://folio-org.atlassian.net/browse/UXPROD-2592

Charlotte Whitt, I still think it's prudent for us to progress on this, as ES will take a while. Shanghai is starting by just implementing infrastructure in Q3. It will not be wired into actual FOLIO apps until Q4 at the soonest.

we will need stories across most front-end app modules, as well as stripes-components and stripes-smart-components. There are ~70 occurrences that will need to be investigated in total-core.txt and an additional ~40 in total-complete-only.txt .

Zak Burke, that's a lot. I hope we can take a phased approach to this development, starting with Inventory. What is the best way to plan this work? Should we discuss in a Core Functional grooming meeting or schedule a discussion with a smaller group?

Comment by Zak Burke [ 17/Aug/20 ]

Cate Boerema, I think we could start by grooming stories in stripes-connect and stripes-smart-components. That will address the search-and-sort component family, and thus take care of a big slice of the most important pieces of the UI across many apps.

Looking at those lists in more detail, there are many statements like "if totalRecords > 0", which means there is no work to do there. So my estimate may have been a vast overestimate, which would be nice. In any case, yes, let's do the remainder one app at a time and start with Inventory.

Comment by Adam Dickmeiss [ 19/Aug/20 ]

I know what totalRecordsRounded is supposed to do: give some comfort! But really, this is something the UI should do. Not to mention that a rounded value gives a false sense of an upper and lower bound, e.g. that the value is within 100 or within 1000 of the real hit count. If the value returned is 30000 the real count could be 47000, or if the value is 1700 it could be 17000.

Comment by Julian Ladisch [ 19/Aug/20 ]

Note that in the current proposal the back-end passes both the original estimation and the rounded estimation to the clients. The clients can decide whether to show the original estimation or the rounded estimation (they may even do some own rounding based on the original estimation).

If the exact value is 47000 and the estimated value is 25088 or 34088, then showing the rounded value of 30000 is better than showing 25088 or 34088 because it doesn't give a false sense of precision.
Can you explain in more detail why we should not round?
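
The rounding in the example above (25088 or 34088 shown as 30000) can be sketched as rounding to one significant digit. This is only an illustration: the function name roundToSignificantDigits is hypothetical, and the rule is an assumption chosen to match the numbers in this thread, not necessarily rounding type A, B, or C from the issue description.

```typescript
// Hypothetical helper: round a non-negative hit count estimate to a given
// number of significant digits. This is an assumption for illustration and
// not necessarily algorithm A, B, or C from the issue description.
function roundToSignificantDigits(estimate: number, digits: number = 1): number {
  if (!Number.isFinite(estimate) || estimate <= 0) {
    return 0;
  }
  // Power of ten of the lowest digit that is kept, e.g. 4 for 25088 with
  // digits = 1. Clamped at 0 because hit counts are whole numbers.
  const exponent = Math.max(0, Math.floor(Math.log10(estimate)) - (digits - 1));
  const magnitude = Math.pow(10, exponent);
  return Math.round(estimate / magnitude) * magnitude;
}

// Both estimates from the comment above collapse to the same displayed value.
console.log(roundToSignificantDigits(25088));
console.log(roundToSignificantDigits(34088));
```

With this rule two different estimates of the same query show the same number, which is the point being made about false precision; it does not, however, say anything about how far the estimate is from the exact value.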

Comment by Adam Dickmeiss [ 19/Aug/20 ]

I don't think it's the backend's business to return a rounded number when it is not based on a lower and upper bound.

Whether a UI shows "thousands of hits" for an estimated hit count > 1000 or shows 34000 is up to the UI team. We will continue to get bug reports about this even with a rounded number, because it will give the impression that there's some reality to the value. There's not at the moment.

Comment by Julian Ladisch [ 20/Aug/20 ]

Assume these numbers are returned from the back-end:

totalRecords: 32351
totalRecordsRounded: 30000
totalRecordsEstimated: true

This issue proposes to display it in FOLIO as "about 30,000 records found".
What is your suggestion?

Comment by Adam Dickmeiss [ 20/Aug/20 ]

I don't think the backend should return a totalRecordsRounded entity. I think that with the totalRecords and totalRecordsEstimated, the UI can do whatever it pleases.

Comment by Julian Ladisch [ 20/Aug/20 ]

Adam Dickmeiss What should the UI display for this value?

totalRecords: 32351
totalRecordsEstimated: true

Comment by Adam Dickmeiss [ 20/Aug/20 ]

The UI could show 30000. Another UI could leave it out. Clients that are not UIs won't care. Things like this should not be coupled between modules. I don't care so much what the UI does. What I think is wrong is to leave it to the backend to do UI work.

Comment by Julian Ladisch [ 20/Aug/20 ]

If we change the sample size of a PostgreSQL table, the accuracy of the estimation changes. The back-end can then switch to a more appropriate rounding type (A, B or C as presented in this issue's description, or some other). Do you have a proposal for how to indicate to the UI that the accuracy has changed?

Comment by Adam Dickmeiss [ 20/Aug/20 ]

What if we later see that we can actually have a lower and upper bound; what then? The bottom line is that at THIS stage the estimated number is the closest to reality that the back end can provide.

Comment by Adam Dickmeiss [ 20/Aug/20 ]

https://lucene.apache.org/solr/guide/8_6/common-query-parameters.html
for what some well-known search engine does.

Comment by Julian Ladisch [ 31/Aug/20 ]

Thank you for the link.
Comparing

shows that our current FOLIO proposal works the same as Solr:

RAML Module Builder   | Apache Solr   | Content
totalRecords          | numFound      | hit count, exact or estimated
totalRecordsEstimated | numFoundExact | indicates whether totalRecords/numFound is exact or estimated
exactCount            | minExactCount | for hit counts less than this value, calculate the exact hit count

Comment by Debra Howell [ 21/Sep/20 ]

If elastic search is implemented, this will not be as important. UXPROD-2592 Closed
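
The correspondence between the RAML Module Builder and Apache Solr properties in the 31/Aug/20 comparison can be sketched as a small mapping. The interface and function names below are hypothetical; only the property names come from this thread. Note that the two flags have opposite polarity: Solr reports whether the count is exact, the FOLIO proposal reports whether it is estimated.

```typescript
// Count-related part of a Solr response, using the property names from the
// comparison in this thread.
interface SolrCounts {
  numFound: number;
  numFoundExact: boolean;
}

// Count-related properties of the current FOLIO proposal.
interface FolioCounts {
  totalRecords: number;
  totalRecordsEstimated: boolean;
}

// Hypothetical converter, for illustration only.
function fromSolrCounts(solr: SolrCounts): FolioCounts {
  return {
    totalRecords: solr.numFound,
    // "exact" and "estimated" are opposites, so the flag is inverted.
    totalRecordsEstimated: !solr.numFoundExact,
  };
}

console.log(fromSolrCounts({ numFound: 32351, numFoundExact: false }));
```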

Comment by Julian Ladisch [ 02/Oct/20 ]

Cate Boerema This is the algorithm for the front-end:

if totalRecordsEstimated is true
  print "about " + totalRecordsRounded
else if totalRecordsEstimated is false
  print totalRecords
// totalRecordsEstimated is undefined until all back-ends have upgraded RMB,
// handle this gracefully for folio-testing:
else if totalRecords < 1000
  print totalRecords
else
  print "about " + totalRecords + " (upgrade back-end for rounding)"

Can you update the issue description of UXPROD-2623 Closed , UXPROD-2695 Closed and UXPROD-2702 Closed accordingly?
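
The front-end algorithm from the 02/Oct/20 comment could look roughly like this in TypeScript. It is a sketch, not the stripes implementation: the names SearchCounts and formatHitCount are hypothetical, and falling back to totalRecords when totalRecordsRounded is missing is an added assumption that the pseudocode does not cover.

```typescript
// Count properties as discussed in this issue. The two new properties are
// optional because not every back-end may have upgraded RMB yet.
interface SearchCounts {
  totalRecords: number;
  totalRecordsRounded?: number;
  totalRecordsEstimated?: boolean;
}

// Hypothetical helper that turns the counts into the text to display.
function formatHitCount(counts: SearchCounts): string {
  const { totalRecords, totalRecordsRounded, totalRecordsEstimated } = counts;
  if (totalRecordsEstimated === true) {
    // Assumption: use totalRecords if the back-end sent no rounded value.
    return `about ${totalRecordsRounded ?? totalRecords}`;
  }
  if (totalRecordsEstimated === false) {
    return String(totalRecords);
  }
  // totalRecordsEstimated is undefined: the back-end has not upgraded RMB.
  if (totalRecords < 1000) {
    return String(totalRecords);
  }
  return `about ${totalRecords} (upgrade back-end for rounding)`;
}

console.log(formatHitCount({ totalRecords: 32351, totalRecordsRounded: 30000, totalRecordsEstimated: true }));
console.log(formatHitCount({ totalRecords: 8081 }));
```

Checking the flag with === true and === false keeps the three cases of the pseudocode (true, false, undefined) separate, so endpoints that never provide the new properties still get a sensible display.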

Comment by Cate Boerema (Inactive) [ 05/Oct/20 ]

Yes, I will. Thanks Julian Ladisch

Comment by Cate Boerema (Inactive) [ 05/Nov/20 ]

Jakub Skoczen I removed the R1 2021 fix version from this issue per our discussion in the platform roadmap meetings. We haven't made it that far with this and it's best at this point to hold off and wait for Elastic search.

Comment by Charlotte Whitt [ 26/Feb/21 ]

Hi Jakub Skoczen - The libraries are asked to do another round of feature ranking using 100 points. This feature is in an undecided state while awaiting the outcome of the POC of using Elastic Search in Inventory. I have therefore changed this feature to Draft, which I hope is okay with you?

Comment by Charlotte Whitt [ 11/Aug/21 ]

Hi Jakub Skoczen - Given the decision to move ahead and implement Elastic Search ( UXPROD-3046 Closed ), I recommend that we close this feature as Won't Do. Do you agree?

Comment by Julian Ladisch [ 11/Aug/21 ]

I don't agree, because there are quite a few back-end processes that need totalRecords and cannot use Elastic Search because of the delay with which changes are propagated from PostgreSQL to Elastic Search.

Generated at Thu Feb 08 23:22:13 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.