Epic to link all support issues located in Dev projects (SUP-12)

[RMB-684] GET query returns no records when offset value > estimated totalRecords Created: 29/May/20  Updated: 18/Jan/21  Resolved: 19/Aug/20

Status: Closed
Project: RAML Module Builder
Components: None
Affects versions: 30.2.4
Fix versions: 30.2.5, 31.0.1, 31.1.0
Parent: Epic to link all support issues located in Dev projects

Type: Bug Priority: P2
Reporter: Theodor Tolstoy (One-Group.se) Assignee: Julian Ladisch
Resolution: Done Votes: 0
Labels: Support, platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File screenshot-1.png     PNG File screenshot-2.png    
Issue links:
Blocks
blocks CIRCSTORE-226 Missing records when offset >= totalR... Closed
blocks MODFEE-90 /feefineactions?query=cql.allrecords=... Closed
blocks MODPERMS-91 /perms/users web API possibly relies ... Closed
blocks CHAL-294 Query request where offset value > to... In Code Review
Relates
relates to UIORGS-196 Active status filter returns inaccura... Closed
relates to UIORGS-197 End of list displays for search resul... Closed
Sprint: CP: sprint 94, CP: sprint 95
Story Points: 2
Development Team: Core: Platform
Release: Q2 2020 Hot Fix #1
Affected Institution:
Chalmers
Epic Link: Epic to link all support issues located in Dev projects

 Description   

Overview: Given a query that matches >= 1000 rows, it is not possible to use limit and offset together to retrieve rows after the 999th because when limit is greater than 999, the result array is always empty. This makes it impossible to retrieve paged data beyond the 999th record.

Steps to reproduce:

  1. One way or another, find out the actual number of loans with Status "Open". In our case, it is 6023.
  2. Make this request to find out the estimated totalRecords count:
    {{okapi-url}}/loan-storage/loans?limit=0&query=(status.name="Open")

    Make a note that totalRecords: 1000.

  3. Attempt to retrieve these 6023 loans by making the 7 times, incrementing offset by 1000 for each iteration:
    {{okapi-url}}/loan-storage/loans?limit=1000&offset=0&query=(status.name="Open")

Expected result:
The first 6 requests return 1000 loans each. The 7th request returns 23 loan objects.

Actual result:
The first request returns 1000. The subsequent requests return [] loans.

Additional information:
The same behaviour can be observed in the bugfest tenant as well.

If the offset parameter > than totalRecords, no records are returned. This can be demonstrated by the following steps:

  1. Make the following request:
    {{okapi-url}}/loan-storage/loans?limit=1000&offset=1000&query=(status.name="Open").

This should return loans 1000 loans, but actually returns [] – ie no loans past totalRecords.

  1. Now change the offset to be < totalRecords:
    {{okapi-url}}/loan-storage/loans?limit=200&offset=900&query=(status.name="Open")

After the last request, we might expect this to return 100 loans, stopping at the totalRecords. However, this actually returns 200 loans – ie the correct result, and 100 loans past the totalRecords.

This issue has also been observed in Inventory, see https://folio-org.atlassian.net/browse/CHAL-294



 Comments   
Comment by Kelly Drake [ 29/May/20 ]

Theodor Tolstoy (One-Group.se) I am able to duplicate your scenario in the Simmons environment as well.

Only one question - how do you know how many actual open loans there are? ie - where did you get the 6023 figure?

Comment by Anya [ 29/May/20 ]

What dev or po do we need to look at this ?

Comment by Kelly Drake [ 29/May/20 ]

Emma? It has to do with the loan-storage is part of mod-circulation?

Comment by Anya [ 29/May/20 ]

Emma Boettcher Cate Boeremacould you look at this bug?

Comment by Theodor Tolstoy (One-Group.se) [ 29/May/20 ]

Kelly Drake I will add the dB query to the ticket

Comment by Cate Boerema (Inactive) [ 02/Jun/20 ]

Anya I can have Core Functional look at this. I am not familiar with the SUP issue process. Should I assign this issue to Core Functional or am I supposed to create a clone of it?

Could you please tell us more about the priority of this issue? Is it must-fix for Q2 or can it wait until Q3?

Comment by Lisa Sjögren [ 11/Jun/20 ]

Cate Boerema I'm the person who originally reported this issue, as well as CHAL-294 In Code Review , which is the same issue but in Inventory.

Regarding the priority for Chalmers: so far, we've found more or less cumbersome ways to work around it – some of them thanks to the fact that we have relatively small amounts of records.

But I suspect that we will keep running into this, and that it might hit libraries with larger volumes of records harder. Even more concerning, I imagine it would also affect any internal or external services that are using CQL queries to fetch subsets of data from e.g. item-storage or loan-storage.

So to me, the important thing seems to be that this is investigated and solved thoroughly, across any affected apps.

What do you think, Theodor Tolstoy (One-Group.se)?

Comment by Theodor Tolstoy (One-Group.se) [ 15/Jun/20 ]

I agree with Lisa Sjögren.

1. It makes migrating things a nightmare because you cannot trust the APIs to verify that things got set up the right way. The only way is to either use DB queries or harvest out all objects of a certain kind, and then do filter and counting once they are downloaded. Both these processes are making it hard for clients to validate if the migration was successful or not. Luckilly, you can have very high limit-numbers in the request parameters, but that is not a very polished way to do it.
2. If these counts, just as Lisa says, are used and relied upon in service-to-service requests, it would make the system behave very unexpected, to say the least.

To me, this issue is critical, especially if it is wide-spread and if it is relied upon in communication between modules.

Comment by Theodor Tolstoy (One-Group.se) [ 15/Jun/20 ]

Kelly Drake if you make a request for open loans in postman, and you set the limit parameter very high, you can then search for properties in the response and note the number of occurrences. You will see that it is higher than the totalRecords.

Comment by Kelly Drake [ 15/Jun/20 ]

Thanks Theodor Tolstoy (One-Group.se) - also for your explanation of why this is so critical!

Comment by Theodor Tolstoy (One-Group.se) [ 16/Jun/20 ]

Zak Burke made me aware, that replacing the single = with a double equals sign (==) made the queries work as expected.

Lisa Sjögren can you verify that on your part as well?

I am not sure if this solves the whole issue or not. Does it?

Comment by Lisa Sjögren [ 16/Jun/20 ]

Verified that it's true for the case descried above. I am also getting what looks like an exact count (5 924) in totalRecords - which I had come to expect was impossible...

But could this be different for different modules? When I test with requests I get this:

And it looks like the request app itself is using single equals sign for the same query.

Is there documentation anywhere detailing the differences between record counts and limit/offset use across FOLIO or in different apps?

Comment by Lisa Sjögren [ 16/Jun/20 ]

When I originally reported this issue in CHAL-294 In Code Review , regarding Inventory, all my example queries were using double equals signs.

{{okapi-url}}/item-storage/items?query=(effectiveLocationId=="f4c5afee-261a-43d1-9708-cc123aaba50a")&limit=100&offset=11100

So it seems that back in April double equals signs didn't make any difference, at least in item-storage. But would be great to know if there has been development that fixes both this and CHAL-294 In Code Review .

Comment by Theodor Tolstoy (One-Group.se) [ 16/Jun/20 ]

Completely agree Lisa Sjögren
I also think there is more to it than just using double equal signs. Is there documentation somewhere telling you what that means compared to the single equal signs? Does it differ between modules? Does the behavior change between versions?

Comment by Lisa Sjögren [ 16/Jun/20 ]

The only documentation on double equals signs I am aware of is https://dev.folio.org/faqs/explain-cql. I am not able to infer the different behaviours around result counts and returned results when using single/double equals signs from the information there.

Comment by Cate Boerema (Inactive) [ 26/Jun/20 ]

Emma Boettcher I guess these SUP issues are supposed to be cloned to create dev ones. Will you do that for this one and assign to CF? It looks like it is in your area. Thanks!

Comment by Anya [ 13/Jul/20 ]

Emma Boettcher could you not clone - but follow the procedure that was laid out in the last PO meeting.

Comment by Anya [ 20/Jul/20 ]

Emma Boettcher - on behalf of the support sig - could a release field be added to this ticket.

Comment by Kelly Drake [ 27/Jul/20 ]

Zak Burke will follow up with RMB.

Comment by Julian Ladisch [ 27/Jul/20 ]

For better performance the full text word match status.name="Open" should be changed to the field match status.name=="Open". However, this will only improve performance (in connection with the new index CIRCSTORE-223 Closed ). This cannot influence this limit/offset bug because we only have Open and Closed as loan status names. For details about = and == see https://dev.folio.org/faqs/explain-cql/ and https://github.com/folio-org/raml-module-builder#cql-relations .

Comment by Zak Burke [ 27/Jul/20 ]

I learned from Julian Ladisch that it is possible to get exact an exact count from /loan-storage/loans by changing the CQL query from status.name="Open" (single-equals) to status.name=="Open" (double-equals). I haven't yet confirmed whether this is a general property of CQL queries that we can always use to get exact counts; stay tuned.

Comment by Lisa Sjögren [ 28/Jul/20 ]

Zak Burke and Julian Ladisch When I tested some queries with single and double equals back in June, the behaviour did not seem consistent between modules.

See this comment and this comment.

Comment by Julian Ladisch [ 28/Jul/20 ]

How to get an exact count is out of scope of this issue. See https://github.com/folio-org/raml-module-builder#estimated-totalrecords for details. Please create a new Jira if you want to report a bug or use #raml-module-builder Slack channel if you want to discuss this topic.

Comment by Zak Burke [ 28/Jul/20 ]

The challenge is that a query like

GET https://okapi-bugfest-goldenrod.folio.ebsco.com/loan-storage/loans?limit=0&query=(status.name="Open")

that uses limit=0 to force an exact totalRecords value returns "totalRecords": 6676 but then the exact same query clause with an offset >= 1000 returns an empty array:

GET https://okapi-bugfest-goldenrod.folio.ebsco.com/loan-storage/loans?limit=1&offset=1000&query=(status.name="Open")

{
  "loans": [],
  "totalRecords": 1000
}

It turns out that you can gain access to the > 1000 rows by keeping offset under 1000 and increasing the limit value, e.g.

https://okapi-bugfest-goldenrod.folio.ebsco.com/loan-storage/loans?limit=${6676 - 999}&offset=999&query=(status.name="Open")

but this makes paging through large result sets, a fundamental optimization deeply woven into the UI, literally impossible.

Comment by Jon Miller [ 09/Sep/20 ]

There is another issue with regard to paging through data to watch out for. If you are trying to page through millions of rows of data to do something like dump all the data in a table, queries that have high offset values will be slower than ones with lower ones. For example, say you fetch 5,000 with an offset of 8 million. I could take a minute and a half to complete, while it might only take 15 seconds if the offset was 0.

This is a good reason that any query API that queries a table that has more than a small number of rows should stream the data and not require paging. That is, in addition to the fact that the query happens in a single transaction as a snapshot. Otherwise, rows could be inserted/removed while you are paging through the data, causing rows to be read more than once or skipped.

It might not hurt if there was a way for RMB to add a default ORDER BY clause to the SQL if offset or limit are provided. You have to use an ORDER BY if you are paging through the data. Otherwise, you will get inconsistent results and will end up reading the same results multiple times.

Overall, having to page the data causes a lot of hassle and a lot of bugs. Everything should stream.

Comment by Jon Miller [ 09/Sep/20 ]

Also, it doesn't help that a lot of the Docker modules have the max memory size set to small values like 300 MB. If you use too large of a page size, you will run out of memory. When that happens, Docker just goes ahead and terminates the module process. I've seen it where OKAPI returns an empty response when this happens and HTTP success status. The only way you can check if it failed is to check for empty string. Docker killing processes like that is pretty ridiculous if you ask me. The default memory sizes on a lot of modules are way too small.

Comment by Marc Johnson [ 10/Sep/20 ]

Jon Miller

This is a good reason that any query API that queries a table that has more than a small number of rows should stream the data and not require paging

By streaming are you referring to an API that only allows for fetching the entire record set (as no offset or limit can be provided)?

A trade off with that approach is that clients must read all of the records even if they are only interested in the first or the last.

Comment by Jon Miller [ 10/Sep/20 ]

Marc Johnson No, I mean paging is optional, not required. For a lot of the APIs you have to use paging (otherwise the server will run out of RAM), because it buffers the results on the server-side. I.e. I'm pretty sure it loads them into an array before returning them through the HTTP response. With streaming, the results are returned one at a time over the HTTP response and not loaded into an array first. This is already implemented for instances, holdings, items, and users.

Generated at Thu Feb 08 22:20:58 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.