[FOLIO-1526] Enable requesting all records to avoid pagination anomalies Created: 24/Sep/18  Updated: 05/Mar/19  Resolved: 04/Mar/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: New Feature Priority: P3
Reporter: Nassib Nassar Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: core, sprint47
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
blocks UXPROD-1128 Library Data Platform (LDP) Beta Closed
is blocked by RMB-255 Add streaming support to RMB Closed
Relates
relates to CQLPG-88 Implement id > [uuid] for pagination Closed
Sprint:
Development Team: Core: Platform

 Description   

When a request is made to a module to retrieve data, the results are paginated, with a maximum allowable pagination limit of 2147483647 records. For https://folio-org.atlassian.net/browse/UXPROD-1128 it is necessary to retrieve all records accurately, and the pagination scheme may produce anomalies (for example, skipped or duplicated records) when state changes between requests, or possibly for other reasons. The modules do not document a limit on the total number of records they can store, and PostgreSQL imposes no such limit, since it does not cap the number of rows in a table. The pagination maximum of 2147483647 is high, but it is possible (in theory) that one or more (specialized) modules could eventually reach that number of rows. Having any pagination maximum means that a client that needs all records has to test and maintain a rare edge case and address the pagination anomaly in some way (though it is unclear how). A solution would be a convention that clients can specify -1 as the pagination limit to indicate that all records are being requested in a single page. The module implementation should handle such a request in a single database transaction/query to ensure there are no anomalies due to multiple queries.
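To make the anomaly concrete, here is a minimal sketch of how offset/limit paging can skip a record when the data changes between two requests. The class `OffsetPagingAnomaly`, its methods, and the in-memory list are hypothetical stand-ins for a module's storage, not FOLIO or RMB code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a module's table; not FOLIO code.
class OffsetPagingAnomaly {

    // Returns one page using offset/limit semantics, like a paged GET request.
    static List<String> page(List<String> table, int offset, int limit) {
        int from = Math.min(offset, table.size());
        // Clamp the page length first so from + length cannot overflow int.
        int to = from + Math.min(limit, table.size() - from);
        return new ArrayList<>(table.subList(from, to));
    }

    // Reads the table in two pages while a record is deleted between the requests.
    static List<String> readWithConcurrentDelete() {
        List<String> table = new ArrayList<>(List.of("a", "b", "c", "d"));
        List<String> seen = new ArrayList<>(page(table, 0, 2)); // first page: a, b
        table.remove("a");               // state change; remaining rows shift to b, c, d
        seen.addAll(page(table, 2, 2));  // second page: d only, "c" now sits at offset 1
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readWithConcurrentDelete()); // prints [a, b, d]
    }
}
```

The client never sees "c" even though it existed for the whole run. Reading everything in one query, as proposed above, avoids the shift because no second request is made.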



 Comments   
Comment by Julian Ladisch [ 04/Mar/19 ]

2147483647 = Integer.MAX_VALUE in Java.
Both the limit and the offset parameter are limited to that number: https://github.com/folio-org/raml/blob/raml1.0/traits/pageable.raml
If there are more records than this, we have to change the API and use long instead of int in the implementation.
For most tables it is a bug if there are that many records (that is already about 10 GB even if the average record size is only 5 bytes).
Until there really is a need for such a big dataset, this is a theoretical issue and we should postpone it.
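The size estimate above can be checked with a few lines of Java. This is only a sketch of the arithmetic; the class name `LimitArithmetic` and the method `totalBytes` are illustrative and not part of any FOLIO module.

```java
// Illustrative arithmetic only; not part of any FOLIO module.
class LimitArithmetic {

    // Total bytes for Integer.MAX_VALUE records of the given average size.
    // The long parameter promotes the multiplication to long, avoiding int overflow.
    static long totalBytes(long avgRecordBytes) {
        return Integer.MAX_VALUE * avgRecordBytes;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE); // 2147483647
        System.out.println(totalBytes(5));     // 10737418235
    }
}
```

At 5 bytes per record the total is 10737418235 bytes, which is just under 10 GiB (10737418240 bytes).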

Comment by Nassib Nassar [ 04/Mar/19 ]

In that case it might be a good idea for the modules to specify the limit on the total number of records they can store, if exceeding it is a bug. I will close this issue.

Comment by Jon Miller [ 04/Mar/19 ]

If the API is streaming the results back, why not just make the limit parameter optional? That way the behavior is similar to the other parameters. For example, offset isn't required. Also, the documentation doesn't seem to consistently specify which parameters are required. I've also noticed that at least at one point some APIs named the parameter "limit" and others named it "size" (if I remember correctly). Maybe it was only one of them that was named differently. I don't remember which one it was at the moment.

Comment by Julian Ladisch [ 05/Mar/19 ]

Regarding "limit" and "size":
We have the pageable trait: https://github.com/folio-org/raml/blob/raml1.0/traits/pageable.raml
These are parameter names and default values of the trait: offset=0, limit=10
The RMB documentation suggests using the collection resource type that is based on that trait: https://github.com/folio-org/raml-module-builder#step-6-design-the-raml-files
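As an illustration of those defaults, the following sketch resolves offset and limit from a request's query parameters, falling back to offset=0 and limit=10 when a parameter is omitted. The class `PageableParams` and its methods are hypothetical and do not reflect the code RMB generates. The lower bound of 0 is an assumption; the upper bound follows from parsing into an int.

```java
import java.util.Map;

// Hypothetical helper; not the code that RMB generates from the pageable trait.
class PageableParams {
    final int offset;
    final int limit;

    PageableParams(int offset, int limit) {
        this.offset = offset;
        this.limit = limit;
    }

    // Applies the trait defaults (offset=0, limit=10) when a parameter is absent.
    static PageableParams fromQuery(Map<String, String> query) {
        return new PageableParams(parse(query.get("offset"), 0),
                                  parse(query.get("limit"), 10));
    }

    private static int parse(String raw, int defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        // Integer.parseInt throws NumberFormatException above Integer.MAX_VALUE.
        int value = Integer.parseInt(raw);
        if (value < 0) {
            // Assumed minimum of 0, so the proposed -1 convention is rejected here.
            throw new IllegalArgumentException("value must be >= 0: " + raw);
        }
        return value;
    }
}
```

With this shape both parameters are already optional for the caller, but an omitted limit means 10 records rather than all records.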

Generated at Thu Feb 08 23:13:59 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.