SPIKE Design batch create / update API endpoint standard
Description
Activity

Julian Ladisch January 8, 2020 at 2:52 PM
For the status of GET batch/streaming, see the "Issue Links".

Julian Ladisch January 7, 2020 at 1:24 PM
The dedicated array property makes it easy to add additional properties, and most clients will continue to work:
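A minimal sketch of the two request shapes under discussion, written as Python dicts. Only the `holdingsRecords` property name comes from the thread; the record fields and the idea of a later-added top-level flag are illustrative assumptions.

```python
# Hypothetical batch request with a dedicated array property. Additional
# top-level properties can be added later without breaking clients that
# only read "holdingsRecords".
batch_request = {
    "holdingsRecords": [
        {"id": "…", "instanceId": "…", "permanentLocationId": "…"},
    ],
}

# The alternative raised in the comments below: a bare array. Simpler for
# client-code generation, but it leaves no room for extra top-level
# properties later.
bare_request = [
    {"id": "…", "instanceId": "…", "permanentLocationId": "…"},
]
```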

Julian Ladisch January 7, 2020 at 1:23 PM
The property name "holdingsRecords" is consistent with the property name of the GET /holdings-storage/holdings API: https://s3.amazonaws.com/foliodocs/api/mod-inventory-storage/p/holdings-storage.html#holdings_storage_holdings_get_response

Jon Miller January 5, 2020 at 7:49 PM
Can you please remove the top-level property and just make it an array? The current implementation makes extra work with regard to generating client code for these methods. For example, in the case of holdings, the URL is "/holdings-storage/batch/synchronous", while the property name is "holdingsRecords" rather than just "holdings".

Jon Miller December 27, 2019 at 7:12 PM
I just noticed that there is a top-level property in the request JSON. I'm wondering if that serves a purpose? Why not just have an array with no top-level property?
Details
Assignee: Ian Walls
Reporter: Marc Johnson
Priority: P2
Development Team: Core: Platform
Context
There have been frequent requests for batch APIs for various types of records in FOLIO.
Some modules have started to implement batching, either in response to these requests or out of performance concerns.
These implementations all differ; it could be valuable to decide on a common pattern for them.
Framing Questions
Size expectations or restrictions
It has been suggested that we may want to use this to load typical batch sizes of 100 to 2,000 records.
Synchronous or asynchronous response
Should the server not respond until the batch processing has finished, or should it respond promptly (maybe after some validation) with the ability to monitor the status of the operation?
How does this affect the client?
Is this decision affected by the size of batch we allow, as that is likely a primary component of latency?
Should a response include a complete representation of all of the records created, only references, or no information at all (except failures, depending on the question below)?
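As a rough in-process illustration of the asynchronous option (the job registry, status values, and function names are all assumptions, not an agreed design): the server validates cheaply, returns a job id at once, and does the work in the background while the client polls a status endpoint.

```python
import threading
import uuid

# Stand-in for the backing store a status endpoint would read from.
jobs = {}  # job id -> "in_progress" | "completed"

def submit_batch(records):
    """Validate cheaply, start background processing, return a job id."""
    if not isinstance(records, list):
        raise ValueError("batch must be a list of records")
    job_id = str(uuid.uuid4())
    jobs[job_id] = "in_progress"

    def run():
        # Stand-in for the actual persistence work.
        jobs[job_id] = "completed"

    worker = threading.Thread(target=run)
    worker.start()
    worker.join()  # joined here only to make the example deterministic
    return job_id

job = submit_batch([{"id": "1"}])
# jobs[job] is now "completed"; a real client would poll until this state
```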
Complete or partial success / failure
Should a batch only succeed if all records are valid, or should it be acceptable for some records to be invalid?
What should happen if all of the records are valid but persistence of some of them fails (this is likely related to the transactions topic below)?
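If partial success were allowed, the response could report an outcome per record rather than failing the whole batch. A sketch of that shape, assuming a hypothetical `validate` callback and response property names:

```python
def process_batch(records, validate):
    """Apply records individually, collecting failures instead of aborting."""
    succeeded, failed = [], []
    for index, record in enumerate(records):
        error = validate(record)
        if error is None:
            succeeded.append(record)
        else:
            # Report which record failed and why, so the client can retry it.
            failed.append({"index": index, "error": error})
    return {"succeeded": len(succeeded), "failed": failed}

result = process_batch(
    [{"id": "a"}, {}],
    validate=lambda r: None if "id" in r else "missing id",
)
# result == {"succeeded": 1, "failed": [{"index": 1, "error": "missing id"}]}
```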
Database transactions (specific to storage modules)
Should the records that are created be done so in a single transaction?
How could this decision affect the handling of partial success or failure, if we decide we also want that?
How does this affect resource usage, e.g. a connection has to be used exclusively for each batch operation, which could lead to connection contention within the module?
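The single-transaction option can be sketched with SQLite (table name and column are illustrative; FOLIO storage modules use PostgreSQL, so this is only a sketch of the semantics): either every record persists or none does.

```python
import sqlite3

def insert_batch(conn, records):
    """Insert all records in one transaction: all persist or none do."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany(
                "INSERT INTO holdings (id) VALUES (?)",
                [(r["id"],) for r in records],
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings (id TEXT PRIMARY KEY)")
ok = insert_batch(conn, [{"id": "a"}, {"id": "a"}])  # duplicate id fails
count = conn.execute("SELECT COUNT(*) FROM holdings").fetchone()[0]
# ok is False and count is 0: the first insert was rolled back as well
```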
Streamed processing of records
To constrain memory usage during batch operations, should the set of records be processed as a stream of single records (or small groups of records)?
How does this affect validation, any restrictions on batch size or database transaction semantics?
For example, if we wanted to validate all records prior to any persistence, we might need to be able to process the stream more than once.
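Chunked processing can be sketched as follows (chunk size and helper names are illustrative): only one chunk is held in memory at a time. Note the interaction mentioned above: validating everything before persisting anything requires either buffering the whole stream or a source that can be read twice.

```python
from itertools import islice

def chunks(records, size):
    """Yield lists of at most `size` records from any iterable, lazily."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def stream_process(records, persist, chunk_size=100):
    """Persist records chunk by chunk, keeping memory usage bounded."""
    total = 0
    for chunk in chunks(records, chunk_size):
        persist(chunk)
        total += len(chunk)
    return total

stored = []
count = stream_process(iter(range(250)), persist=stored.extend, chunk_size=100)
# 250 records processed, in chunks of 100, 100, and 50
```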
Processing Semantics
Optional ID
JSON schema validation
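A sketch of both semantics, assuming a hypothetical required-field set (real validation would be driven by the module's JSON Schema): generate an id server-side when the client omits one, and report schema errors per record.

```python
import uuid

# Illustrative required fields; not the real schema.
REQUIRED = {"instanceId"}

def normalize(record):
    """Assign a server-generated UUID when the client omits "id"."""
    if "id" not in record:
        record = {**record, "id": str(uuid.uuid4())}
    return record

def validate(record):
    """Return a list of errors; a stand-in for JSON Schema validation."""
    return [f"missing required property: {name}"
            for name in sorted(REQUIRED - record.keys())]

rec = normalize({"instanceId": "abc"})
# rec gains a generated "id" and passes validation; an empty record fails
```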
What requirements are we missing?