SPIKE Design batch create / update API endpoint standard

Description

Context

There have been frequent requests for batch APIs for various types of records in FOLIO.

Some modules have started to implement batching, either in response to these requests or due to performance concerns.

These implementations all differ; it could be valuable to agree on a common pattern for them.

Framing Questions

Size expectations or restrictions

It has been suggested that we may want to use this to load typical batch sizes of 100 to 2,000 records.

Synchronous or asynchronous response

Should the server not respond until the batch processing has finished, or should it respond promptly (maybe after some validation) with the ability to monitor the status of the operation?

How does this affect the client?

Is this decision affected by the batch size we allow, given that size is likely a primary component of latency?

Should a response include a complete representation of all of the records created, only references to them, or even no information at all (except failures, depending upon the question below)?
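As an illustration of the asynchronous option, a minimal sketch in JAX-RS style (the path, class, and JobStore names are hypothetical, not an agreed FOLIO convention): the server does cheap validation up front, queues the batch, and returns 202 Accepted with a status URL the client can poll.

    import java.net.URI;
    import java.util.List;
    import java.util.UUID;
    import javax.ws.rs.*;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    @Path("/example-storage/batch/asynchronous")
    public class AsyncBatchResource {

      // Hypothetical store that hands batches to a background worker.
      public interface JobStore {
        void submit(String jobId, List<Object> records);
        Object status(String jobId);
      }

      private final JobStore jobStore;

      public AsyncBatchResource(JobStore jobStore) {
        this.jobStore = jobStore;
      }

      @POST
      @Consumes(MediaType.APPLICATION_JSON)
      public Response create(List<Object> records) {
        // Validate cheaply up front so obviously bad batches fail fast.
        if (records == null || records.isEmpty()) {
          return Response.status(422).entity("records array is required").build();
        }
        String jobId = UUID.randomUUID().toString();
        jobStore.submit(jobId, records);
        // 202 Accepted: processing continues in the background; the
        // Location header tells the client where to poll for status.
        return Response.accepted()
            .location(URI.create("/example-storage/batch/asynchronous/jobs/" + jobId))
            .build();
      }

      @GET
      @Path("/jobs/{jobId}")
      @Produces(MediaType.APPLICATION_JSON)
      public Response status(@PathParam("jobId") String jobId) {
        return Response.ok(jobStore.status(jobId)).build();
      }
    }

With the synchronous option none of this machinery is needed, but the client must hold the connection open for the full duration of the batch, which ties this question back to the size restrictions above.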

Complete or partial success / failure

Should a batch only succeed if all records are valid, or should it be acceptable for some records to be invalid?

What should happen if all of the records are valid but persisting some of them fails (this is likely related to the transactions topic below)?
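If partial success were allowed, the response would need to report per-record outcomes. One possible shape, purely illustrative (none of these property names come from an existing FOLIO schema):

    {
      "createdRecords": ["<id of record 1>", "<id of record 2>"],
      "failedRecords": [
        { "index": 2, "message": "referenced instance does not exist" }
      ],
      "totalRecords": 3
    }

Reporting failures by index (or by the optional client-supplied ID discussed under Processing Semantics below) would let the client retry only the records that failed.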

Database transactions (specific to storage modules)

Should the records be created within a single database transaction?

How could this decision affect the handling of partial success or failure, if we decide we also want that?

How does this affect resource usage? For example, a connection has to be used exclusively for each batch operation, which could lead to connection contention within the module.
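A minimal JDBC sketch of the single-transaction option, assuming a Postgres-style table (the table name example_records is hypothetical). It illustrates both points raised above: the batch is all-or-nothing, and the connection is held exclusively until commit or rollback.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;
    import javax.sql.DataSource;

    public class SingleTransactionBatchInserter {

      // Persists every record in one transaction: either all rows commit
      // or none do. The connection is held exclusively for the whole
      // batch, which is where the pool-contention concern comes from.
      public void insertAll(DataSource ds, List<String> recordsAsJson) throws SQLException {
        try (Connection conn = ds.getConnection()) {
          conn.setAutoCommit(false);
          try (PreparedStatement ps = conn.prepareStatement(
              "INSERT INTO example_records (jsonb) VALUES (?::jsonb)")) {
            for (String json : recordsAsJson) {
              ps.setString(1, json);
              ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();   // all-or-nothing
          } catch (SQLException e) {
            conn.rollback(); // any failure undoes the whole batch
            throw e;
          }
        }
      }
    }

Per-record transactions would avoid holding the connection across the whole batch and make partial success straightforward, at the cost of losing atomicity.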

Streamed processing of records

To constrain memory usage during batch operations, should the set of records be processed as a stream of single records (or small chunks of records)?

How does this affect validation, any restrictions on batch size, or database transaction semantics?

For example, if we wanted to validate all records prior to any persistence, we might need to be able to process the stream more than once.
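As a sketch of the streaming approach, using Jackson's streaming parser (the wrapper property name "records" is an assumption): each record is materialized and handled one at a time, so memory use stays bounded regardless of batch size. Because the request body can only be read once, a validate-everything-first policy would require either buffering the parsed records or making a second pass, exactly as noted above.

    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.function.Consumer;

    public class BatchStreamReader {

      private static final ObjectMapper MAPPER = new ObjectMapper();

      // Reads {"records": [ {...}, {...}, ... ]} one element at a time;
      // only a single record is held in memory at any moment.
      public static void forEachRecord(InputStream body, Consumer<JsonNode> handler)
          throws IOException {
        try (JsonParser parser = MAPPER.getFactory().createParser(body)) {
          while (parser.nextToken() != null) {
            if (parser.currentToken() == JsonToken.START_ARRAY
                && "records".equals(parser.getCurrentName())) {
              while (parser.nextToken() == JsonToken.START_OBJECT) {
                handler.accept(MAPPER.readTree(parser));
              }
            }
          }
        }
      }
    }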

Processing Semantics

  • Optional ID: whether a record ID is required, or may be omitted and generated server-side (see the sketch below).

  • JSON schema validation: whether and when each record is validated against its JSON schema.
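A sketch of the optional-ID semantic using Jackson (schema validation is left out here, since the choice of validation library is part of this spike):

    import com.fasterxml.jackson.databind.node.ObjectNode;
    import java.util.UUID;

    public class RecordPreprocessor {

      // "Optional ID": the client may supply an id so it can correlate
      // request and response records; if the id is absent, the server
      // generates one before persistence.
      public static ObjectNode ensureId(ObjectNode record) {
        if (!record.hasNonNull("id")) {
          record.put("id", UUID.randomUUID().toString());
        }
        return record;
      }
    }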

What requirements are we missing?

Environment

None

Potential Workaround

None

Activity

Julian Ladisch January 8, 2020 at 2:52 PM

For the status of GET batch/streaming, see the "Issue Links" on this issue.

Julian Ladisch January 7, 2020 at 1:24 PM

The dedicated array property makes it easy to add additional properties, and most clients will continue to work:
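A wrapper of roughly this shape (illustrative; record fields elided):

    {
      "holdingsRecords": [
        { "id": "…", "instanceId": "…", "permanentLocationId": "…" }
      ],
      "totalRecords": 1
    }

Here "totalRecords" stands in for any later addition: clients that only read "holdingsRecords" keep working.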

Julian Ladisch January 7, 2020 at 1:23 PM

The property name "holdingsRecords" is consistent with the property name of the GET /holdings-storage/holdings API: https://s3.amazonaws.com/foliodocs/api/mod-inventory-storage/p/holdings-storage.html#holdings_storage_holdings_get_response

Jon Miller January 5, 2020 at 7:49 PM

Can you please remove the top-level property and just make it an array? The current implementation makes extra work with regard to generating client code for these methods. For example, in the case of holdings, the URL is "/holdings-storage/batch/synchronous", while the property name is "holdingsRecords" rather than just "holdings".

Jon Miller December 27, 2019 at 7:12 PM

I just noticed that there is a top-level property in the request JSON. I'm wondering if that serves a purpose? Why not just have an array with no top-level property?

Won't Do

Details

Development Team

Core: Platform

Created May 30, 2019 at 3:34 PM
Updated January 25, 2023 at 2:39 PM
Resolved January 25, 2023 at 2:39 PM