Tickets:
Batch API
Existing API endpoints for which batch alternatives should be provided:
- POST /circulation/renew-by-barcode
- POST /circulation/renew-by-id
These would respectively become:
- POST /circulation/renew-by-barcode-batch
- POST /circulation/renew-by-id-batch
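The request body schemas have not been defined yet. As a working assumption, a batch endpoint could accept an array of the same objects the existing single-renewal endpoint takes (itemBarcode/userBarcode in the case of renew-by-barcode). The "renewals" wrapper property and the barcode values below are placeholders, not agreed names:

{
  "renewals": [
    { "itemBarcode": "36105124377110", "userBarcode": "21345678901234" },
    { "itemBarcode": "36105124377111", "userBarcode": "21345678901234" }
  ]
}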
These endpoints would follow a design structure similar to that described here:
- FOLIO-2050
- https://issues.folio.org/secure/attachment/20079/Possible%20batch%20api%20workflow.png
References:
- MODUIMP-57
- MODINVSTOR-478
- MODINVSTOR-353
- RMB-505
- FOLIO-1156
- https://s3.amazonaws.com/foliodocs/api/mod-inventory-storage/instance-storage-batch.html
- https://s3.amazonaws.com/foliodocs/api/mod-inventory-storage/instance-sync.html
- https://www.citusdata.com/blog/2017/11/08/faster-bulk-loading-in-postgresql-with-copy/
- https://stackoverflow.com/questions/46715354/how-does-copy-work-and-why-is-it-so-much-faster-than-insert
- How to design batch API (General recommendations)
The Design
The existing renewal process is too slow when applied to many loans, and having a batch simply perform one operation at a time has also proven problematic. The suggested design is therefore to queue up a defined number of renewals and then perform the database operation on the whole queue at once.
The business logic can perform a pre-processing pass that checks whether any of the renewals would fail for business-logic reasons. All renewals that pass this check are queued, and once the queue size is reached the SQL operation is performed on that queue, as sketched below. This is potentially problematic in that if anything in the queue fails, the entire queue might have to be treated as a failure. If that is undesirable then a queue size of 1 must be used.
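As a rough illustration, here is a minimal Java sketch of the queue-and-flush approach. All names (RenewalRequest, BatchResult, passesBusinessLogic, executeBatchSql) are hypothetical placeholders rather than existing mod-circulation code, and error handling is reduced to the queue-wide failure case described above:

import java.util.ArrayList;
import java.util.List;

class RenewalRequest {
    String id;   // loan or item id, depending on the endpoint
}

class BatchResult {
    void addSuccess(RenewalRequest r) { /* collect into the "success" array */ }
    void addFailure(RenewalRequest r) { /* collect into the "failure" array */ }
    void addError(RenewalRequest r)   { /* collect into the "error" array */ }
}

public class BatchRenewalProcessor {

    private final int queueSize;   // e.g. 100 renewals per SQL round trip
    private final List<RenewalRequest> queue = new ArrayList<>();

    public BatchRenewalProcessor(int queueSize) {
        this.queueSize = queueSize;
    }

    // Pre-process one renewal; queue it only if the business logic allows it.
    public void submit(RenewalRequest request, BatchResult result) {
        if (!passesBusinessLogic(request)) {
            result.addFailure(request);   // business-logic failure, never queued
            return;
        }
        queue.add(request);
        if (queue.size() >= queueSize) {
            flush(result);
        }
    }

    // Perform one SQL operation on everything queued so far.
    public void flush(BatchResult result) {
        if (queue.isEmpty()) {
            return;
        }
        try {
            executeBatchSql(queue);   // single multi-row statement
            queue.forEach(result::addSuccess);
        } catch (Exception e) {
            // One bad row can fail the whole statement; with queueSize == 1
            // each renewal succeeds or fails on its own.
            queue.forEach(result::addError);
        }
        queue.clear();
    }

    private boolean passesBusinessLogic(RenewalRequest request) {
        return true;   // placeholder: loan policy, item status, patron blocks, etc.
    }

    private void executeBatchSql(List<RenewalRequest> batch) {
        // placeholder for the multi-row UPDATE shown later in this document
    }
}

Keeping the pre-check separate from the SQL flush means a business-logic rejection never poisons a queue; queue-wide rollback is only ever triggered by database-level errors.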
To restrict memory consumption there should be a maximum batch size of 10,000. When more than the maximum batch size is sent to either API endpoint, an HTTP 413 "Payload Too Large" should be returned immediately, without processing anything.
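The guard itself is a simple size check. Below is a sketch using Vert.x Web (which RAML Module Builder modules run on); the MAX_BATCH_SIZE constant and the handler wiring are illustrative assumptions, not existing FOLIO code:

import io.vertx.core.json.JsonArray;
import io.vertx.ext.web.RoutingContext;

public class BatchSizeGuard {

    // Limit taken from this design document; not an existing FOLIO constant.
    private static final int MAX_BATCH_SIZE = 10_000;

    // Returns true if the request was rejected as too large.
    public static boolean rejectIfTooLarge(RoutingContext ctx, JsonArray renewals) {
        if (renewals.size() > MAX_BATCH_SIZE) {
            ctx.response()
               .setStatusCode(413)   // Payload Too Large
               .end("Batch exceeds maximum size of " + MAX_BATCH_SIZE);
            return true;             // nothing is processed
        }
        return false;
    }
}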
The PO wishes for the entire batch to be processed and for all successes and failures to be returned.
The resulting structure might look as follows:
{ "success": [ { "id": "ef642903-c1c3-45da-95d9-3e0e3db617ca", ... }, { "id": "a0dcc225-12ed-4db9-badf-1b579bccacc7", ... }, { "id": "b8d0d431-b535-4bc4-a1dd-5bee2ab2d49c", ... } ], "failure": [ { "id": "f087206e-1622-4fa4-a317-9a53ec5b49c3", ... }, { "id": "4a63f8af-3e9c-45f3-88d4-6a5515d62b18", ... } ], "error": [ { "id": "f087206e-1622-4fa4-a317-9a53ec5b49c3", ... } ], "totalSuccess": 3, "totalFailure": 2, "totalError": 1 }
The "totalSuccess", "TotalFailure", and "totalError" is not strictly needed and may not need to be implemented.
Example Multi-row SQL
PostgreSQL can insert or update multiple rows in a single statement. For renewals, the relevant form is a multi-row UPDATE driven by a VALUES list, such as:
UPDATE some_table AS t
SET field1 = v.field1,
    field2 = v.field2
FROM (VALUES
    (1, 'field1 row1', 'field2 row1'),
    (2, 'field1 row2', 'field2 row2')
) AS v (id, field1, field2)
WHERE v.id = t.id;
Ticket FOLIO-1156 describes active work in this area that might provide a way to do the above or something similar.
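For illustration, here is a hedged sketch of how such a multi-row UPDATE could be built and executed from Java with plain JDBC. The table and column names match the placeholder SQL above; a real FOLIO module would normally go through RAML Module Builder's PostgreSQL client instead, and FOLIO-1156 may provide a readier mechanism:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.Collections;
import java.util.List;

public class MultiRowUpdate {

    // One row of the VALUES list: (id, field1, field2).
    public record Row(int id, String field1, String field2) {}

    // Builds and executes the multi-row UPDATE shown above for the given rows.
    public static int updateRows(Connection conn, List<Row> rows) throws Exception {
        if (rows.isEmpty()) {
            return 0;
        }
        // One "(?, ?, ?)" group per row, e.g. "(?, ?, ?), (?, ?, ?)" for two rows.
        String placeholders = String.join(", ",
                Collections.nCopies(rows.size(), "(?, ?, ?)"));

        String sql =
            "UPDATE some_table AS t "
          + "SET field1 = v.field1, field2 = v.field2 "
          + "FROM (VALUES " + placeholders + ") AS v (id, field1, field2) "
          + "WHERE v.id = t.id";

        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int i = 1;
            for (Row row : rows) {
                ps.setInt(i++, row.id());
                ps.setString(i++, row.field1());
                ps.setString(i++, row.field2());
            }
            return ps.executeUpdate();   // number of rows actually updated
        }
    }
}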