Spike: Investigate limiting document size on upload
Requirements: MODINVOICE-125
A solution for enforcing a size limit on invoice documents should be developed.
HTTP/REST API File Uploads
- Base64-encode the file, which adds processing overhead on both the server and the client for encoding/decoding.
- Send the file and metadata both in a multipart/form-data POST.
- Send the file first in a multipart/form-data POST, and return an ID to the client. The client then sends the metadata with the ID, and the server re-associates the file and the metadata.
- Send the metadata first, and return an ID to the client. The client then sends the file with the ID, and the server re-associates the file and the metadata.
№ | Method | Advantages | Disadvantages |
---|---|---|---|
1 | Base64 encode the file and add processing in both the server and the client for encoding/decoding | | |
2a | Send the file and metadata both in a multipart/form-data POST | | |
2b | Send the file and metadata both in an application/octet-stream POST | | |
3 | Send the file first in a multipart/form-data POST, and return an ID to the client. The client then sends the metadata with the ID, and the server re-associates the file and the metadata | | |
4 | Send the metadata first, and return an ID to the client. The client then sends the file with the ID, and the server re-associates the file and the metadata | | |
Limiting document size
The following options are the most relevant for the particular FOLIO case.
Base64 encode the file (current implementation)
In this approach, it is enough to introduce a limit on the Base64 string size in the schema, taking into account that Base64 encoding increases the file size by about one third (4 characters for every 3 bytes, so slightly more than a 30% factor).
    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "description": "Object with base64 encoded file data",
      "type": "object",
      "properties": {
        "data": {
          "description": "Base64 encoded file data",
          "type": "string",
          "maxLength": 1.3 * DOCUMENT_SIZE_LIMIT
        }
      },
      "additionalProperties": false,
      "required": [
        "data"
      ]
    }
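The `maxLength` value in the schema has to be a literal number, so it would need to be computed from the raw size limit beforehand. The following is a minimal sketch of that arithmetic, assuming padded standard Base64; the class name, the method name, and the 10 MiB limit are hypothetical and not part of the module.

```java
import java.util.Base64;

class Base64SizeLimit {

    /** Number of characters in the padded Base64 encoding of rawBytes bytes. */
    static long encodedLength(long rawBytes) {
        // Every started group of 3 input bytes becomes 4 output characters.
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        long documentSizeLimit = 10L * 1024 * 1024; // hypothetical raw limit (10 MiB)
        System.out.println("maxLength = " + encodedLength(documentSizeLimit));

        // Cross-check the formula against the JDK encoder on a small sample.
        byte[] sample = new byte[1000];
        int actual = Base64.getEncoder().encodeToString(sample).length();
        System.out.println("formula matches encoder: " + (actual == encodedLength(sample.length)));
    }
}
```

Using the exact formula instead of a rounded 1.3 factor avoids rejecting files that are just under the raw limit.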
However, the RMB code contains the following note:
// IMPORTANT!!!
// the body of the request will be read into memory for ALL PUT requests
// and for POST requests with the content-types below ONLY!!!
// multipart, for example will not be read by the body handler as vertx saves
// multiparts and www-encoded to disk - hence multiparts will be handled differently
// see uploadHandler further down
This means that we cannot prevent an OOM error if the file size is larger than the module heap size: the request body is read into memory before the schema limit can be applied.
Multipart/form-data POST
This approach provides precise file size control and better performance. However, it requires considerable effort to change the schemas and logic on both the client side and the service side. Nevertheless, it may be implemented in the future if file upload performance and the unification of file loading mechanisms become important.
    types:
      invoice_document_file:
        type: file
        fileTypes: ['*/*']
        maxLength: DOCUMENT_SIZE_LIMIT
      fileUpload:
        properties:
          invoiceDocument:
            type: string
          invoiceDocumentFile:
            description: The file to be uploaded
            required: true
            type: invoice_document_file
    ................
    /documents:
      displayName: Document
      description: Manage documents associated with invoice
      post:
        description: Post document attachment/link;
        is: [validate]
        body:
          multipart/form-data: fileUpload
Upload by 2 requests
This solution can be implemented, but it requires big changes to the current implementation. We would need to send one ordinary invoice document request with the metadata and another request with the file, and then combine the two on the server to save the file.
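The two-request flow can be illustrated with a small in-memory sketch: the first call stores the metadata and returns an ID, the second call attaches the file to that ID and is the place where the size limit is checked. All names here (`TwoStepUploadSketch`, `createMetadata`, `attachFile`, the 1 KiB limit) are hypothetical and only illustrate the idea; this is not the module's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class TwoStepUploadSketch {

    static final int DOCUMENT_SIZE_LIMIT = 1024; // hypothetical limit in bytes

    private final Map<String, String> metadataById = new HashMap<>();
    private final Map<String, byte[]> fileById = new HashMap<>();

    /** Request 1: store the metadata and return the generated ID to the client. */
    String createMetadata(String documentName) {
        String id = UUID.randomUUID().toString();
        metadataById.put(id, documentName);
        return id;
    }

    /** Request 2: attach the file to existing metadata; rejects unknown IDs and oversized files. */
    boolean attachFile(String id, byte[] content) {
        if (!metadataById.containsKey(id) || content.length > DOCUMENT_SIZE_LIMIT) {
            return false;
        }
        fileById.put(id, content.clone());
        return true;
    }

    /** A document is complete once both the metadata and the file are present. */
    boolean isComplete(String id) {
        return metadataById.containsKey(id) && fileById.containsKey(id);
    }

    public static void main(String[] args) {
        TwoStepUploadSketch service = new TwoStepUploadSketch();
        String id = service.createMetadata("invoice.pdf");
        System.out.println("small file accepted: " + service.attachFile(id, new byte[100]));
        System.out.println("large file accepted: " + service.attachFile(id, new byte[DOCUMENT_SIZE_LIMIT + 1]));
    }
}
```

Because the file arrives in its own request, its raw size can be checked directly, without accounting for Base64 or JSON overhead.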
Summary
Option 1 - not acceptable;
Option 2 - looks good but is difficult to implement;
Option 3/4 - look good but require uploading in two requests.
UPD 04/14/2020: Since only application/octet-stream uploading is currently available in FOLIO, the previous approaches do not look easy or effective. A PoC based on the application/octet-stream approach is presented below.
PoC based on application/octet-stream approach
The application/octet-stream mechanism makes it possible to control the size of the incoming request in the following manner. For this approach we need to know the approximate size of the JSON request content: the metadata plus the Base64-encoded file (original invoice document size plus about one third). For example, if the file limit is 10 MB and the other JSON content is 1 MB, the MAX_DOCUMENT_SIZE limit parameter should be approx. 14 MB. This is not very precise, but it should be enough for request size control.
Define the POST API in the RAML file using the application/octet-stream content type:
    post:
      description: Create a new <<resourcePathName|!singularize>> item.
      body:
        application/octet-stream:
Refactor existing code for POST request processing:
    private byte[] requestBytesArray = new byte[0];
    private static final int MAX_DOCUMENT_SIZE = 350000000;

    // This method is executed for each chunk of the stream, within a single instance
    // of the public API interface implementation.
    // Taken together, the chunks contain the entire request body with the Base64-encoded file.
    @Validate
    @Stream
    @Override
    public void postInvoiceInvoicesDocumentsById(String id, String lang, InputStream stream, Map<String, String> okapiHeaders,
        Handler<AsyncResult<Response>> asyncResultHandler, Context vertxContext) {
      DocumentHelper documentHelper = new DocumentHelper(okapiHeaders, vertxContext, lang);
      try {
        // This branch is executed once per chunk until RMB adds the "complete" header
        // to indicate end-of-stream.
        if (Objects.isNull(okapiHeaders.get("complete"))) {
          // The null check keeps later chunks from failing with a NullPointerException
          // after an oversize has already been detected.
          if (Objects.nonNull(requestBytesArray) && requestBytesArray.length < MAX_DOCUMENT_SIZE) {
            // No oversize so far: append the bytes of this chunk to the array
            requestBytesArray = ArrayUtils.addAll(requestBytesArray, IOUtils.toByteArray(stream));
          } else {
            // Oversize: set the array to null to free memory and prevent memory overloading
            requestBytesArray = null;
          }
        } else {
          // This branch is executed once, after all chunks have been processed
          if (Objects.isNull(requestBytesArray)) {
            // Complete the request with the "document is too large" error
            documentHelper.addProcessingError(DOCUMENT_IS_TOO_LARGE.toError());
            asyncResultHandler.handle(succeededFuture(documentHelper.buildErrorResponse(422)));
          } else {
            // No oversize: process the ordinary logic
            InvoiceDocument entity = new JsonObject(new String(requestBytesArray, StandardCharsets.UTF_8))
              .mapTo(InvoiceDocument.class);
            if (!entity.getDocumentMetadata().getInvoiceId().equals(id)) {
              documentHelper.addProcessingError(MISMATCH_BETWEEN_ID_IN_PATH_AND_BODY.toError());
              asyncResultHandler.handle(succeededFuture(documentHelper.buildErrorResponse(422)));
            } else {
              documentHelper.createDocument(id, entity)
                .thenAccept(document -> {
                  logInfo("Successfully created document with id={}", document);
                  asyncResultHandler.handle(succeededFuture(documentHelper.buildResponseWithLocation(
                    String.format(DOCUMENTS_LOCATION_PREFIX, id, document.getDocumentMetadata().getId()), document)));
                })
                .exceptionally(t -> handleErrorResponse(asyncResultHandler, documentHelper, t));
            }
          }
        }
      } catch (Exception e) {
        handleErrorResponse(asyncResultHandler, documentHelper, e);
      }
    }
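The chunk accumulation itself can be isolated from RMB and Vert.x. The sketch below is one possible variation, not the PoC code: it assumes a hypothetical helper class `BoundedChunkBuffer` that checks the limit before each write, so the buffer never grows past the limit, and it uses a `ByteArrayOutputStream` instead of re-copying the whole array on every chunk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BoundedChunkBuffer {

    private final int maxBytes;
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean oversized;

    BoundedChunkBuffer(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Appends one chunk; once the limit would be exceeded, drops the buffer and ignores further data. */
    void accept(InputStream chunk) {
        if (oversized) {
            return;
        }
        byte[] block = new byte[8192];
        try {
            int read;
            while ((read = chunk.read(block)) != -1) {
                // Compare against the remaining capacity to avoid integer overflow.
                if (read > maxBytes - buffer.size()) {
                    oversized = true;
                    buffer = null; // release the memory held so far
                    return;
                }
                buffer.write(block, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isOversized() {
        return oversized;
    }

    /** Returns the accumulated bytes; only valid when the limit was not exceeded. */
    byte[] toByteArray() {
        if (oversized) {
            throw new IllegalStateException("Request body exceeded " + maxBytes + " bytes");
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        BoundedChunkBuffer body = new BoundedChunkBuffer(10);
        body.accept(new ByteArrayInputStream(new byte[6]));
        body.accept(new ByteArrayInputStream(new byte[5]));
        System.out.println("oversized: " + body.isOversized());
    }
}
```

In the handler, `accept` would be called for every chunk and `isOversized` would be checked once the "complete" header arrives.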