Central File/Object Storage (WIP)

Overview

Several FOLIO apps need to store files.  Rather than have each app implement its own solution, it probably makes sense to provide a centralized file storage facility.  This page outlines the problem and possible designs.

Warning: Reader Beware

This page was originally created as a place to capture some ideas.  Much of it is stream of consciousness: I never finished my thoughts here, but the page remains in case I ever revisit the idea.  In short, this is half-baked and not completely thought through.

Requirements

  • Must be able to support files of varied sizes, including those which are quite large
  • Must be able to upload and persist text and binary files with any MIME type and/or extension
  • Must allow files to be retrieved
  • Must store file metadata (size, type, date uploaded, etc.)
  • Must be able to list/search for files without actually retrieving file contents (return metadata only)
  • Must segregate files by tenant

Nice-To-Haves

  • Should support multiple underlying storage technologies
  • Should be able to change some metadata (e.g. filename), but file content should be immutable.
  • Should allow for segregation of files by app/domain/timebox/etc

Schemas

file_metadata

Property | Type | Required | Default | Description | Notes/Example
id | string | No | <system generated> | UUID of the file metadata record | e.g. 1a220b67-7ddf-4b33-b9d4-5ce6157134e3
name | string | Yes | | Filename to associate with the file | e.g. inv20190702-13.pdf
size | number | No | <system calculated> | Size of the file in bytes | e.g. 5123680
type | string | Yes | | MIME type of the file | e.g. application/pdf
metadata | metadata | No | <system generated> | Standard record metadata | created by, creation date, updated by, updated date, etc.
uri | string | No | <system generated> | URI pointing to the file | e.g. s3://diku.file-storage.us-east-1/invoices/1a220b67-7ddf-4b33-b9d4-5ce6157134e3
domain | string | No | | Optional domain used to group files | e.g. invoices
storageType | string | No | TBD (Postgres?) | Optionally specify the type of storage to use | must be one of the enabled, supported storage types
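
To make the schema concrete, here is a minimal Python sketch of a file_metadata record with basic validation. The property names come from the table; the class name FileMetadata, the validate helper, and the specific validation rules are assumptions made for illustration and are not part of the design.

```python
import uuid
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class FileMetadata:
    """Illustrative model of the file_metadata schema (names follow the table)."""
    name: str                           # required: filename to associate with the file
    type: str                           # required: MIME type of the file
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # system generated
    size: Optional[int] = None          # system calculated once content is uploaded
    uri: Optional[str] = None           # system generated by the storage layer
    domain: Optional[str] = None        # optional grouping, e.g. "invoices"
    storageType: Optional[str] = None   # the default is still TBD in the design
    metadata: dict = field(default_factory=dict)  # standard record metadata


def validate(record: FileMetadata) -> list:
    """Return a list of problems; an empty list means the record looks valid.

    These checks are assumptions, not rules stated by the design.
    """
    problems = []
    if not record.name:
        problems.append("name is required")
    if not record.type or "/" not in record.type:
        problems.append("type must be a MIME type such as application/pdf")
    if record.size is not None and record.size < 0:
        problems.append("size must be non-negative")
    return problems


example = FileMetadata(name="inv20190702-13.pdf", type="application/pdf", domain="invoices")
print(asdict(example)["name"], validate(example))
```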

file_collection

Property | Type | Required | Default | Description | Notes/Example
files | array<file_metadata> | Yes | [ ] | collection of file_metadata |

storage_type

Property | Type | Required | Default | Description | Notes/Example
name | string | Yes | | name for this type of storage | unique
TBD | | | | |

storage_type_collection

Property | Type | Required | Default | Description | Notes/Example
storageTypes | array<storage_type> | Yes | [ ] | collection of storage_type |

Storage Layer

A new module named mod-file-storage is introduced.

API

Method | Endpoint | Request | Response | Description | Notes
POST | /file-storage/files | file_metadata | file_metadata | Create a file metadata record |
GET | /file-storage/files | CQL query | file_metadata_collection | Search/list file metadata records |
GET | /file-storage/files/<id> | NA | file_metadata | Get a particular file metadata record |
PUT | /file-storage/files/<id> | file_metadata | file_metadata | Update a file metadata record | Only certain fields would be allowed to be updated
DELETE | /file-storage/files/<id> | NA | 204 | Delete file metadata and content |
POST | /file-storage/files/<id>/contents | <MIME type from file_metadata> | 201 | Upload file content | Can be binary or text.  Data is stored in the configured repository
GET | /file-storage/files/<id>/contents | NA | <MIME type from file_metadata> | Get the contents of a particular file | Response can be binary or text.  Data is retrieved from the configured repository
GET | /file-storage/storage-types | NA | storage_type_collection | List enabled, supported storage types for your tenant |
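
The table implies a two-step upload: create the metadata record first, then post the content to that record's /contents endpoint. The sketch below builds (but does not send) those requests with Python's standard library. The base URL, the helper names, the use of the X-Okapi-Tenant and X-Okapi-Token headers, and the "query" parameter name for CQL are assumptions about how the module would sit behind Okapi; none of them are specified on this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9130"  # assumption: a local Okapi gateway


def _headers(tenant, token, content_type=None):
    # Assumed headers: tenant and token are passed the usual Okapi way.
    headers = {"X-Okapi-Tenant": tenant, "X-Okapi-Token": token}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def create_metadata_request(tenant, token, name, mime_type, domain=None):
    """Step 1: POST /file-storage/files with a file_metadata body."""
    body = {"name": name, "type": mime_type}
    if domain:
        body["domain"] = domain
    return Request(
        BASE_URL + "/file-storage/files",
        data=json.dumps(body).encode("utf-8"),
        headers=_headers(tenant, token, "application/json"),
        method="POST",
    )


def upload_content_request(tenant, token, file_id, mime_type, content):
    """Step 2: POST the raw bytes to /file-storage/files/<id>/contents."""
    return Request(
        f"{BASE_URL}/file-storage/files/{file_id}/contents",
        data=content,
        headers=_headers(tenant, token, mime_type),
        method="POST",
    )


def search_request(tenant, token, cql):
    """List metadata only: GET /file-storage/files with a CQL query."""
    return Request(
        BASE_URL + "/file-storage/files?" + urlencode({"query": cql}),
        headers=_headers(tenant, token),
        method="GET",
    )
```

Each returned Request could be sent with urllib.request.urlopen. The id passed to upload_content_request would come from the file_metadata returned by the first call.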

Storage

Multiple underlying storage technologies would be supported.  File contents could be retrieved in two ways:

  • Files could potentially be retrieved by the client directly via the URI in the file_metadata, but this is subject to any access controls imposed by the underlying storage
  • Files can also be retrieved via the storage module.
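
One way to support several storage technologies is a small backend interface that the module dispatches to by storageType. The following is a minimal sketch under stated assumptions: the StorageBackend interface, the in-memory backend, the "mem" URI scheme, the tenant/domain/id URI layout, and the backend_for lookup are all hypothetical. The immutability check reflects the nice-to-have that file content should not change once stored.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface each storage technology would implement."""

    scheme = "abstract"

    def uri_for(self, tenant, domain, file_id):
        # Tenant comes first so files are segregated by tenant,
        # then optionally grouped by domain.
        return f"{self.scheme}://{tenant}/{domain or '_'}/{file_id}"

    @abstractmethod
    def put(self, uri, content):
        ...

    @abstractmethod
    def get(self, uri):
        ...

    @abstractmethod
    def delete(self, uri):
        ...


class InMemoryBackend(StorageBackend):
    """Stand-in backend for illustration; a real one might wrap S3 or Postgres."""

    scheme = "mem"

    def __init__(self):
        self._blobs = {}

    def put(self, uri, content):
        if uri in self._blobs:
            raise ValueError("file content is immutable: " + uri)
        self._blobs[uri] = bytes(content)

    def get(self, uri):
        return self._blobs[uri]

    def delete(self, uri):
        self._blobs.pop(uri, None)


# Registry of enabled storage types (assumed to be filled from configuration).
BACKENDS = {"memory": InMemoryBackend()}


def backend_for(storage_type):
    """Resolve a storageType value to a backend, rejecting unknown types."""
    try:
        return BACKENDS[storage_type]
    except KeyError:
        raise ValueError(f"unsupported storage type: {storage_type}") from None
```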

Configuration

  • Parameters passed into the _tenant API indicate which storage type to use and provide its configuration (e.g. connection details).
    • Details of how secrets are stored are TBD.  Perhaps we can leverage AWS Parameter Store or Vault.
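
As a rough illustration of the idea above, the sketch below reads key/value parameters from a _tenant request body and picks out the storage type. It assumes the body carries a "parameters" list of key/value pairs; the parameter keys (storageType, bucket, secretRef), the module version, and the function name are hypothetical. Since secret handling is TBD, the example passes only a reference to a secret, never the secret itself.

```python
def parse_tenant_parameters(tenant_attributes, enabled_types):
    """Split _tenant parameters into a storage type and its settings.

    Assumes parameters arrive as [{"key": ..., "value": ...}, ...].
    """
    params = {p["key"]: p["value"] for p in tenant_attributes.get("parameters", [])}
    storage_type = params.pop("storageType", None)
    if storage_type is None:
        # The design leaves the default storage type TBD, so require it here.
        raise ValueError("storageType parameter is required")
    if storage_type not in enabled_types:
        raise ValueError(f"storage type not enabled: {storage_type}")
    return {"storageType": storage_type, "settings": params}


# Hypothetical request body for enabling the module for a tenant.
request_body = {
    "module_to": "mod-file-storage-1.0.0",
    "parameters": [
        {"key": "storageType", "value": "s3"},
        {"key": "bucket", "value": "diku.file-storage.us-east-1"},
        {"key": "secretRef", "value": "file-storage/diku/s3-credentials"},
    ],
}

config = parse_tenant_parameters(request_body, enabled_types={"s3", "postgres"})
print(config["storageType"], sorted(config["settings"]))
```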

JIRA

A convenient place for links to related JIRA features/stories/etc.

  • TBD

Open Issues

  • Need to add details of the API
  • Need to add details of how the underlying storage is provisioned/configured