Central File/Object Storage (WIP)

Overview

Several FOLIO apps need to store files.  Rather than having each app implement its own solution, it probably makes sense to provide a centralized file storage facility.  This page outlines the problem and possible designs.

 Reader Beware 

This page was originally created as a place to capture some ideas.  Much of it was stream of consciousness; I never finished my thoughts here, but the page remains in case I ever revisit the idea.  In short, this is half-baked and not completely thought through.

Requirements

  • Must be able to support files of varied sizes, including those which are quite large

  • Must be able to upload and persist text and binary files with any MIME type and/or extension

  • Must allow files to be retrieved

  • Must store file metadata (size, type, date uploaded, etc.)

  • Must be able to list/search for files without actually retrieving file contents (return metadata only)

  • Must segregate files by tenant

Nice-To-Haves

  • Should support multiple underlying storage technologies

  • Should be able to change some metadata (e.g. filename), but file content should be immutable.

  • Should allow for segregation of files by app/domain/timebox/etc
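
One way the "mutable metadata, immutable content" idea could be enforced is sketched below. This is only an illustration: the set of editable fields (`MUTABLE_FIELDS`) and the helper name `apply_update` are assumptions, since this page does not yet say which fields may change.

```python
# Hypothetical: which properties a client may change after creation.
# The page only names "filename" as an example, so this set is an assumption.
MUTABLE_FIELDS = {"name", "domain"}


def apply_update(current: dict, changes: dict) -> dict:
    """Return an updated copy of a file_metadata record.

    A full record may be sent back (as with a PUT); read-only properties are
    accepted only if their values are unchanged.
    """
    rejected = sorted(
        key for key, value in changes.items()
        if key not in MUTABLE_FIELDS and current.get(key) != value
    )
    if rejected:
        raise ValueError(f"read-only fields cannot be changed: {rejected}")
    return {**current, **changes}


record = {"id": "1a220b67-7ddf-4b33-b9d4-5ce6157134e3",
          "name": "inv20190702-13.pdf", "size": 5123680}
renamed = apply_update(record, {"name": "invoice-13.pdf", "size": 5123680})
```

Renaming succeeds because `size` is unchanged; attempting to send a different `size` or `uri` would raise, which keeps the stored content and its descriptive fields in step.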

Schemas

file_metadata

| Property | Type | Required | Default | Description | Notes/Example |
|---|---|---|---|---|---|
| id | string | No | <system generated> | UUID of the file metadata record | e.g. 1a220b67-7ddf-4b33-b9d4-5ce6157134e3 |
| name | string | Yes | | Filename to associate with the file | e.g. inv20190702-13.pdf |
| size | number | No | <system calculated> | Size of the file in bytes | e.g. 5123680 |
| type | string | Yes | | MIME type of the file | e.g. application/pdf |
| metadata | metadata | No | <system generated> | Standard record metadata | created by, creation date, updated by, updated date, etc. |
| uri | string | No | <system generated> | URI pointing to the file | e.g. s3://diku.file-storage.us-east-1/invoices/1a220b67-7ddf-4b33-b9d4-5ce6157134e3 |
| domain | string | No | | Optional domain used to group files | e.g. invoices |
| storageType | string | No | TBD (Postgres?) | Optionally specify the type of storage to use | Must be one of the enabled, supported storage types |
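
As a rough illustration of the file_metadata schema, the record could be modelled as below. This is a sketch under assumptions, not a proposed implementation: the class name `FileMetadata`, the validation rules, and the choice to generate the `id` client-side are all hypothetical, and the standard `metadata` property is omitted for brevity.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileMetadata:
    # Required, supplied by the client.
    name: str
    type: str  # MIME type, e.g. "application/pdf"
    # Optional, supplied by the client.
    domain: Optional[str] = None
    storage_type: Optional[str] = None  # "storageType" in the schema; default is TBD
    # System-managed in the schema; generated here only for illustration.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    size: Optional[int] = None  # bytes; unknown until content is uploaded
    uri: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("name is required")
        if "/" not in self.type:
            raise ValueError(f"type must be a MIME type, got {self.type!r}")

    def to_json(self) -> dict:
        """Serialize using the property names from the table, skipping unset values."""
        doc = {"id": self.id, "name": self.name, "type": self.type}
        optional = {"size": self.size, "uri": self.uri,
                    "domain": self.domain, "storageType": self.storage_type}
        doc.update({k: v for k, v in optional.items() if v is not None})
        return doc


example = FileMetadata(name="inv20190702-13.pdf", type="application/pdf",
                       domain="invoices")
```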

file_collection

| Property | Type | Required | Default | Description | Notes/Example |
|---|---|---|---|---|---|
| files | array<file_metadata> | Yes | [ ] | Collection of file_metadata | |



storage_type

| Property | Type | Required | Default | Description | Notes/Example |
|---|---|---|---|---|---|
| name | string | Yes | | Name for this type of storage | unique |
| TBD | | | | | |

storage_type_collection

| Property | Type | Required | Default | Description | Notes/Example |
|---|---|---|---|---|---|
| storageTypes | array<storage_type> | Yes | [ ] | Collection of storage_type | |



Storage Layer

A new module named mod-file-storage is introduced.

API

| Method | Endpoint | Request | Response | Description | Notes |
|---|---|---|---|---|---|
| POST | /file-storage/files | file_metadata | file_metadata | Create a file metadata record | |
| GET | /file-storage/files | CQL query | file_metadata_collection | Search/list file metadata records | |
| GET | /file-storage/files/<id> | NA | file_metadata | Get a particular file metadata record | |
| PUT | /file-storage/files/<id> | file_metadata | file_metadata | Update a file metadata record | Only certain fields would be allowed to be updated |
| DELETE | /file-storage/files/<id> | NA | 204 | Delete file metadata and content | |
| POST | /file-storage/files/<id>/contents | <MIME type from file_metadata> | 201 | Upload file content | Can be binary or text.  Data is stored in the configured repository |
| GET | /file-storage/files/<id>/contents | NA | <MIME type from file_metadata> | Get the contents of a particular file | Response can be binary or text.  Data is retrieved from the configured repository |
| GET | /file-storage/storage-types | NA | storage_type_collection | List enabled, supported storage types for your tenant | |


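
To make the intended call sequence concrete, the sketch below builds (but does not send) the requests a client might issue: a metadata search, then the two-step upload of metadata followed by content. The helper names are hypothetical, and passing the CQL string in a `query` URL parameter is an assumption based on common FOLIO practice rather than something this page specifies.

```python
from typing import Dict, Tuple
from urllib.parse import quote, urlencode

BASE = "/file-storage"
Request = Tuple[str, str, Dict[str, str]]  # (method, path, headers)


def search_files_url(cql: str) -> str:
    """Path for a metadata-only search; 'query' as the parameter name is assumed."""
    return f"{BASE}/files?{urlencode({'query': cql})}"


def create_metadata_request() -> Request:
    """Step 1: POST a file_metadata record; the response carries the new id."""
    return ("POST", f"{BASE}/files", {"Content-Type": "application/json"})


def upload_content_request(created: dict) -> Request:
    """Step 2: POST the bytes, using the MIME type recorded in the metadata."""
    path = f"{BASE}/files/{quote(created['id'])}/contents"
    return ("POST", path, {"Content-Type": created["type"]})


created = {"id": "1a220b67-7ddf-4b33-b9d4-5ce6157134e3", "type": "application/pdf"}
step1 = create_metadata_request()
step2 = upload_content_request(created)
search = search_files_url("domain==invoices")
```

Splitting upload into two calls means the metadata record exists before any content does, so a client or the module would need to tolerate records whose content has not arrived yet.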

Storage

Multiple underlying storage technologies would be supported.

  • Files could potentially be retrieved by the client directly via the URI in the file_metadata, subject to any access controls imposed by the underlying storage.

  • Files can also be retrieved via the storage module.
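
Supporting several storage technologies suggests a small backend interface that the module dispatches to by storage type. The following is a minimal sketch assuming hypothetical names throughout (`StorageBackend`, `InMemoryBackend`, the `mem://` scheme, and the `BACKENDS` registry); a real backend would wrap Postgres, S3, or similar. The tenant is placed in the URI to illustrate per-tenant segregation.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Hypothetical interface each storage technology would implement."""

    @abstractmethod
    def put(self, tenant: str, domain: str, file_id: str, data: bytes) -> str:
        """Persist the content and return the URI to record in file_metadata."""

    @abstractmethod
    def get(self, uri: str) -> bytes:
        """Return the content previously stored at the given URI."""


class InMemoryBackend(StorageBackend):
    """Stand-in backend for illustration; keeps content in a dict."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, tenant: str, domain: str, file_id: str, data: bytes) -> str:
        uri = f"mem://{tenant}/{domain}/{file_id}"
        if uri in self._blobs:
            # Content is treated as immutable: no overwriting an existing file.
            raise FileExistsError(uri)
        self._blobs[uri] = bytes(data)
        return uri

    def get(self, uri: str) -> bytes:
        return self._blobs[uri]


# Registry keyed by storage type name (cf. the storage_type schema).
BACKENDS: Dict[str, StorageBackend] = {"memory": InMemoryBackend()}


def store(storage_type: str, tenant: str, domain: str,
          file_id: str, data: bytes) -> str:
    try:
        backend = BACKENDS[storage_type]
    except KeyError:
        raise ValueError(f"unsupported storage type: {storage_type!r}") from None
    return backend.put(tenant, domain, file_id, data)


uri = store("memory", "diku", "invoices", "1a220b67", b"%PDF-1.7")
```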

Configuration

  • Parameters passed into the _tenant API indicate which storage type to use and provide its configuration (e.g. connection details)

    • Details of how secrets are stored are TBD; perhaps we can leverage AWS Systems Manager Parameter Store or HashiCorp Vault
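
A sketch of how tenant parameters might be interpreted follows. It assumes the request body carries a `parameters` list of key/value pairs; the parameter names (`storageType`, `storage.bucket`, `storage.region`), the per-type required settings, and the Postgres default are all hypothetical, since this page leaves them TBD. Secrets are deliberately not handled here.

```python
# Hypothetical: settings each storage type would require.
REQUIRED_SETTINGS = {"postgres": set(), "s3": {"bucket", "region"}}
DEFAULT_STORAGE_TYPE = "postgres"  # the schema lists the default as "TBD (Postgres?)"
PREFIX = "storage."


def parse_storage_config(tenant_attributes: dict) -> dict:
    """Extract the storage type and its settings from tenant parameters."""
    params = {p["key"]: p["value"]
              for p in tenant_attributes.get("parameters", [])}
    storage_type = params.get("storageType", DEFAULT_STORAGE_TYPE)
    if storage_type not in REQUIRED_SETTINGS:
        raise ValueError(f"unsupported storage type: {storage_type!r}")
    # Keys such as "storage.bucket" become settings named "bucket".
    settings = {key[len(PREFIX):]: value
                for key, value in params.items() if key.startswith(PREFIX)}
    missing = sorted(REQUIRED_SETTINGS[storage_type] - set(settings))
    if missing:
        raise ValueError(f"missing settings for {storage_type}: {missing}")
    return {"storageType": storage_type, "settings": settings}


config = parse_storage_config({
    "parameters": [
        {"key": "storageType", "value": "s3"},
        {"key": "storage.bucket", "value": "diku-files"},
        {"key": "storage.region", "value": "us-east-1"},
    ]
})
```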

JIRA

A convenient place for links to related JIRA features/stories/etc.

  • TBD

Open Issues

  • Need to add details of the API

  • Need to add details of how the underlying storage is provisioned/configured