
2021-08-20 Meeting notes


Date

Attendees

Goals

Discussion items

Time | Item | Who | Notes
 | Review the Kanban board | Team |

min.io / s3 compatible file storage

How should FOLIO store files (like PDFs attached to orders, agreements, etc.)?

The TC is discussing whether FOLIO should accept min.io as an official part of the FOLIO platform:

Background:

mod-invoice-storage stores files (PDFs) into a JSONB property using base64 encoding:
https://github.com/folio-org/acq-models/blob/master/mod-invoice-storage/schemas/document.json
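
To illustrate the base64-in-JSON approach described above, here is a minimal Python sketch. The field name `data` and all function names are assumptions made for illustration; they are not taken from the mod-invoice-storage schema. The sketch shows the round trip and the size overhead that base64 adds (4 output characters for every 3 input bytes).

```python
import base64
import json
import math


def encode_document(file_bytes: bytes) -> str:
    """Wrap raw file bytes in a JSON record, base64-encoded.

    The "data" field name is hypothetical.
    """
    record = {"data": base64.b64encode(file_bytes).decode("ascii")}
    return json.dumps(record)


def decode_document(record_json: str) -> bytes:
    """Recover the original bytes from the JSON record."""
    return base64.b64decode(json.loads(record_json)["data"])


def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 text for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)
```

Under these assumptions, a 300-byte file becomes 400 characters of base64 text before any JSON overhead, so the stored value is about one third larger than the file itself.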

mod-agreements and mod-licenses store files (PDFs) in pg_largeobject without any tenant or module separation. The ERM development team doesn't use permissions for tenant separation, and it rejected the request to switch to a PostgreSQL solution such as bytea, which provides tenant and module separation through schemas (ERM-1779). The ERM development team wants to move to an external solution (UXPROD-3172) such as min.io (or another S3-compatible file store).

mod-data-export-worker already uses min.io, and the FOLIO Ansible scripts install min.io for this module.

PostgreSQL supports storing binary files: https://wiki.postgresql.org/wiki/BinaryFilesInDB

  • "When should files be stored in the database? The common suggestion here is when the files have to be ACID."

  • "When is it bad idea to store binary files in the database? Very large files (100MB+), where performance is critical to the application."

  • Do smaller binary files result in bad performance? No, because "bytea and text data types both use TOAST (details here)."
  • PostgreSQL has two ways to store binary files:
    • bytea: This is a regular column type that can be used in any table. It works as usual with the schema separation, where each combination of module and tenant has a dedicated database schema and a dedicated role that can access only its own schema.
    • pg_largeobject: This is a system table; PostgreSQL has exactly one. Access can be restricted to a role, which allows for module and tenant separation.

    • bytea with TOAST "makes the large object facility partially obsolete." (https://www.postgresql.org/docs/current/lo-intro.html)
  • For a detailed discussion see above BinaryFilesInDB link.
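
The schema separation described for bytea can be sketched as follows. This is a minimal illustration, not FOLIO's actual implementation: the `<tenant>_<module>` naming, the table name `file`, and the function names are assumptions. It generates the statements that give one tenant/module combination its own role, its own schema, and a table with a bytea column.

```python
import re
from typing import List

# Conservative identifier check so names can be used unquoted in SQL.
_IDENTIFIER = re.compile(r"[a-z][a-z0-9_]*")
_MAX_IDENTIFIER_LENGTH = 63  # PostgreSQL's default identifier limit


def schema_name(tenant: str, module: str) -> str:
    """Build a schema name for one tenant/module pair (naming is an assumption)."""
    name = f"{tenant}_{module.replace('-', '_')}".lower()
    if not _IDENTIFIER.fullmatch(name) or len(name) > _MAX_IDENTIFIER_LENGTH:
        raise ValueError(f"unsafe schema name: {name!r}")
    return name


def separation_ddl(tenant: str, module: str) -> List[str]:
    """Statements creating a dedicated role, schema, and bytea table."""
    name = schema_name(tenant, module)
    return [
        f"CREATE ROLE {name} LOGIN",
        f"CREATE SCHEMA {name} AUTHORIZATION {name}",
        # Hypothetical table: one row per stored file.
        f"CREATE TABLE {name}.file (id uuid PRIMARY KEY, content bytea NOT NULL)",
    ]
```

For example, `separation_ddl("diku", "mod-invoice-storage")` yields statements for a `diku_mod_invoice_storage` role and schema. Because the role owns only that schema, files stored in its `content` column are not reachable through another tenant's or module's role unless additional privileges are granted.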

For files well below the 100 MB range quoted above, there are no performance issues with storing binary files in PostgreSQL.

A non-PostgreSQL option for storing binary files has been requested because it allows backups to be split into binary files and regular record data.

min.io server for multi-tenancy is licensed under the GNU Affero General Public License Version 3 (AGPLv3). This was changed in April 2021; it had been Apache 2 before. Min.io server for bare-metal or single-tenant and the MinIO Java SDK client continue to be released under Apache v2.0.

Proposal for a Security Team decision:

  • Binary files must be stored with strict tenant and module separation.
    • The TC has already discussed tenant separation and has decided that it is required (see TC 2021-08-18 Meeting notes).
      • The security team agrees with the decision
  • A FOLIO MinIO security guide for developers and sysOps must be published and reviewed by the security team before more modules start using it.
    • e.g., including guidance on how to implement the tenant/module separation
    • The tech leads group will discuss this as noted during the TC meeting (see TC 2021-08-18 Meeting notes)
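
One possible shape for the tenant/module separation guidance is a key prefix per tenant and module in the S3-compatible store. The sketch below is only an illustration under stated assumptions: the `tenant/module/file-id` key layout and the function names are hypothetical and not an agreed FOLIO convention, and a real deployment would also need server-side access policies rather than client-side checks alone.

```python
import re
import uuid

# Allow only simple name components so a key cannot escape its prefix
# (no "/", no "..", no empty components).
_SAFE_COMPONENT = re.compile(r"[a-z0-9][a-z0-9_-]*")


def object_key(tenant: str, module: str, file_id: uuid.UUID) -> str:
    """Build an object key scoped to one tenant and one module."""
    for part in (tenant, module):
        if not _SAFE_COMPONENT.fullmatch(part):
            raise ValueError(f"unsafe key component: {part!r}")
    return f"{tenant}/{module}/{file_id}"


def may_access(tenant: str, module: str, key: str) -> bool:
    """True only if the key lies under the caller's own tenant/module prefix."""
    # The trailing slash keeps "mod-a" from matching keys of "mod-ab".
    return key.startswith(f"{tenant}/{module}/")
```

With this layout, a key created for tenant `diku` and module `mod-agreements` is rejected by `may_access` for any other tenant or module, including ones whose names share a prefix.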

Reason for this decision:

    • This is to support multi-tenant installations.
    • This is to support modules the sysOp doesn't fully trust (as explained in https://dev.folio.org/faqs/explain-database-schema/).
    • Adding a new storage facility can easily create security issues if not done properly. FOLIO hasn't fixed the advanced database privileges issue with PostgreSQL (FOLIO-1935) yet.

Action items
