[FOLIO-364] discuss, reach consensus and respond to OLEs Operational consideration Created: 26/Oct/16  Updated: 18/Jan/19

Status: Open
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P3
Reporter: Jakub Skoczen Assignee: Jakub Skoczen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Sprint:
Development Team: Core: Platform

 Description   

The operational consideration document is located here: https://docs.google.com/document/d/15Z43DfKV89vn2EGvDkwzQg-tUhfUIGue082aC6xKGFE/edit

Jakub has provided initial answers for some of the questions; others are unanswered. All answers should be reviewed to make sure they are aligned with FOLIO's architecture plans, and the missing answers need to be provided.



 Comments   
Comment by shale99 [ 31/Oct/16 ]

My comments

APIs -
I think that, as in most systems, there is a written and unwritten rule about backwards compatibility. The issue arises not only in microservices but in any system exposing APIs (microservices just take it to the extreme): any service that exposes an API does so with the intent to support it in a backwards-compatible way. A change that must break this rule should be released as a separate API, with both supported for a period of time until the deprecated one is discarded.
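
To make the "separate API, support both for a while" idea concrete, here is a minimal sketch. Everything in it is hypothetical (the route names, the handlers, and the `X-API-Deprecated` header are made up for illustration and are not part of any FOLIO interface); it only shows a breaking change shipped as v2 while v1 keeps working and is flagged as deprecated.

```python
from typing import Callable, Dict, Tuple

# Hypothetical handlers: v1 is the old contract, v2 carries the breaking change.
def get_item_v1(item_id: str) -> dict:
    return {"id": item_id, "title": "Example"}

def get_item_v2(item_id: str) -> dict:
    # v2 renames "title" to "name", which would break existing v1 clients.
    return {"id": item_id, "name": "Example"}

# (version, route) -> (handler, is_deprecated)
ROUTES: Dict[Tuple[str, str], Tuple[Callable[[str], dict], bool]] = {
    ("v1", "items"): (get_item_v1, True),
    ("v2", "items"): (get_item_v2, False),
}

def dispatch(version: str, route: str, item_id: str) -> Tuple[dict, dict]:
    """Return (body, headers). Deprecated versions still answer but are flagged."""
    handler, deprecated = ROUTES[(version, route)]
    # Illustrative header name only; an assumption, not a FOLIO convention.
    headers = {"X-API-Deprecated": "true"} if deprecated else {}
    return handler(item_id), headers
```

Removing the `("v1", "items")` entry once the deprecation period ends is then a single, visible change.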

Auditing -
Auditing is an important aspect, but I believe it would be best to let the storage engine handle it internally if possible. I do have experience with auditing implemented by the application, but I think it would be wiser to rely on the storage engine's built-in auditing; MongoDB Enterprise, for example, offers this feature.
Another important concept is copying configuration between environments. For example, a customer wants to set up a staging environment and test configs there; once happy, they want to export those into the prod environment without redoing everything. Where a configuration exists in both prod and staging, staging overwrites prod, and the match is based on business keys, not on the _id.
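
A rough sketch of the staging-to-prod copy, assuming configuration records are plain dicts and that the business key is the pair (module, name). Both the record shape and the choice of key are assumptions for illustration; the real key would depend on the configuration type.

```python
from typing import Dict, List, Tuple

def business_key(record: dict) -> Tuple[str, str]:
    # Assumed business key: (module, name). The internal _id is ignored for matching.
    return (record["module"], record["name"])

def promote(staging: List[dict], prod: List[dict]) -> List[dict]:
    """Copy staging configs into prod; on a business-key match, staging wins."""
    merged: Dict[Tuple[str, str], dict] = {business_key(r): r for r in prod}
    for rec in staging:
        key = business_key(rec)
        existing = merged.get(key)
        # Keep prod's own _id on overwrite so the identifier stays stable in prod.
        new_id = existing["_id"] if existing else rec["_id"]
        merged[key] = {**rec, "_id": new_id}
    return list(merged.values())
```

Records that exist only in prod are left untouched, and records that exist only in staging are added.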

Latency -
I think it would be great to define a set of performance requirements up front. What is the expected latency for the UI to display data from an API, as a round trip with and without display time? 500 / 750 ms? If we set this, we can check ourselves along the way as well.
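
One way to "check ourselves along the way" is a small budget check that can run in a test suite. The sketch below is only an illustration: the 500 ms budget echoes the number floated above, while the 95th-percentile criterion and the helper names are assumptions.

```python
import math
import time
from typing import Callable, List

def measure_ms(call: Callable[[], object], runs: int = 20) -> List[float]:
    """Time `call` several times and return the durations in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def within_budget(samples: List[float], budget_ms: float, pct: float = 95.0) -> bool:
    """True if the chosen percentile of the samples is at or under the budget."""
    return percentile(samples, pct) <= budget_ms
```

In practice `call` would wrap a real API request, for example `within_budget(measure_ms(fetch_items), 500.0)`, where `fetch_items` is a placeholder for whatever round trip is being measured.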

Reporting -
I think we discussed this, but all data needs to be exposed in a database so that standard DB tools can be used to create a wide variety of reports (a lot of them customizable by the customer; existing tools allow for this, and on the open-source side Pentaho comes to mind).

Backup restore -
I think we should aim to keep basically, well, almost everything in the database, potentially even files; backup, restore and DR would be simplified in that case. It then comes down to the storage engine: some allow you to take snapshots at any point in time, while others, for example Oracle, need to go into a type of backup mode while the backup happens. If we are building for an AWS-type main install, this falls on the service providers, I believe (AWS snapshots, etc.). I have actually implemented this using Jenkins: the application exposed a REST API to back up / restore / restart / etc. Cassandra, and that API called Jenkins REST APIs to run commands against the storage.

Performance -
I think this gets broken down into a few phases

  1. Performance testing and scalability testing, which should give a rough idea of the amount of data / load / tenants a single SaaS install can support
  2. Performance monitoring of system resources and the software, to track the need for adding resources (both software and hardware) - ELK / Nagios / etc.
  3. The tools needed to add resources (basically deployment of replicated pieces of software, including the DB). I don't necessarily believe elasticity is needed in our case: I see its big value when resources are needed in spurts, and I am not sure that is the case here, but time will tell.
  4. Ability of the software to scale out while understanding the performance ramifications of the scale-out. I don't think this should be an issue; however, Oracle RAC, for example, requires a dedicated network between the Oracle servers or it won't work.
  5. Ability to move tenants across installations - to balance out load over multiple installs if needed
  6. Throttle tenant load
    I don't believe (6) is critical at stage 1. I think the heavy lifting will be scheduled by the service providers and not by tenants; it is a big mistake to allow tenants to run any major batch processes on their own, whenever they like, on the service provider's hardware (a SaaS no-no in my opinion). These processes will usually run in off-peak hours (midnight - 7 am or something of the sort); for example, I would expect at least one install each for the west coast, east coast, Europe, Asia, etc.
    A single install will need to support the expected heavy lifting (for example overnight), hence a single install will need to be sized to grow accordingly.
    As for real time, I believe we will supply software solutions for this. For example, getting real-time availability of items (think Google Scholar) might require a write-through cache or something similar to support the load.
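
For the real-time availability case, a write-through cache could look roughly like the sketch below. It is an illustration under stated assumptions: a plain dict stands in for the storage layer, and the class and method names are made up. The point is only that every write goes to the store and the cache together, so reads can be served from memory without going stale.

```python
from typing import Dict, Optional

class WriteThroughCache:
    """Item-availability cache where every write hits the store and the cache."""

    def __init__(self, store: Dict[str, str]) -> None:
        self._store = store              # stand-in for the real storage layer
        self._cache: Dict[str, str] = {}

    def set_status(self, item_id: str, status: str) -> None:
        # Write-through: persist first, then update the cached copy.
        self._store[item_id] = status
        self._cache[item_id] = status

    def get_status(self, item_id: str) -> Optional[str]:
        if item_id in self._cache:
            return self._cache[item_id]
        # Cache miss: fall back to the store and remember the answer.
        status = self._store.get(item_id)
        if status is not None:
            self._cache[item_id] = status
        return status
```

This only stays consistent if all writes go through `set_status`; anything that updates the store directly would need an invalidation path as well.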