My comments
APIs -
I think that, as in most systems, there is a written and unwritten rule about backwards compatibility. This issue arises not only in microservices but in any system exposing APIs (of course MS takes this to the extreme). Any service that exposes an API does so with the intent to support it in a backwards-compatible way. Changes that must break this rule should create a separate API, and both should be supported for a period of time until the deprecated one is discarded.
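A minimal sketch of what "create a separate API and support both" could look like: a breaking change ships under a new version prefix while the old contract stays alive. The route paths, the `items` endpoint, and the tiny dispatch table are all hypothetical illustrations, not a proposed framework.

```python
# Sketch: serve the old and new contracts side by side under version prefixes.
# Paths and payload shapes are made-up examples.

ROUTES = {}

def route(path):
    def register(handler):
        ROUTES[path] = handler
        return handler
    return register

@route("/v1/items")
def items_v1():
    # original contract: a flat list of names - kept until /v1 is retired
    return ["widget", "gadget"]

@route("/v2/items")
def items_v2():
    # breaking change: richer objects, so it lives under a new version
    return [{"name": "widget", "stock": 3}, {"name": "gadget", "stock": 0}]

def dispatch(path):
    handler = ROUTES.get(path)
    if handler is None:
        raise KeyError(f"unknown route: {path}")
    return handler()
```

Old clients keep calling `/v1/items` untouched while new clients move to `/v2/items`; once traffic on `/v1` dies down it can be deprecated and removed.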
Auditing -
Auditing is an important aspect, but I believe it would be best to let the storage engine handle it internally if possible. I do have experience with auditing implemented by the application, but I think it would be wiser to rely on the storage engine's built-in auditing - for example, MongoDB Enterprise does offer this feature.
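For reference, MongoDB Enterprise's built-in auditing is switched on in `mongod.conf`; a minimal sketch (the path and the event filter here are illustrative, not a recommendation):

```yaml
# mongod.conf fragment - requires MongoDB Enterprise
auditLog:
  destination: file
  format: JSON
  path: /var/log/mongodb/audit.json
  # optional: restrict which event types get recorded
  filter: '{ atype: { $in: [ "createCollection", "dropCollection" ] } }'
```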
An important concept as well is copying configuration between environments. For example, a customer wants to set up a staging environment and test out configs there; once happy, they want to export those into a prod environment and not have to redo everything. This means configurations exist in both prod and staging, and staging overwrites prod - matched not on the _id but on business keys.
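The staging-overwrites-prod merge could look something like this sketch: upsert by a business key, never by `_id`, so prod keeps its own internal ids. The `business_key` field name and the record shapes are assumptions for illustration.

```python
# Sketch: promote staging configs into prod, matching on a business key.
# "name" as the business key is an assumption - any stable business field works.

def promote(staging_configs, prod_configs, business_key="name"):
    """Upsert staging configs into prod; staging wins on conflict."""
    merged = {c[business_key]: dict(c) for c in prod_configs}
    for cfg in staging_configs:
        # never copy the staging _id - prod keeps its own internal ids
        incoming = {k: v for k, v in cfg.items() if k != "_id"}
        existing = merged.get(cfg[business_key], {})
        merged[cfg[business_key]] = {**existing, **incoming}
    return list(merged.values())
```

Prod-only configs are left untouched; a config present in both keeps prod's `_id` but takes staging's values.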
Latency -
I think it would be great to create a set of performance requirements up front. What is the expected latency for the UI to display data from an API - round trip including display, or without display? 500 / 750 milliseconds? If we set this now, we can check ourselves along the way as well.
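Once a budget number is agreed on, it can be checked continuously in tests; a tiny sketch (the 750 ms figure is just the upper number floated above, not a decided value):

```python
import time

LATENCY_BUDGET_MS = 750  # assumed budget from the 500/750 ms discussion

def within_budget(fn, budget_ms=LATENCY_BUDGET_MS):
    """Run fn once and report whether it finished inside the budget."""
    start = time.perf_counter()
    fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms <= budget_ms, elapsed_ms
```

In a CI suite, `fn` would be the round-trip call under test and a failed budget check fails the build.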
Reporting -
I think we discussed this, but all data needs to be exposed in a database so that standard DB tools can be used to create a wide variety of reports (a lot of them customizable by the customer - existing tools allow for this; as for open source, Pentaho comes to mind).
Backup restore -
I think we should aim for keeping almost everything in the database - even potentially files. Backup, restore, and DR would be simplified in such a case. Then it comes down to a question of the storage engine: some allow you to take snapshots at any point in time; others, for example Oracle, need to go into a backup mode while the backup happens. If we are building for an AWS-type main install, this falls on the service provider, I believe (AWS snapshots, etc.). I have actually implemented this using Jenkins - exposing a REST API in the application to back up / restore / restart Cassandra, which called Jenkins REST APIs to run commands against the storage.
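The application-to-Jenkins hop from my past setup was essentially building POST requests against Jenkins' remote-access API (`POST /job/<name>/build`). A sketch - the host, job names, and the action-to-job mapping are assumptions from that setup, not a spec:

```python
# Sketch: map application actions (backup/restore/restart) to Jenkins jobs
# and build the trigger request. Host and job names are hypothetical.
from urllib.parse import quote
from urllib.request import Request

JENKINS = "https://jenkins.example.com"
JOBS = {
    "backup": "cassandra-backup",
    "restore": "cassandra-restore",
    "restart": "cassandra-restart",
}

def build_trigger(action):
    """Build the HTTP request that kicks off the Jenkins job for an action."""
    job = JOBS[action]  # raises KeyError for unknown actions
    url = f"{JENKINS}/job/{quote(job)}/build"
    return Request(url, method="POST")
```

In a real deployment this would also carry authentication (an API token) and poll the queued build for status.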
Performance -
I think this gets broken down into a few phases:
1. Performance testing and scalability testing, which should give a rough idea of the amount of data / load / tenants a single SaaS install can support
2. Performance monitoring of system resources and the software to track the need to add resources (both software and hardware) - ELK / Nagios / etc.
3. The tools needed to add resources (basically deployment of replicated pieces of software, including the DB). I don't necessarily believe elasticity is needed in our case - I see the big value in it when resources are needed in spurts, and I am not sure this is the case here - but time will tell.
4. Ability of the software to scale out while understanding the performance ramifications of the scale-out. I don't think this should be an issue; however, in Oracle, for example, using RAC requires a dedicated network between the Oracle servers or it won't work.
5. Ability to move tenants across installations, to balance load over multiple installs if needed
6. Ability to throttle tenant load
I don't believe (6) is critical at stage 1. I think the heavy lifting will be scheduled by the service providers and not by tenants; it is a big mistake to allow tenants to run any major batch processes on their own, whenever they like, on the service provider's hardware - a SaaS no-no in my opinion. These processes will usually run in off-peak hours (midnight to 7 am or something of the sort); for example, I would expect at least one install on the west coast, east coast, Europe, Asia, etc.
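If and when throttling tenant load does become necessary, a per-tenant token bucket is one simple option. A minimal sketch - rates, burst sizes, and tenant ids here are made up:

```python
# Sketch: per-tenant token-bucket throttle. Each tenant gets a bucket that
# refills at rate_per_sec up to a burst cap; a request costs one token.
import time

class TenantThrottle:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = {}  # tenant -> (tokens, last_refill_timestamp)

    def allow(self, tenant, now=None):
        """Return True if the tenant's request may proceed."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(tenant, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[tenant] = (tokens - 1, now)
            return True
        self.buckets[tenant] = (tokens, now)
        return False
```

A noisy tenant burns through its own bucket without touching anyone else's, which is the whole point of throttling per tenant rather than globally.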
A single install will need to support the expected heavy lifting (for example, overnight) - hence a single install will need to be sized to grow accordingly.
As for real time, I believe we will supply software solutions for this. For example, getting real-time availability of items (think Google Scholar) might require a write-through cache or something similar to support the load.
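A write-through cache in this context would look roughly like the sketch below: every availability update hits the backing store and the cache in one step, so reads are always warm. The dict-backed "store" stands in for the real database; names are illustrative.

```python
# Sketch: write-through cache for real-time item availability.
# Writes go to the store first, then the cache, so the two never diverge.

class WriteThroughCache:
    def __init__(self, store):
        self.store = store   # backing store (the database in real life)
        self.cache = {}

    def set_availability(self, item, count):
        self.store[item] = count   # write to the store first...
        self.cache[item] = count   # ...then keep the cache in sync

    def get_availability(self, item):
        if item not in self.cache:       # cold read: fall back to the store
            self.cache[item] = self.store[item]
        return self.cache[item]
```

The trade-off versus write-back is slower writes (every write pays the store round trip) in exchange for reads that never see stale availability - which is the property real-time availability actually needs.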