Architecture Blueprint Sessions - WolfCon 2020
Two sessions were used during WolfCon 2020 to discuss topics that relate to the Folio Architectural Blueprint. The Architecture Blueprint being defined as those longer term platform capabilities which are either strategic or foundational for future feature development. The outlook is forward looking, as compared to technical debt which looks to the past.
The format of these sessions was to allocate 10 minutes for each topic. After a brief introduction each topic was discussed with in the room. The goals of these sessions were:
- inform on items being considered for a formal architectural blueprint
- gather feedback on those items
- obtain a sense of the perceived urgency around each
- NOT to delve into the solution space for each item
Blueprint Items Discussed
Day 1:
- Security (fallout of the Security Audit)
- Refactoring Okapi
- Tenant Management
- Multi-Tenancy and Cross-Tenancy
- Adopting PubSub
- Adding Support for GDPR
Day 2:
- Search Engine
- Users and Permissions
- Automation Engine
- GraphQL
- Database Connectivity
Not Discussed:
- Codex
- Inter-Folio Integration
Materials
Recordings of Sessions
Day 1:https://drive.google.com/open?id=15mAQu3u0CEbR1AX2NrjfL0A6mHHFuvM9
Day 2: https://drive.google.com/open?id=1ZR2PRTKHro9HRaNxzQHbDRGZBNu-OBhVArch
Session Notes
Thanks to Zak
Day 1:
many topics
forward looking vs backward looking (tech debt)
security audit fallout
audit in February; expect results ~march 1
weigh the urgency of the fallout vs other changes?
there may be changes that roll together with other non-security changes
urgency is high: expect to act on it basically as soon at it comes out
we have a security policy but not yet a team; whoops; we should do that ASAP
doc exists but not yet public
will be publicized when ready
security@folio.org, non-public jira project
job of sec team to triage those jobs
determine whom to notify, and when, and with how much detail
emergency release process?
not in place at present.
hosting provider may provide a work-around to limit the impact
at the same time, issue will be raised with the community to dev a fix
once impl'ed, verified, then SP can deploy to hosting instance
permanent fix may take a while...
how does this interact with q'ly release schedule?
general agreement: yes, respond to the security audit
refactoring okapi
okapi has grown in role since original conception, impl
many features
proxy gateway
tenant
dep mgmt (build time)
discovery system (installed modulues)
tenant APIs (provisioning, upgrades)
etc etc: timer, pre/post filters, etc
goal: separate into multiple components
high risk of making change to a monolith
is security vulnerability
e.g. setup requires elevated perms
at the same time, stripes talks to okapi
this is not ideal; these two are incompatible
goal: restricted perms on runtime role
elevated perms on setup role
separate tools are more free to evolve because are independent
splitting also allows these components to scale independently
dep mgmt: does this compound that problem? still more apps, ugh
(devil's advocate)
these aren't apps; are tools.
yes, proliferation is a problem, but separation of concerns is worthwhile
can separate services without necessarily separating codebases
this would be nice; could resolve some issues of compatibility
don't have to separate each services into separate modules
but some separation feels like a good thing
general agreement to discuss this further, esp WRT security
but uncertainty about the details of this refactoring
other lurking issues: performance, efficiency
what of shanghai's reqs: okapi must change to accomodate that, or wholesale changes to comm model
separating discovery from gateway, don't have to change gateway
maybe refactor is a misnomer here; we are discussing outward facing changes
... not simply internal refactoring
a discussion about okapi is a platform discussion
this is re-architecture, not refactor
(this is scarier; has bigger impact on the rest of the platform)
can change process model without having to change everything
refactor now to make re-arch later less difficult?
agree this is blueprint item
timing may depend on sec audit
discuss this in parallel/immediately after the sec audit?
agree to discuss soon-ish
tenant management
currently owned by okapi, at least in part
1. admin component for provisioning, upgrading tenants
2. runtime component for tenant registration
WRT updates/migrations:
part of modules?
part of tenant API?
devops sees different perspective
may not matter where the tenant API impl lives, i.e. in okapi or elsewhere
migrations need the elevated perms; can run on BE as admin tool if separated
security aspect is the silver lining of this: isolating this from Okapi is good
this is def. part of okapi refactoring
v. impt for multi-tenancy arch.
present issue is highly isolated tenant fx'ality; to coordinate across tenants, must build it separately
does this separation make simple cases simple if you are not impl'ing multi-tenant?
WRT tenant mgmt only, for single tenant is minor help
for multi tenant, is HUGE help
this also affects multiple tenants using one okapi
this is dependent on multi-tenant libs planning to go-live in summer 2020
agree this is a blueprint item
important, but less so than security
what is the price of this kind of change after a multi-tenant place goes live?
depends on the change....
maybe is huge: can't go live until have multi-tenancy, and need this first....
maybe must include proxy as part of this?
this also addresses the consortia problem
investigate how reshare handles this?
they run fully isolated tenants, exchange data across tenants
need clear understanding of motivation for these items
what is full multi-tenancy?
multiple tenants sharing data; this does not exist at present
agree to discuss ... "slater"
does this even belong in okapi if it stays within Okapi?
don't necessarily have to break into own service
full multi-tenancy
how do we provide full multi-tenancy
MT means sharing data across tenants
want to separate initiating tenant from target tenant of an action
this is maybe more comfortable as a roadmap item than tenant-mgmt
must talk about this: may or may not imply LOTS of work
there's a lot of complexity here
permissions are currently tenant-scoped; need a new perm system
talk about this sooner in order to prioritize it sooner
security is an impt consideration here: isolation provides security; people want that
the soln here must not undermine isolation elsewhere
agree this is blueprint item
discuss sooner
adopt pubsub
have dev'ed P/S as part of source-record-storage
so can handle comm between modules with event-based mechanism
developed with this in mind. yay!
decouples modules from one another
overlaps with techdebt somewhat
must reexamine some impls;
can also
discovery: realtime upgrades is holy grail
so from vufind's view: this is HUGE
very strong feelings about this being essential to future extensibility
folio-core is hard to break apart right now
having this is more of a hook module
can have more uni-directional deps
strong agreement all around
do we impl/reimpl some integrations?
most urgent is to make sure it is present somewhere?
not part of edelweiss, is on master now?
folijet to demo
this is the beginning of Saga support
could provide for distributed transactions in some circ functions
strongly agree this is a blueprint item
discuss soon, assuming is actually available
support for GDPR
support is no brainer; must do
mostly concern for hosting provider
right of access, rectification, erasure, restrict processing, data portability
can do some of this now but is highly manual process
was always intended to make this easier by providing some APIs
arch req not just a feature req because imposes some req's on modules
does GDPR give you a timeline? believe yes but don't know
this is crucial for this to be a non-US centric project
not urgent, later is OK but is required
Chalmers is live; this puts pressure on them.
manual process is OK now, but won't be at scale
Day 2:
goal
id issues NOT solve them
then prioritize, set a timeline
search engine
search built on top of postgres at present
but will hit a tipping point where that starts to fail
next level: dedicated search engine, e.g. elasticsearch
is it functional, performant, easy to implement
what about results display?
universal search is a separate/related question
is peristence engine not best for postgres? e.g. cassandra?
this proposal suggests search as a bolt-on
changing the persistence mechanism would be more integrated
full-text engine vs rdbms have different strengths; may need both
OLE used solr: problem then was then that solr indices became treated as source of truth
postgresql is a swiss army knife of persistence
rdbms, message queue, json storage...
discussions WRT postgres services may make HA instances hard
but adding other services, then have to solve those same problems for add'l tools
any solution will take massaging to get good perf
fulltext search is powerful ... but doesn't support localization
will have to wrestle with this eventually
NLAustralia: we have gone down the search engine road; happy with it
do we have exp with large DBs?
we know the lucene-based search indexes that perform fine
if we lack the experience, we shouldn't research it, we should find ppl with exp
this knowledge exists; don't spend time rediscovering it
discovery layer will surely exist separately, and surely use a search engine
but we are building an ILS, not a discover layer
agreement this is impt
agreement on sooner rather than later
users and permissions
introduce tenant-level and system-level users
every user in the system now gets perms assigned
introduce role-based perms
at system level, have batch operations
don't want to create dedicated users for this, is hacky
1. system-level users
2. roles
abstraction layer between users and roles
will be impt when workflows; e.g. notifications assigned to roles
can assign multiple psets to a single user
psets aggregate permissions
roles would aggregate users
no way to cluster users at present
UChicago: this would be helpful
roles provide hierarchical permission assignment
teams: ERM has some notion of this
folio doesn't have any data-level protections
perms are focused around endpoints only
could roles help get us there?
e.g. a group of people who should only be able to see orders with a certain attr
e.g. only math-dept orders, only history-dept orders...
agreement: should we split tenant level/system level users
agreement: should explore roles
this will only get harder if we wait
agreement: system user notion is more impt than roles
automation engine / "workflow engine"
engine required to provide automation on the backend; NOT user-visible
should/when will we adopt such an engine
could eventually help with user-visible things, e.g. tasklists
nuance: site specific tweaks to processes are necessary
right: we are talking about building the player piano; of course scroll is changeable
are there concerns that necessary hooks in the system may be absent?
camunda POC attached to GitHub:folio-org, but lots of progress in GitHub:tamu
TAMU: using workflow for data migration
this would be unique in this marketplace! whoop whoop!
any relation to pubsub? ATM, handled separately
agreement: this belongs on roadmap
agreement: sooner; there is Q3/Q4 work scheduled that is related
ought to be able to do anything manually that is being automated
graphQL
prototype exists
allows you to tailor a single request to get data in desired shape
behind that is system to traverse the graph to assemble this
initial look: optimization for UI; fewer requests
subsequent look: want to streamline dataflow between modules, not just in UI
right now can't get only IDs; get full records, which is less efficient
could streamline APIs?
at simple impl level, could simply add biz layer atop APIs
still making API calls under the hood though...
don't know about opportunity for actual optimization
graphql could maybe provide some efficiencies
database connectivity
we have module level connections currently
if have multi-tenant system, any tenant uses single cx
would be great to have more granular connectivity
e.g. assign per-tenant cx
e.g. interface-level cx, e.g. some data in local stores, some in cloud...
e.g. GET/PUT/POST separation, separate cx to read-optimized/write-optimized stores
for folks wanting to share infrastructure with separate data, this does not work well
last bullet is solved outside of folio; we should not solve it ourselves
secret mgmt is related, maybe?
agreement this is blueprint item
agreement this is urgent
codex!
inter-folio integration
CALIS has explored this somewhat
how do multiple folios interact? transparently to user.
agreement this is later
, multiple selections available,