Comment: Initial work to clarify differences between data copying options

...

  • Reduces the number of individual downstream requests (and hence the Okapi proxying overhead)
  • Requires at least one downstream request per destination module
  • Requires at least one database query per downstream module
  • Might reduce the response time of the downstream request (compared to the combination of individual requests)
  • Might reduce the load on downstream modules (depending upon how the combined request is handled, it is possible the load increases)
  • Reduction in downstream requests is limited to the number of record types within a single module
  • Increases the number of APIs to maintain (what I call the surface area of the module)
  • Increases the coupling between modules (by introducing the client's context into the other module)
  • Increases the coupling between the record types involved (e.g. it's harder to move record types to other modules when they are included in APIs together, and changes to them ripple across APIs)
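As a rough illustration of the request counts described in the bullets above, the following sketch compares one request per record type with one combined request per destination module. The list of record types needed for a check out, and the function names, are assumptions chosen for illustration; they are not a statement of what mod-circulation actually fetches.

```python
# Hypothetical mapping of record types needed during a check out to the
# module assumed to own each one (illustrative only).
RECORD_TYPE_OWNERS = {
    "item": "mod-inventory-storage",
    "holdings": "mod-inventory-storage",
    "instance": "mod-inventory-storage",
    "user": "mod-users",
    "patron group": "mod-users",
    "loan policy": "mod-circulation-storage",
}


def individual_request_count(owners):
    """One downstream request per record type."""
    return len(owners)


def combined_request_count(owners):
    """Lower bound with combined requests: one per destination module."""
    return len(set(owners.values()))


individual = individual_request_count(RECORD_TYPE_OWNERS)
combined = combined_request_count(RECORD_TYPE_OWNERS)
print(f"individual: {individual}, combined: {combined}")
```

With these assumed inputs the saving is bounded by how many record types share a module, which is the limit noted in the characteristics.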

Copy data into circulation

Consume messages produced (via Kafka) by other modules to build views of the data needed to perform a check out.

The biggest challenge with this option is the community's tolerance for using potentially stale data when making decisions.

This suggests processing the messages and using a database from mod-circulation rather than mod-circulation-storage to avoid the overhead of needing to request the copied data from a downstream module.

Characteristics

  • Requires no downstream requests for fetching data during check out process
  • Increases the potential for stale data to be used for decisions
  • Is contrary to constraints that may still be present in FOLIO
  • Introduces complexity of processing messages and persistent storage into mod-circulation
  • Introduces a dependency on a database from mod-circulation
  • Introduces a dependency on messages produced by other modules
  • State changes still require a downstream request (and the requisite proxying overhead)
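The idea of building a local view from messages could look something like the sketch below. It deliberately avoids any Kafka client and uses an in-memory dictionary as a stand-in for the database, so only the view-building logic is shown. The `RecordMessage` shape, its `version` field and the `LocalView` class are hypothetical; the real message format would depend on what the producing modules publish.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordMessage:
    """Hypothetical record snapshot message; all field names are assumptions."""
    record_type: str           # e.g. "item" or "user"
    record_id: str
    version: int               # assumed to increase with every change
    snapshot: Optional[dict]   # None signals that the record was deleted


class LocalView:
    """Copies of other modules' records, kept up to date from messages."""

    def __init__(self):
        # (record_type, record_id) -> (version, snapshot)
        self._records = {}

    def apply(self, message):
        """Apply a message; return False if it is a duplicate or out of order."""
        key = (message.record_type, message.record_id)
        current = self._records.get(key)
        if current is not None and current[0] >= message.version:
            return False
        # Deletions are kept as (version, None) so older messages cannot
        # bring the record back.
        self._records[key] = (message.version, message.snapshot)
        return True

    def get(self, record_type, record_id):
        """Return the latest known snapshot, or None if unknown or deleted."""
        entry = self._records.get((record_type, record_id))
        return None if entry is None else entry[1]


view = LocalView()
view.apply(RecordMessage("item", "i-1", 1, {"status": "Available"}))
view.apply(RecordMessage("item", "i-1", 3, {"status": "Checked out"}))
stale_applied = view.apply(RecordMessage("item", "i-1", 2, {"status": "In transit"}))
item = view.get("item", "i-1")
```

During check out, the lookup is then a local read rather than a downstream request, at the cost of the view lagging behind the owning module by however long message processing takes.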

Variations

Store the copied data in mod-circulation-storage

Rather than introducing a database in mod-circulation, use the database that is already used by mod-circulation-storage.

Downstream requests will be needed from mod-circulation to mod-circulation-storage to access the views.

Cache the copied data in each instance of mod-circulation

Rather than introducing a database in mod-circulation, use a volatile cache within each instance of mod-circulation and use downstream requests to populate the cache.
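A minimal sketch of this variation is shown below, assuming a time-based expiry (the expiry policy is an assumption; the text above does not prescribe one). `VolatileCache` and `fetch_loan_policy` are hypothetical names, and the fetch function stands in for a downstream request to the owning module.

```python
import time


class VolatileCache:
    """Per-instance cache populated on demand by a fetch function."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # stands in for a downstream request
        self._ttl = ttl_seconds    # assumed expiry policy
        self._clock = clock        # injectable so expiry can be demonstrated
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh enough: no downstream request
        value = self._fetch(key)   # missing or expired: go downstream
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def fetch_loan_policy(policy_id):
    """Hypothetical downstream request; records each call it receives."""
    calls.append(policy_id)
    return {"id": policy_id, "loanable": True}


now = [0.0]
cache = VolatileCache(fetch_loan_policy, ttl_seconds=60, clock=lambda: now[0])
cache.get("p-1")   # miss: fetches from downstream
cache.get("p-1")   # hit: served from memory
now[0] = 61.0
cache.get("p-1")   # expired: fetches again
print(len(calls))
```

Because the cache lives in each module instance, every instance pays its own misses, and everything is lost when the instance is terminated.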

...

Use copies of data to make decisions

For this approach to be acceptable, a variety of stakeholders within the community would need to tolerate decisions being made with stale information. When I've talked to folks about this previously, they have been uncomfortable with doing this (see above).

Characteristics

  • Requires no downstream requests for fetching data during check out process
  • Increases the potential for stale data to be used for decisions
  • Is contrary to constraints that may still be present in FOLIO
  • Introduces complexity of processing messages and persistent storage into mod-circulation
  • Introduces a dependency on a database from mod-circulation
  • Introduces a dependency on messages produced by other modules
  • State changes still require a downstream request (and the requisite proxying overhead)

Variations

The characteristics of this approach vary depending upon the design decisions we make. A couple of the significant ones are outlined below.

This is only a very high-level comparison of the characteristics. There are many alternative designs in both of these categories that lead to different characteristics.

Where is the data kept?


| | Memory | PostgreSQL |
| --- | --- | --- |
| Volatility | Lost when the module instance is terminated | Retained even if module instances are terminated |
| Locality | Local copies for each module instance | Shared between module instances |
| Access control | Sharing needs to be controlled with code within the module | Can be controlled using mechanisms provided by the database server |
| Responsiveness | Likely faster if the cached value is present, likely slower if not | Dependent upon network and database load |
| Record type suitability | Better suited to smaller sets that change rarely, e.g. reference types | Can be used for any kind of record type |
| Infrastructure needs | None | Requires a database for mod-circulation |

How is the copied data updated?


| | Periodic HTTP requests | Messages consumed from Kafka |
| --- | --- | --- |
| Freshness | Dependent upon frequency of periodic refresh. Likely to lead to data being stale for longer than with messaging | Dependent upon message processing latency |
| Access requirements | Needs a system user or module permissions granted to a timer endpoint | Needs access to Kafka topics for every record type (assuming record snapshot based messages as used with mod-search) |
| Initial population / manual state refresh | Requires requests to fetch all records for all cached record types | Either requires reprocessing of a persistent topic (not currently allowed by FOLIO standards) or a custom process (similar to the mod-search re-index process) |
| Load on other modules during synchronisation | Could be significant. Dependent upon number of record types and quantity of records | Potentially none with persistent topics (not currently allowed by FOLIO standards) |
| Freshness measurement | | |
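The freshness comparison can be made concrete with some simple arithmetic. The sketch below is only a rough model under stated assumptions: a periodic refresh misses any change made just after a record is read, and messaging staleness is bounded by processing latency alone. The function names and the input numbers are hypothetical and are not measurements.

```python
def worst_case_staleness_periodic(refresh_interval_s, fetch_duration_s):
    """A change made just after a refresh reads a record is not seen
    until the next refresh has started and finished fetching."""
    return refresh_interval_s + fetch_duration_s


def worst_case_staleness_messaging(processing_latency_s):
    """A change is visible once its message has been processed."""
    return processing_latency_s


# Hypothetical inputs, chosen only to show the shape of the trade-off.
periodic = worst_case_staleness_periodic(refresh_interval_s=300, fetch_duration_s=20)
messaging = worst_case_staleness_messaging(processing_latency_s=2)
print(f"periodic: {periodic}s, messaging: {messaging}s")
```

Shortening the refresh interval narrows the gap, but increases the load on the other modules during synchronisation.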




Combine the business logic and storage modules together

Characteristics

  • Removes all downstream requests for record types within the circulation domain, e.g. loans, requests, loan policies (including state changes, e.g. creating a loan or fulfilling a request)
  • Removes the distinction between business logic and storage representations of those record types
  • Allows for state changes within the circulation domain to be done within a database transaction
  • Is contrary to constraints that may still be present in FOLIO
  • Storage modules have been used to work around cyclic dependency constraints in Okapi. Removing them might involve changing other modules to avoid cycles in other ways
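To illustrate what doing state changes within a database transaction could mean, here is a minimal sketch in which a request is fulfilled and a loan is created together, or not at all. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL, and the table layout and the `check_out` function are hypothetical rather than the actual circulation schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the module's database
# Hypothetical schema: at most one loan per item.
conn.execute("CREATE TABLE loan (id TEXT PRIMARY KEY, item_id TEXT NOT NULL UNIQUE)")
conn.execute(
    "CREATE TABLE request (id TEXT PRIMARY KEY, item_id TEXT NOT NULL, status TEXT NOT NULL)"
)
conn.execute("INSERT INTO request VALUES ('r-1', 'i-1', 'Open - Awaiting pickup')")
conn.execute("INSERT INTO request VALUES ('r-2', 'i-1', 'Open - Not yet filled')")
conn.commit()


def check_out(conn, loan_id, item_id, request_id):
    """Fulfil the request and create the loan as a single unit of work."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE request SET status = 'Closed - Filled' WHERE id = ?", (request_id,)
        )
        conn.execute("INSERT INTO loan (id, item_id) VALUES (?, ?)", (loan_id, item_id))


check_out(conn, "l-1", "i-1", "r-1")
try:
    # The item is already on loan, so the insert fails and the request
    # update made in the same transaction is rolled back.
    check_out(conn, "l-2", "i-1", "r-2")
except sqlite3.IntegrityError:
    pass

statuses = dict(conn.execute("SELECT id, status FROM request"))
loan_count = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
```

With separate business logic and storage modules, the same pair of changes needs two downstream requests, and a failure between them has to be compensated for in code.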

...

However, this work requires:

  • adoption of techniques (e.g. synchronising copied data, messaging, caching) and technologies (e.g. Kafka) unfamiliar to most developers in FOLIO
  • agreement from many stakeholders (e.g. SMEs, TC) that it is acceptable to use potentially stale data for making decisions

Appendices

Definitions

| Phrase | Definition |
| --- | --- |
| Downstream request | A request made by a module (via Okapi) in order to fulfil the original incoming request, e.g. mod-circulation makes a request to mod-users to fetch patron information |
| Response time | The time taken from the client making the request to receiving a response |

...