0011-Folio Query Machine (FQM)
Start Date | Mar 3, 2025 |
End Date | Apr 3, 2025 |
Contributors | @VBar @Matt Weaver |
Status | PUBLIC REVIEW |
Summary
Folio Query Machine (FQM) is a privileged component in Folio. Its role is to be the single source of cross-schema data (here, “schema” is meant in the database sense) within a Folio tenant installation. Through its APIs, FQM provides access to specific data schemas into which selected data from various other Folio Applications have been aggregated. FQM is thus a controlled, centralized point in a Folio installation where cross-application data are available on a read-only basis.
FQM is a privileged Folio Application in two respects.
It is intended to be the only native Application to provide cross-schema data within Folio.
It implements its own enforcement of security.
Motivation
Folio has adopted a microservices pattern from its inception. A direct consequence is that each application is entirely and exclusively responsible for the storage layer used to persist its data. Data belonging to an application are only accessible through the APIs that the application chooses to provide. Even though Folio contains asynchronous mechanisms (e.g., Kafka) for publishing data, data are never accessed directly at the storage level (i.e., the database) by another application.
As Folio has grown in scale and complexity, strict adherence to this version of the microservices pattern has both degraded performance and blocked the development of additional (sometimes critical) features.
For example:
In order to calculate what might be a small intersection between two large datasets belonging to separate applications (e.g., users and circulation), a business logic module must retrieve both large datasets and compute the intersection in Java code.
Some business processes have decision logic that requires status information from a half-dozen or more applications. Implementing this through an equivalent number of API calls results in tight coupling between the modules involved and entangled dependencies.
Several use cases have been identified which would benefit from FQM as part of their solutions.
The Lists App. This application allows for the generation of data lists which are persisted and automatically updated as the underlying data change. A list consists of selected fields obtained from different applications. FQM provides both the ability to pull in those fields from different applications and the ability to reduce the dataset according to filter criteria.
Bulk Edit. This application provides for simultaneously editing fields in a subset of records from particular entity types in Folio (such as Instances, Items, or Users). The first critical step is to identify the subset of records which will be modified in bulk. Typically, filtering is performed using criteria that examine fields from records belonging to applications other than the one whose records are being modified. Efficient cross-application filtering is exactly what FQM provides.
Record Deletion. A long-standing feature gap in Folio is the ability to delete records which are no longer needed. Because of the high data coupling between applications, the challenge has been to establish whether a particular record is actually unused and therefore safe to delete. For example, an inventory item record may be referenced in a number of applications other than Inventory: it may be tied to an open order; it may be part of an open loan; there may be fees/fines tied to it; it may be part of a course reserve; etc. A rule solution design has been created for the deletion problem: for any entity type, clearance to delete could be identified and verified by polling each of the applications that may hold dependencies. FQM could provide a consolidated record that pulls together, in a single API call, all the flags that need to be verified from all the applications holding dependencies. This would improve not only performance but also security and privacy, since large numbers of broad permissions would no longer be required and extraneous data would not be included.
Data Export. This application provides for the export of custom subsets of data records from various entities in Folio (such as Instances or Items). Creating the subset of records requires filtering against other records (other than the ones exported) according to cross-dependencies. FQM provides the required efficient cross-application filtering for selecting the set of records to export.
Data Import. Data Import contains the most complex workflows and algorithms in Folio. This highly configurable application not only allows the importing of new records (create workflow) but also automatically establishes whether a record already exists in Folio and thus needs to be updated (update workflow) to avoid record duplication. Determining whether a record already exists is a complex matching operation, which attempts to match whatever limited metadata fields exist in the incoming record to existing metadata in records belonging to various entities and applications. This is the primary and most significant source of performance delays in Data Import. FQM can greatly improve the matching operations by aggregating all the matchable metadata values into a single record set retrieved with a single API call.
Operational Reports. Folio is lacking in simple operational reports that might be found in other ILSes. There are currently only two such “in-app” or “built-in” reports. FQM offers the ability to natively implement operational reports without requiring large data transfers to an external system.
Scope
The scope is FQM proper (mod-fqm-manager). It does not include applications that make use of FQM, such as the Lists Application.
Detailed Explanation/Design
Module: mod-fqm-manager
FQM Functionality
FQM (FOLIO Query Machine) is the engine that takes in queries, processes queries, and provides results. FQM consolidates data from different modules within FOLIO in real time, allowing users to conduct cross-application searches efficiently.
An FQM query consists of a query condition, an entity type, and a set of requested data fields.
FQM query conditions are formulated in FQL (FQM Query Language) syntax. The mod-fqm-manager module is responsible for converting FQL queries into SQL queries, which are run against the underlying database.
FQM defines an Entity (AKA “Entity Type”) as a queryable relation in Folio: a specific set of related data fields identified through an ID field. It is an FQM concept similar to a View found in relational databases. Each entity type has a defined set of available fields.
FQM Manager provides two types of API endpoints: synchronous and asynchronous.
The asynchronous endpoint is the preferred mechanism, as it allows submitting a query and retrieving results through separate calls. Results can be retrieved before the entire query has finished executing. The asynchronous endpoint is the one to use if you need paged results.
The synchronous endpoint is mostly a legacy endpoint and only useful when running simple queries with small expected result sets. It allows submitting the query and receiving an immediate result in the response. In practice, this endpoint is used mostly for development.
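To illustrate the submit-then-retrieve pattern of the asynchronous endpoint, the following Python sketch submits a query once and then pages through results as they become available. It is a sketch under stated assumptions, not the mod-fqm-manager client. The request fields (`entityTypeId`, `fqlQuery`, `fields`), the response fields (`status`, `totalRecords`, `content`), and the status values are all hypothetical. The HTTP calls are replaced by injected callables so the example runs without a server.

```python
import time
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical transport: stand-ins for "submit query" and "fetch a page" HTTP calls.
SubmitFn = Callable[[Dict[str, Any]], str]
FetchFn = Callable[[str, int, int], Dict[str, Any]]


def run_async_query(submit: SubmitFn, fetch: FetchFn, entity_type_id: str,
                    fql_query: str, fields: List[str], page_size: int = 100,
                    poll_interval: float = 0.5) -> Iterator[Dict[str, Any]]:
    """Submit once, then yield result rows page by page as they become available."""
    query_id = submit({"entityTypeId": entity_type_id,  # hypothetical field names
                       "fqlQuery": fql_query,
                       "fields": fields})
    offset = 0
    while True:
        page = fetch(query_id, offset, page_size)
        if page["status"] == "FAILED":
            raise RuntimeError(f"query {query_id} failed")
        rows = page["content"]
        yield from rows
        offset += len(rows)
        finished = page["status"] == "SUCCESS"
        if finished and (not rows or offset >= page["totalRecords"]):
            return
        if not rows:
            time.sleep(poll_interval)  # still running and nothing new yet: back off


class FakeFqm:
    """In-memory stand-in whose query 'finishes' after a few polls."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows, self.polls, self.body = rows, 0, None

    def submit(self, body: Dict[str, Any]) -> str:
        self.body = body
        return "query-1"

    def fetch(self, query_id: str, offset: int, limit: int) -> Dict[str, Any]:
        self.polls += 1
        available = self.rows[: self.polls * 2]  # two more rows become visible per poll
        status = "SUCCESS" if len(available) == len(self.rows) else "IN_PROGRESS"
        return {"status": status, "totalRecords": len(available),
                "content": available[offset: offset + limit]}


fake = FakeFqm([{"id": i} for i in range(5)])
rows = run_async_query(fake.submit, fake.fetch, "hypothetical-entity-type-id",
                       '{"users.active": {"$eq": "true"}}', ["users.id"], page_size=3)
print([row["id"] for row in rows])  # [0, 1, 2, 3, 4]
```

Because retrieval is decoupled from submission, a caller can start consuming the first rows while the query is still in progress. The synchronous endpoint cannot offer this.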
FQM Database Architecture
Source Views
The Source Views are the mechanism by which data from other schemas are accessed for use in FQM. Each Source View is scoped to exposing a single resource from Folio. The Source Views are non-materialized SQL views on tables belonging to other schemas. They establish a pipeline for data from source modules to FQM, while avoiding direct access to the operational tables by FQM. They also provide a simple translation layer to help deal with breaking schema changes and implement functionality that is otherwise not available in FQM.
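The following Python sketch uses SQLite as a stand-in for PostgreSQL to show what “non-materialized” means in practice. A source view stores only a query, so a write by the owning module is visible through the view immediately, with nothing copied or refreshed. The table and view names are hypothetical. SQLite has no PostgreSQL-style schemas, so the owning module's table is simply given a prefixed name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table owned by another module (hypothetical name; stands in for mod_users.users).
    CREATE TABLE mod_users__users (id TEXT PRIMARY KEY, username TEXT, active INTEGER);
    -- A source view: a plain, non-materialized view exposing a single resource.
    CREATE VIEW src_users AS SELECT id, username, active FROM mod_users__users;
""")


def active_usernames():
    """Read through the view only, as FQM would."""
    return [r[0] for r in conn.execute(
        "SELECT username FROM src_users WHERE active = 1 ORDER BY username")]


conn.execute("INSERT INTO mod_users__users VALUES ('u1', 'alice', 1)")
print(active_usernames())  # ['alice']

# A later write by the owning module is visible at once: no copy, no refresh step.
conn.execute("INSERT INTO mod_users__users VALUES ('u2', 'bob', 1)")
print(active_usernames())  # ['alice', 'bob']
```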
Entity Types
Entity types (AKA Entities) are the core components of FQM, upon which reports may be constructed. They come in two distinct flavors.
Simple Entities
Simple Entities are an abstraction over the Source Views that adds the permission requirements and defines what data are exposed and how to extract them. Simple Entities are the building blocks for Composite Entities.
Simple Entities will also reshape the data in a way that is suitable for reporting purposes. For example, if the Source View contains data in the JSONB format, the Simple Entity will present the corresponding data in a row and column format, making it easier to analyze and generate reports.
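As a rough illustration of this reshaping, the Python sketch below flattens a JSON document into a flat row using per-field key paths. The field definitions and the item document are hypothetical. In FQM itself, the equivalent extraction is expressed in SQL against the Source View (in PostgreSQL, something like `jsonb -> 'status' ->> 'name'`), not in application code.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical simple-entity field definitions: column name -> key path in the JSON.
ITEM_FIELDS: Dict[str, List[str]] = {
    "id": ["id"],
    "barcode": ["barcode"],
    "status_name": ["status", "name"],
    "effective_location_id": ["effectiveLocationId"],
}


def extract(doc: Dict[str, Any], path: List[str]) -> Optional[Any]:
    """Walk a key path; return None (like SQL NULL) when any segment is missing."""
    value: Any = doc
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def to_row(jsonb_text: str) -> Dict[str, Any]:
    """Turn one stored JSON document into a row with one value per defined column."""
    doc = json.loads(jsonb_text)
    return {column: extract(doc, path) for column, path in ITEM_FIELDS.items()}


print(to_row('{"id": "i1", "barcode": "3100", "status": {"name": "Missing"}}'))
# {'id': 'i1', 'barcode': '3100', 'status_name': 'Missing', 'effective_location_id': None}
```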
Composite Entities
Composite Entities pull together data from existing Simple Entities. Effectively, they provide the join operations across Simple Entities.
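The sketch below shows, in Python, how a composite entity could be compiled into a single SQL statement. The statement joins the Source Views behind the composite's Simple Entities and prefixes each field with its source alias, mirroring field names such as “item.status_name” in the FQL examples. The view names, aliases, join condition, and the choice of a LEFT JOIN are assumptions for illustration, not FQM's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimpleEntity:
    source_view: str          # hypothetical source view name
    columns: Dict[str, str]   # field name -> column in the source view


@dataclass
class Source:
    alias: str                # prefix used in field names, e.g. "item"
    entity: SimpleEntity
    join_on: str = ""         # SQL join condition; unused for the base source


def composite_sql(sources: List[Source]) -> str:
    """Join the Source Views of several simple entities into one SELECT."""
    select = [f'{s.alias}.{column} AS "{s.alias}.{name}"'
              for s in sources for name, column in s.entity.columns.items()]
    base, *joined = sources
    sql = f"SELECT {', '.join(select)}\nFROM {base.entity.source_view} {base.alias}"
    for s in joined:
        sql += f"\nLEFT JOIN {s.entity.source_view} {s.alias} ON {s.join_on}"
    return sql


items = SimpleEntity("src_inventory_items", {"id": "id", "status_name": "status_name"})
loans = SimpleEntity("src_circulation_loans", {"id": "id", "item_id": "item_id"})
print(composite_sql([Source("loans", loans),
                     Source("item", items, join_on="item.id = loans.item_id")]))
```

A LEFT JOIN keeps base rows (here, loans) even when the joined source has no match. Whether a given composite uses inner or outer joins is a design decision made per entity.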
By utilizing the combination of Source Views, Simple Entities and Composite Entities, the mod-fqm-manager schema provides a robust foundation for data analysis and reporting within the system.
FQM Query Language (FQL)
FQL is the query language used to interact with FQM. It is based on the MongoDB query syntax (link) and provides a simple, unambiguous syntax that is straightforward to implement in code.
An example:
{ "item.status_name": { "$in": [ "Missing", "aged to lost", "claimed returned", "declared lost", "long missing" ] } }
The above query provides a filter to only return records where “item.status_name” matches one of the specified values.
Another example:
{"$and":[{"users.active":{"$eq":"true"}},{"users.group_id":{"$nin":["0003a0cc-46e5-4ebe-8545-c917d3d8a673","06f2d60e-0b07-49ca-b8c7-e1d49808e0b7"]}}]}
When run against the User Entity, this query returns the set of active users not found in the specified patron groups (ERM, Access Only, Do Not Loan Anything).
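To make the FQL-to-SQL conversion concrete, here is a minimal Python sketch that translates a small subset of FQL (`$and`, `$eq`, `$ne`, `$in`, `$nin`) into a parameterized SQL condition. Two things are assumptions: the field-to-column mapping, and the `%s` placeholder style (as used by Python PostgreSQL drivers such as psycopg). A production translator would also need to convert values according to each field's data type, which this sketch does not attempt.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from FQL field names to SQL column expressions.
COLUMNS = {"users.active": '"users.active"', "users.group_id": '"users.group_id"'}
COMPARISONS = {"$eq": "=", "$ne": "<>"}


def condition(node: Dict[str, Any], params: List[Any]) -> str:
    """Translate one FQL node into SQL, appending bound values to params in order."""
    parts = []
    for key, value in node.items():
        if key == "$and":
            parts.append("(" + " AND ".join(condition(c, params) for c in value) + ")")
            continue
        column = COLUMNS[key]  # unknown fields fail here instead of reaching SQL
        for op, operand in value.items():
            if op in COMPARISONS:
                params.append(operand)
                parts.append(f"{column} {COMPARISONS[op]} %s")
            elif op in ("$in", "$nin"):
                if not operand:
                    raise ValueError(f"{op} needs at least one value")
                params.extend(operand)
                placeholders = ", ".join(["%s"] * len(operand))
                keyword = "NOT IN" if op == "$nin" else "IN"
                parts.append(f"{column} {keyword} ({placeholders})")
            else:
                raise ValueError(f"unsupported operator: {op}")
    return " AND ".join(parts)


def translate(fql: str) -> Tuple[str, List[Any]]:
    """Parse FQL text and return (SQL condition, bound parameters)."""
    params: List[Any] = []
    return condition(json.loads(fql), params), params


sql, params = translate('{"$and":[{"users.active":{"$eq":"true"}},'
                        '{"users.group_id":{"$nin":["0003a0cc-46e5-4ebe-8545-c917d3d8a673",'
                        '"06f2d60e-0b07-49ca-b8c7-e1d49808e0b7"]}}]}')
print(sql)  # ("users.active" = %s AND "users.group_id" NOT IN (%s, %s))
```

Keeping values out of the SQL text and passing them as bound parameters is what prevents user-supplied FQL values from being interpreted as SQL.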
Why not CQL
Contextual Query Language (CQL) is already used in multiple places in Folio. Careful consideration was given to using CQL for submitting queries to the FQM APIs. However, CQL was not able to meet the needs of FQM given the JSON storage of most of Folio’s data.
Consider the following:
The goal is to select all records whose “field” array contains an element of type ‘Type-A’ whose “foo.text” value is ‘bar’. At the PostgreSQL level, this requires a condition that relies on a JSON path, such as:
jsonb @? '$.field[*] ? (@.type == "Type-A" && @.foo.text == "bar")'
CQL does not support the ability to compose a query that exactly matches such a PostgreSQL condition.
The MongoDB query for the above mentioned use case is as follows.
{
  "field": {
    "$elemMatch": {
      "type": "Type-A",
      "foo.text": "bar"
    }
  }
}
Currently, FQL does not support the “$elemMatch” operator. However, using the MongoDB query syntax as the foundation of FQL allows for straightforward future enhancements for querying complex JSON structures.
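The point of `$elemMatch` is that all of its conditions must hold for the same array element. The Python sketch below evaluates that semantics against an in-memory record. It contrasts it with matching each condition independently, which can produce false positives when different elements satisfy different conditions. The record contents are made up for illustration. This is not FQM code, since FQL does not yet support `$elemMatch`.

```python
from typing import Any, Dict, List


def get_path(obj: Any, dotted: str) -> Any:
    """Follow a dotted path such as 'foo.text'; return None if it is absent."""
    for key in dotted.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def elem_match(array: List[Dict[str, Any]], conditions: Dict[str, Any]) -> bool:
    """True if a single element satisfies every condition ($elemMatch semantics)."""
    return any(all(get_path(e, p) == v for p, v in conditions.items()) for e in array)


def independent_match(array: List[Dict[str, Any]], conditions: Dict[str, Any]) -> bool:
    """True if each condition is met by some element, possibly a different one."""
    return all(any(get_path(e, p) == v for e in array) for p, v in conditions.items())


record = {"field": [
    {"type": "Type-A", "foo": {"text": "baz"}},
    {"type": "Type-B", "foo": {"text": "bar"}},
]}
cond = {"type": "Type-A", "foo.text": "bar"}
print(elem_match(record["field"], cond))         # False: no single element has both
print(independent_match(record["field"], cond))  # True: a false positive
```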
FQM Query Migration Interface
FQM includes a built-in API that automatically accounts for FQM schema changes which might affect externally composed queries. Submitting an FQL query to this interface returns the currently valid form of the query, taking any schema changes into account.
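A query migration interface can be thought of as a chain of versioned rewrite steps applied to a stored query. The Python sketch below makes two hypothetical assumptions: each FQM schema version is identified by an integer, and each step renames a field. The version numbers and field renames are invented for illustration, and the real interface may track versions and changes differently.

```python
import json
from typing import Any, Callable, List, Tuple

Migration = Callable[[Any], Any]


def rename_field(old: str, new: str) -> Migration:
    """Build a step that renames a field key everywhere in an FQL tree."""
    def step(node: Any) -> Any:
        if isinstance(node, list):
            return [step(child) for child in node]
        if isinstance(node, dict):
            return {(new if key == old else key): step(val) for key, val in node.items()}
        return node  # plain values (strings, numbers) are left untouched
    return step


# Hypothetical schema history: (version that introduced the change, rewrite step).
MIGRATIONS: List[Tuple[int, Migration]] = [
    (2, rename_field("item.status", "item.status_name")),
    (3, rename_field("users.patron_group", "users.group_id")),
]
CURRENT_VERSION = 3


def migrate(fql: str, version: int) -> Tuple[str, int]:
    """Apply, in order, every step newer than the query's version."""
    query = json.loads(fql)
    for introduced_in, step in MIGRATIONS:
        if introduced_in > version:
            query = step(query)
    return json.dumps(query), CURRENT_VERSION


print(migrate('{"item.status": {"$eq": "Missing"}}', 1))
# ('{"item.status_name": {"$eq": "Missing"}}', 3)
```

A consumer that stores queries, as mod-lists does, could pass each stored query through such a call at deployment time and persist the returned query and version.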
FQM Query Builder
FQM Query Builder is a Folio UI plugin that assists the user in interactively composing an FQM query. Since it is a UI plugin, it is available for reuse in other Folio UI modules. It also provides the ability to render query results in the UI. It is effectively the FQM UI.
FQM Edge Module
FQM also provides an Edge module (edge-fqm) which can be used for external integration to Folio to submit FQL queries.
It accepts the same queries as can be submitted interactively via the FQM Query Builder.
Dependency Impact
FQM exposes data found in schemas controlled by other applications in Folio. Consequently, FQM can be affected by changes in those schemas, which it must accommodate. On the one hand, FQM bears the burden of keeping up with any Folio schema changes in the tables that it exposes. In this regard, it essentially retains, if not assumes, the dependencies between Folio applications. On the other hand, FQM provides a centralized opportunity to coordinate all such changes. Effectively, the dependency burden shifts from individual application interactions to a more robust, centralized dependency management approach.
It is not the burden of individual module owners to notify anyone else of schema changes. FQM assumes the responsibility for updating its entities to account for such changes. In its current form, updates are managed manually by FQM maintainers, but design discussions are underway to automate the detection and application of module schema changes.
When a schema change does occur, this can be accounted for in multiple ways, depending on the nature of the change:
Source views: It is trivial to update a source view itself to deal with some changes. For example, if a column is renamed, then the view can be updated to rename it back to its old name, so that FQM continues to use the old name. This approach can also be used for changes in data type and other changes of that nature. With this approach, FQM itself does not need any changes to account for the schema change.
Example (a column named “something” being replaced with a jsonb column):
select id, name, something from mod_blah.thing
→ select id, name, jsonb ->> 'something' as something from mod_blah.thing
- to FQM, it’s still named “something”
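The following Python sketch reproduces the source-view technique with SQLite standing in for PostgreSQL. The owning module renames a column, and only the view definition changes, aliasing the new column back to its old name. FQM's query only ever reads the view, so it keeps working unchanged. The table, column, and view names are hypothetical, and `ALTER TABLE ... RENAME COLUMN` requires SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The owning module's table (version 1) and a source view over it.
conn.executescript("""
    CREATE TABLE thing (id TEXT, name TEXT, something TEXT);
    INSERT INTO thing VALUES ('t1', 'first', 'abc');
    CREATE VIEW src_thing AS SELECT id, name, something FROM thing;
""")


def fqm_read():
    """FQM only queries the view, always using the column name 'something'."""
    return conn.execute("SELECT id, name, something FROM src_thing").fetchall()


before = fqm_read()

# Breaking change in the owning module: 'something' is renamed to 'something_else'.
# The view is recreated so that it still exposes the old name to FQM.
conn.executescript("""
    DROP VIEW src_thing;
    ALTER TABLE thing RENAME COLUMN something TO something_else;
    CREATE VIEW src_thing AS SELECT id, name, something_else AS something FROM thing;
""")

after = fqm_read()
print(before == after)  # True: FQM's query did not change
```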
Entity type definitions: In the entity type definitions within FQM, each field definition has the configuration necessary to retrieve its corresponding data. That can be easily updated to use a different view or column, or switch data types, etc. The abstraction provided by the entity types makes it so that this change will usually be transparent to consumers.
Example (same change as above; this change would happen in the “something” field definition in the “thing” simple entity type):
"valueGetter": "something"
→ "valueGetter": "jsonb ->> 'something'"
- With this change, FQM would be “aware” of the change, since it is effectively a config change to FQM, but the change would be entirely transparent to consumers.
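The Python sketch below shows why such a change stays invisible to consumers. It compiles requested fields into SQL by looking up each field's value getter. Consumers ask only for field names, while the value getter expression lives in the entity type definition. The `build_select` helper and the entity definitions are hypothetical simplifications, not FQM's actual SQL generation.

```python
from typing import Dict, List


def build_select(source_view: str, fields: Dict[str, str], requested: List[str]) -> str:
    """Compile requested field names into SQL using each field's value getter.

    Consumers only name fields; the value getter expression is internal config.
    """
    columns = [f'{fields[name]} AS "{name}"' for name in requested]
    return f"SELECT {', '.join(columns)} FROM {source_view}"


# Hypothetical "thing" simple entity before and after the schema change.
thing_v1 = {"id": "id", "something": "something"}
thing_v2 = {"id": "id", "something": "jsonb ->> 'something'"}

print(build_select("src_thing", thing_v1, ["id", "something"]))
# SELECT id AS "id", something AS "something" FROM src_thing
print(build_select("src_thing", thing_v2, ["id", "something"]))
# SELECT id AS "id", jsonb ->> 'something' AS "something" FROM src_thing
```

In both versions the result column is still named “something”, so a consumer's request and the shape of the rows it receives are identical.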
If a change in an entity type or source view does result in an externally visible change that would impact consumers, FQM’s migration interface provides a mechanism for consumers to absorb the change automatically. In most cases, no added development work is needed beyond the initial work to adopt the migration interface. A consumer can send a query to FQM, ask for the most up-to-date version of that query, and receive an updated version that is compatible with the latest FQM. For example, when mod-lists is deployed, it uses this interface to ensure that none of its stored lists use outdated, broken queries.
Security Considerations
Particular care must be taken in the FQM implementation so that it does not introduce data leakage vulnerabilities. The ownership of particular data falls upon individual applications, which must implement their own access control via mechanisms such as API permissions or acquisition units. FQM will respect those access limitations, even though it provides a separate channel to access the same data. Additionally, FQM only provides read-only access to Folio data through the Simple Entities.
FQM enforces permissions at two levels: API Permissions and Data Permissions.
API Permissions
FQM Manager’s API permissions are the same as for any other Folio module. They simply define required permissions to call the endpoints that FQM Manager exposes.
However, relying only on API permissions is not sufficient. A user with permission to call FQM’s APIs could conceivably pull data from any Simple Entity that FQM exposes, regardless of whether that user has permission to access such data directly through the source data’s APIs.
Therefore FQM also implements Data Permissions.
Data Permissions
Data Permissions operate at the level of Simple Entities, which are the building blocks for all queries in FQM. By adhering to the convention of creating a single Simple Entity for each exposed data source, it is possible to align each Simple Entity’s permissions with those defined by the source data’s API.
FQM accomplishes this by performing a dynamic permission check for each of the Simple Entities involved in a query. It calls mod-permissions (or mod-roles-keycloak in the case of Eureka) to verify that the user possesses the required permissions to access the data source behind each Simple Entity. Because this check is dynamic, it automatically accounts for changes to permissions and permission assignments made through Folio’s regular mechanisms.
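The dynamic check can be sketched in Python: collect the permissions required by every Simple Entity behind a query, then subtract the permissions the user currently holds. The entity names, permission names, and the `lookup_user_permissions` callable are illustrative assumptions, not FQM's actual interfaces. The callable stands in for the call to mod-permissions or mod-roles-keycloak.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Set


@dataclass(frozen=True)
class SimpleEntity:
    name: str
    required_permissions: FrozenSet[str]  # mirrors the source module's API permissions


@dataclass
class CompositeEntity:
    name: str
    sources: List[SimpleEntity] = field(default_factory=list)


def missing_permissions(entity: CompositeEntity,
                        lookup_user_permissions: Callable[[str], Set[str]],
                        user_id: str) -> Set[str]:
    """Return the permissions the user lacks for any simple entity behind the query.

    The lookup runs on every call (no caching), so changes to permission
    assignments are picked up immediately.
    """
    granted = lookup_user_permissions(user_id)
    required: Set[str] = set()
    for source in entity.sources:
        required |= source.required_permissions
    return required - granted


users = SimpleEntity("simple_users", frozenset({"users.collection.get"}))
loans = SimpleEntity("simple_loans", frozenset({"circulation.loans.collection.get"}))
loan_details = CompositeEntity("composite_loan_details", [users, loans])

grants = {"u1": {"users.collection.get"}}  # stand-in for permission data from Folio
print(missing_permissions(loan_details, lambda uid: grants.get(uid, set()), "u1"))
# {'circulation.loans.collection.get'}
```

A non-empty result would mean the query must be rejected, even though the user is allowed to call FQM's own API.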
Since FQM implements its own Folio permissions check, this makes it a privileged Folio Application.
FQM provides its own set of permissions for the APIs it implements. In the Poppy release, a single permission is used to grant access to all entity types. In subsequent releases, FQM will provide more granular permissions for entity types, eventually down to the resource level.
Note that FQM currently provides only API-level access control. It does not yet enforce row-level access control such as that found in Acquisition Units; this is intended for future development. If Acquisition Units are in use and there is concern that FQM does not respect Acquisition Unit restrictions, the relevant entities should be excluded from FQM for the time being.
Risks and Drawbacks
FQM Manager is a privileged module and as such has visibility into data owned by other modules. The inability to cross-reference data between applications is one of Folio’s limitations and is increasingly becoming a blocker. Therefore, while granting such privilege to FQM carries a small risk, it is necessary to do so somewhere in the overall system, and FQM is the choice for providing this functionality.
FQM is, however, neither omniscient nor omnipotent. It only has access to selected portions of the Folio data, as defined through the Source Views and Entity Type definitions. FQM has no understanding of the significance of the data it returns; it is the caller who retains that knowledge and interprets the data. Furthermore, FQM cannot modify any of the related source data: it only grants read-only access.
As mentioned above, FQM depends on the shape of the data owned by other modules, so schema changes in those modules need to be reflected in FQM’s schema. Again, this is why it is beneficial to locate all such dependencies in a single Application (FQM follows a hub-and-spoke model) rather than attempt to reconcile dependencies among multiple modules in a mesh model. A schema change in a source module only needs to be understood and applied by the hub. In most cases, the data presented by FQM to the various “spoke” Applications would even remain unchanged. Contrast this with a mesh model, where each Application would need to independently track source schema changes, understand those schemas, and apply updates in a coordinated fashion.
FQM will inevitably carry a performance impact: accessing data from other applications in a live context may affect the performance of those applications. Mitigation strategies can reduce this impact, such as using a database read replica or increasing the memory and/or CPUs available to the database. However, FQM will always add some resource consumption, whether for rebuilding or rewriting views or for harvesting data.
Note that while FQM uses database views by default, it is unaware that views are involved; it simply treats them as local tables in its own schema. Furthermore, FQM could equally be deployed using its own physical tables, populated by an external harvesting process.
Rationale and Alternatives
A long-standing blocking issue in Folio is how to cross-reference (intersect) data between data sets held by distinct microservices in an efficient manner. FQM must operate within the bounds of Folio.
FQM should not directly access storage belonging to other applications. This is the mantra of Folio’s strict interpretation of microservices. However, FQM adopts a more realistic perspective: FQM does not knowingly access storage belonging to other applications. FQM only accesses data within its own schema; it does not know, nor can it determine, how the data are made available there.
Evolution of the current FQM Data Design
The current form of FQM is the latest refinement of a series of evolutionary steps in the design. It has been optimized for practicality and efficiency.
Harvested Data Tables
The original design of FQM consisted of creating harvested tables within the scope of the FQM application. A secondary component would be an actual harvester which collected data from various data tables within Folio and gathered them in a single location. Some minor transformation would enhance the data to optimize them for the purposes of FQM.
However, it was realized early on that this would be a significant scope of effort, if only because of the need to create a harvester. Furthermore, this approach would require both regularly moving significant amounts of data and storing essentially duplicate sets of data.
Another drawback of a harvesting approach was that data would not be available to FQM in real time, which was one of the requirements. Delays could be significant depending on the size and change rates of the tables involved.
Rather than harvest and store, why not leverage the database capabilities and use materialized views instead?
Materialized Views
The next iteration of FQM implemented materialized views. The creation of a harvester could be avoided and for all practical purposes the materialized views would be identical to harvested tables from the point of view of the FQM modules.
However, it was realized that creating and maintaining materialized views still imposed a burden on the database engine. Data were still being moved and recalculated on a regular basis; with the larger datasets, the refresh time amounted to hours.
If dynamic views were used instead of materialized views, the movement of data could be avoided altogether, resulting in acceptable times for keeping the data up to date. This would also make it possible to provide real-time Folio data via FQM.
Dynamic Views
The current state of FQM uses non-materialized views to avoid moving and storing large amounts of data. The views are built on existing Folio storage tables and are provided to FQM in a read-only configuration. The FQM code does not differentiate between the originally harvested tables and the dynamic views; it treats them the same. FQM Manager (see diagram above) is unaware that the data sources it exposes are views rather than harvested local tables.
It is also expected that FQM implementations may revert to harvested tables at some point in the future, with no impact on the code that consumes the data they contain. In particular, this may be necessary in specific scenarios, such as distributed Folio systems spanning multiple databases or meeting data localization requirements.
There are still further enhancements possible for FQM.
Simple and Composite Entities
In its current form - as described above - FQM only creates views on source data. Entities are used to dynamically generate SQL queries based on these views. This allows for better security and less need to understand the schema of other applications. It also provides for better reuse of data sources and ease in expanding the set of available Composite Entities.
Other Approaches
In various discussions, alternative approaches to FQM have been considered and debated, including the following.
Kafka Messaging
One such alternative is to retain the notion of harvested data tables and provide a decoupled harvesting mechanism using a push-based solution via messaging. Specifically, modules would publish change messages to one or more Kafka topics, to which FQM would subscribe.
Benefits
Inversion of control: modules are responsible for notifying of changes
Possible opportunity to absorb schema changes in the messaging contracts
Removes the direct coupling of FQM to individual applications (and their data storage)
Challenges
Requires all modules to participate in order to ensure proper data coverage
Requires modules to implement Kafka messaging integration (most have not)
Requires deep modifications on all modules to publish incremental changes into Kafka messages
Still moves lots of data around
Sensitive to business logic interpretation, which is error-prone (e.g., Acquisition apps)
App-level APIs for Operational Data
Another alternative would be to implement data collection by consuming the APIs provided by each application, possibly including the creation of new dedicated APIs for FQM data collection.
Benefits
Avoids integration to the database layer
May not require much development on existing Applications (assuming existing APIs are sufficient)
Challenges
Requires moving lots of data around
Intrinsically poor performance
May require development on most existing Applications.
Does not address the issue of joins between Applications.
Requires FQM to interpret wildly varying API interface formats
Timing
FQM has already been conditionally approved and is part of Folio releases. Furthermore, it is currently a key dependency for the Lists app and a number of others (e.g., Data Export, Bulk Edit).