First presentation of LDP road map | Nassib | For our next meeting...
- Considerations in developing this first road map for LDP releases:
- Key requirements:
- Many reports have been ranked by institutions for "go live (MVP)", which means that they will have to be written and debugged within the next few months.
- It is not recommended to do significant SQL reporting using FOLIO's internal operational database, as has often been done with many traditional ILS systems. One reason is that there is no requirement that FOLIO modules use the same database, which means that cross-domain table joins on the operational database may break irreparably in the future.
- Reporting analysts want an easy and familiar query model, and one that works with common reporting tools.
- Reporting analysts will want queries to run efficiently.
- Reporting SIG members have repeatedly communicated the need for access to all FOLIO data for reporting purposes.
- Challenges and constraints:
- The number of storage interfaces in FOLIO will soon exceed 100. Although ETL for most of them is relatively simple, synchronizing such a large number of tables reliably on schema changes requires very active coordination with FOLIO, which so far has not proved to be possible.
- Critical LDP dependencies on FOLIO core development, which were requested and flagged as critical beginning in late 2018, will likely not be addressed until mid-2020 or beyond, based on discussions with the FOLIO capacity planning and project management groups.
- FOLIO is requesting that "go live (MVP)" features be completed by January 2020, to be released in Summer 2020.
- Future developer resources appear likely to remain at roughly 1 FTE or less on average, consisting of several part-time developers.
- The LDP road map is based on these considerations, and aims to release features according to FOLIO's projected time lines for addressing critical dependencies as well as available developer resources.
- LDP 1.0 proposed core features, for "go live (MVP)":
- Support for ad hoc, cross-domain queries for all, or a very large proportion, of FOLIO data extracted from storage modules. We would ask this working group and the Reporting SIG to help us determine, in the near future, the definition of "all FOLIO data" required for inclusion in the LDP.
- Include MARC records extracted from FOLIO SRS and transformed for easier querying.
- The LDP developers and this working group will aim to implement all reports requested for "go live (MVP)" if possible.
- Historical data will be retained in the LDP but not transformed into a single schema.
- LDP database recommended to be refreshed nightly (once per day) from the FOLIO operational database.
- Support for optional anonymization of personal data. The Data Privacy WG will propose requirements for this feature, in particular which fields should be anonymized.
- Implementation guidelines (documentation) for local tables.
- Proposing a unique design that is pragmatic given the above considerations, but also very functional:
- Queries can be written using familiar relational attributes in the case of many common attributes (e.g. loans.loanDate), but will require JSON paths to access some nested attributes (e.g. json_extract_path_text(loans.data, 'status', 'name'), which refers to the "name" subfield located within the "status" field).
- In other words, a set of "common" attributes will be available as relational attributes, while other attributes will have to be retrieved from JSON data (in the same table) using the json_extract_path_text() function.
- Access to JSON data takes the form json_extract_path_text(A, F1, F2, ...) where A is the attribute containing the JSON data, and F1, F2, ... is a list of up to five nested JSON fields which specifies a path into the JSON data. (If this is not clear, it helps a lot to look at an example.)
- The JSON data include the complete set of data provided by FOLIO, while the relational attributes are created as a convenience to make writing queries easier.
- Support for PostgreSQL and Redshift database systems.
- Proposed data model design for LDP 1.0.
- Schedule: LDP Beta (feature complete) in January 2020, LDP 1.0 in Summer 2020.
- LDP beyond 1.0: (1) historical queries using a single schema, (2) ETL, and (3) full relational or star schema can be implemented in later releases of the LDP but are highly dependent on the identified critical dependencies and availability of developer resources.
- Current support for query development (data are available now):
- The test database is now using the proposed data model design for LDP 1.0.
- Also in the test database are data sufficient to support at least the Circ Item Detail report.
|
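
The query model proposed in these notes (common attributes as relational columns, everything else reached through a JSON path) may be easier to follow with a small illustration. The Python sketch below is not LDP code and does not connect to a database. It emulates, under stated assumptions, what a call such as json_extract_path_text(loans.data, 'status', 'name') would return for one row. The sample loan record, its id value, and the helper's handling of missing fields are hypothetical and chosen only for illustration. The table name, the loanDate attribute, and the five-field limit are taken from the notes.

```python
import json

# Hypothetical loan record as it might be provided by FOLIO (illustrative values only).
folio_loan = {
    "id": "loan-0001",
    "loanDate": "2019-10-01T12:00:00Z",
    "status": {"name": "Open"},
}

# Sketch of one row in a "loans" table under the proposed design:
# a few common attributes as relational columns, plus the complete JSON in "data".
row = {
    "id": folio_loan["id"],
    "loanDate": folio_loan["loanDate"],
    "data": json.dumps(folio_loan),
}


def json_extract_path_text(data, *path):
    """Emulate json_extract_path_text(A, F1, F2, ...) for illustration.

    `data` is the JSON text (attribute A) and `path` is the list of nested
    field names. Returns the value as text, or None if the path is absent
    (an assumption made for this sketch).
    """
    if len(path) > 5:
        # The notes describe a path of up to five nested fields.
        raise ValueError("a path may contain at most five nested fields")
    value = json.loads(data)
    for field in path:
        if not isinstance(value, dict) or field not in value:
            return None
        value = value[field]
    if value is None:
        return None
    # Strings are returned as-is; other values are rendered as JSON text.
    return value if isinstance(value, str) else json.dumps(value)


# The equivalent query an analyst might write, shown as text only (not executed here).
EXAMPLE_QUERY = """
SELECT loans.loanDate,
       json_extract_path_text(loans.data, 'status', 'name') AS status_name
FROM loans;
"""

if __name__ == "__main__":
    # Common attribute: read directly from the relational column.
    print(row["loanDate"])
    # Nested attribute: retrieved from the JSON data in the same row.
    print(json_extract_path_text(row["data"], "status", "name"))
```

In this sketch the relational column and the JSON path draw on the same underlying record, which mirrors the point in the notes that the JSON data are complete and the relational attributes exist as a convenience.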