Job Profile Wrangler

Job profiles are used to control the flow of execution in Data Import for records, whether MARC, EDIFACT, or otherwise. A Job Profile can comprise four main objects:

  • Job Profile: Merely a header object that stores the name and type of profile

  • Match Profile: Used to determine whether a record matches one or more criteria. There are two possible outcomes that direct the flow of execution: MATCH and NON_MATCH

  • Action Profile: Used to perform actions like Create, Update & Modify on different types of objects

  • Mapping Profile: Used to determine how to convert a source record into the desired FOLIO object.

In a Job Profile hierarchy, only one Job Profile should exist. A Job Profile can be linked to one or many Match and Action Profiles. A Match Profile can be linked to other Match Profiles or Action Profiles. An Action Profile can only be linked to one Mapping Profile. A Job Profile hierarchy will usually terminate at Mapping Profiles.

The dynamic nature of Job Profile hierarchies means there are many permutations in the wild, depending on each library's workflows. This makes testing “all” Data Import flows a challenge. A representative set of Job Profiles is needed to cover most flows. One approach would be to recreate job profiles from production systems, including their reference data, but this is very labor intensive, reference data conflicts have to be resolved, and the finished job profiles are not portable.

Instead of maintaining a repository of actual job profiles and their reference data, an alternative is to maintain a repository of “shapes” of job profiles. Two distinct job profiles from two different libraries, in different FOLIO instances, can produce different outcomes yet have the same structure; the same “shape”. Duplicating both job profiles would double the effort without gaining any testing insight beyond the first job profile. Maintaining a distilled repository of job profile shapes brings the following benefits:

  • A minimal number of job profiles that can be tested in their totality while still covering a broad testing surface.

  • Reduced testing duration due to the minimal number of job profiles.

  • No dependency on any library's reference data; existing reference data in a FOLIO tenant can be used.

Job profile hierarchies can be represented as graphs. Once represented as graphs, job profile shapes/structures can be compared (via isomorphism checks) and traversed with industry-standard algorithms. Below is a sample job profile and its equivalent graph.

[Image: a sample job profile and its equivalent graph representation]
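
As a rough illustration (not part of any existing module), a job profile hierarchy like the sample above could be modeled as a JGraphT directed graph whose vertex labels encode the profile type:

import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;

public class JobProfileShapeExample {

    public static void main(String[] args) {
        // Vertices are plain labels that encode the profile type; edges point
        // from a parent profile to the profiles it links to.
        Graph<String, DefaultEdge> shape = new DefaultDirectedGraph<>(DefaultEdge.class);

        shape.addVertex("JOB_PROFILE");
        shape.addVertex("MATCH_PROFILE_1");
        shape.addVertex("ACTION_PROFILE_1");
        shape.addVertex("ACTION_PROFILE_2");
        shape.addVertex("MAPPING_PROFILE_1");
        shape.addVertex("MAPPING_PROFILE_2");

        shape.addEdge("JOB_PROFILE", "MATCH_PROFILE_1");
        shape.addEdge("MATCH_PROFILE_1", "ACTION_PROFILE_1");   // MATCH branch
        shape.addEdge("MATCH_PROFILE_1", "ACTION_PROFILE_2");   // NON_MATCH branch
        shape.addEdge("ACTION_PROFILE_1", "MAPPING_PROFILE_1"); // each Action Profile links to one Mapping Profile
        shape.addEdge("ACTION_PROFILE_2", "MAPPING_PROFILE_2");

        System.out.println(shape);
    }
}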

Here is an enumeration of possible features for a CLI app/Java library that would form the basis of a comprehensive testing strategy for Data Import. Let's call this library jpwrangler. It is a sibling Maven module in the mod-di-converter-storage git repository.

Job Profile Repository Population

java -jar jpwrangler.jar
>> set source --location https://folio-snapshot.dev.folio.org/ --accesstoken <JWT> --refreshtoken <JWT>
>> set repository --path /path/to/repository
>> wrangle

jpwrangler loads existing job profiles from its repository. The repository is a directory containing many files, each file representing a job profile hierarchy. Each file in the repository has the filename pattern “jp-*.dot”; examples include “jp-1.dot” and “jp-273.dot”. Each file is written in the dot format for human readability and visualization. The JGraphT library is used to load the dot files and convert them into in-memory graph objects. Each graph is identified by the number in the “jp-*.dot” filename pattern, so the identifier of the graph from the “jp-273.dot” file is “273”.
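
A minimal sketch of loading the repository with JGraphT's DOT importer might look as follows; the class name and directory handling are assumptions, not the final jpwrangler design:

import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.nio.dot.DOTImporter;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class RepositoryLoader {

    // Loads every jp-*.dot file in the repository directory and keys the
    // resulting graph by the numeric identifier in the filename.
    public static Map<Integer, Graph<String, DefaultEdge>> load(Path repositoryDir) throws IOException {
        Map<Integer, Graph<String, DefaultEdge>> graphs = new HashMap<>();
        DOTImporter<String, DefaultEdge> importer = new DOTImporter<>();
        importer.setVertexFactory(id -> id); // use the dot node id as the vertex

        try (Stream<Path> files = Files.list(repositoryDir)) {
            files.filter(p -> p.getFileName().toString().matches("jp-\\d+\\.dot"))
                 .forEach(p -> {
                     String name = p.getFileName().toString();
                     int id = Integer.parseInt(name.substring(3, name.length() - 4)); // strip "jp-" and ".dot"
                     Graph<String, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
                     importer.importGraph(g, p.toFile());
                     graphs.put(id, g);
                 });
        }
        return graphs;
    }
}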

jpwrangler will pull job profile hierarchies from the source FOLIO instance and convert them into JGraphT graph objects. Each incoming graph will be compared with all graphs in the repository (in memory). If an existing graph in the repository has the same “shape” as an incoming graph, the incoming graph is discarded. If an incoming graph has no matching “shape” in the repository, it will be added to the repository, saved in the dot format, and assigned an identifier (1 greater than the largest identifier in the repository).
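
As a sketch, the shape comparison and the persistence of a new graph could build on JGraphT's VF2 isomorphism inspector and DOT exporter; the class and method names below are hypothetical:

import org.jgrapht.Graph;
import org.jgrapht.alg.isomorphism.VF2GraphIsomorphismInspector;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.nio.dot.DOTExporter;

import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class ShapeRegistry {

    // Returns true if an existing graph in the repository has the same shape
    // as the incoming graph (i.e. the two graphs are isomorphic).
    // A stricter comparison could pass vertex/edge comparators to VF2 so that
    // profile types must also match, not just the graph structure.
    static boolean shapeExists(Map<Integer, Graph<String, DefaultEdge>> repository,
                               Graph<String, DefaultEdge> incoming) {
        return repository.values().stream()
                .anyMatch(existing ->
                        new VF2GraphIsomorphismInspector<>(existing, incoming).isomorphismExists());
    }

    // Assigns the next identifier and writes the graph to jp-<id>.dot.
    static int saveNewShape(Map<Integer, Graph<String, DefaultEdge>> repository,
                            Graph<String, DefaultEdge> incoming,
                            Path repositoryDir) throws IOException {
        int nextId = repository.keySet().stream().mapToInt(Integer::intValue).max().orElse(0) + 1;
        DOTExporter<String, DefaultEdge> exporter = new DOTExporter<>(v -> v); // vertex id provider
        try (Writer writer = Files.newBufferedWriter(repositoryDir.resolve("jp-" + nextId + ".dot"))) {
            exporter.exportGraph(incoming, writer);
        }
        repository.put(nextId, incoming);
        return nextId;
    }
}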

When all is concluded, Engineering will have a repository of job profiles that can be referenced by simple identifiers during story refinement and in test scenarios (TestRail & Karate).

The default location for the job profile repository will be the “resources” folder of the jpwrangler Maven module. FOLIO community members can contribute their job profiles by

  • executing jpwrangler locally against their FOLIO instance, using the latest repository from the mod-di-converter-storage repo;

  • adding any new job profile shapes to the repository;

  • submitting a pull request to the mod-di-converter-storage git repository for approval, so the additional job profiles are added to the existing data set.

Job Profile Visualization

Since job profiles in the repository are saved in the dot format, they can be visualized with standard tools from GraphViz. Graph images could be hosted on FOLIO websites, in test reports, etc.
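
For example, a single repository entry could be rendered to a PNG with GraphViz's dot tool:

dot -Tpng jp-273.dot -o jp-273.png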

Job Profile Import

java -jar jpwrangler.jar
>> set source --location https://folio-snapshot.dev.folio.org/ --accesstoken <JWT> --refreshtoken <JWT>
>> set target --location https://folio-snapshot-2.dev.folio.org/ --accesstoken <JWT> --refreshtoken <JWT>
>> import --job-profile-id <UUID>          # import one job profile from source to target
>> import --all                            # import all job profiles from source to target
>> import --from-repository-id <Integer>   # import one job profile from repository to target

The existing design for job profile import has import, export & validation baked into mod-di-converter-storage. Validation means calling the Inventory & Acquisitions domains to determine valid reference data. Ideally, the corresponding dependencies would be defined in mod-di-converter-storage's Module Descriptor, but this would cause cyclic dependencies between Data Import, Inventory & Acquisitions. It may make more sense to externalize this functionality from mod-di-converter-storage to remove the dependencies and allow import/export into FOLIO versions that have already been released.

Source Record Generation

With job profiles represented as graphs, it is now possible to perform a depth-first traversal of each graph and create a source record that will match each path. This means that for any job profile in the repository, any number of source records can be generated to satisfy different paths. Reference data can be sourced from an existing FOLIO instance. For MARC source records, marc4j can be used to generate MARC records.
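
A rough sketch of the idea, assuming the vertex naming from the earlier examples and using marc4j's factory API; a real generator would derive field values from the match criteria and mapping rules along each path:

import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.traverse.DepthFirstIterator;
import org.marc4j.marc.DataField;
import org.marc4j.marc.MarcFactory;
import org.marc4j.marc.Record;

public class SourceRecordSketch {

    // Walks the job profile graph depth-first and builds one skeleton MARC
    // record per Match Profile vertex encountered.
    static void generate(Graph<String, DefaultEdge> jobProfile, String rootVertex) {
        MarcFactory factory = MarcFactory.newInstance();
        DepthFirstIterator<String, DefaultEdge> it = new DepthFirstIterator<>(jobProfile, rootVertex);
        while (it.hasNext()) {
            String vertex = it.next();
            if (vertex.startsWith("MATCH_PROFILE")) {
                Record record = factory.newRecord();
                record.addVariableField(factory.newControlField("001", "generated-" + vertex));
                DataField title = factory.newDataField("245", '1', '0');
                title.addSubfield(factory.newSubfield('a', "Generated title for " + vertex));
                record.addVariableField(title);
                System.out.println(record);
            }
        }
    }
}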

Randomized Testing

It is now possible to generate random job profiles with differing parameters for breadth and depth, together with appropriate source records for each path.
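
A hypothetical sketch of such a generator, parameterized by breadth and depth; the structural rules are simplified and the class is not part of any existing module:

import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;

import java.util.Random;

public class RandomJobProfileGenerator {

    private final Random random = new Random();
    private int counter = 0;

    // Builds a random job profile shape: the single Job Profile root links to
    // up to `breadth` children per level (breadth >= 1), and chains of Match
    // Profiles are at most `depth` levels deep before terminating in
    // Action + Mapping Profiles.
    public Graph<String, DefaultEdge> generate(int breadth, int depth) {
        Graph<String, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
        String root = "JOB_PROFILE";
        g.addVertex(root);
        addChildren(g, root, breadth, depth);
        return g;
    }

    private void addChildren(Graph<String, DefaultEdge> g, String parent, int breadth, int depth) {
        int children = 1 + random.nextInt(breadth);
        for (int i = 0; i < children; i++) {
            boolean nestMatch = depth > 0 && random.nextBoolean();
            if (nestMatch) {
                String match = "MATCH_PROFILE_" + (++counter);
                g.addVertex(match);
                g.addEdge(parent, match);
                addChildren(g, match, breadth, depth - 1); // MATCH / NON_MATCH branches
            } else {
                String action = "ACTION_PROFILE_" + (++counter);
                String mapping = "MAPPING_PROFILE_" + (++counter);
                g.addVertex(action);
                g.addVertex(mapping);
                g.addEdge(parent, action);
                g.addEdge(action, mapping); // an Action Profile links to exactly one Mapping Profile
            }
        }
    }
}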