SPIKE:MDEXP-195 - Investigate exporting inventory records into delimited format
The purpose of this spike is to investigate and propose a technical solution for exporting a subset of record fields in a delimited format, especially for inventory item records that have no MARC record equivalent.
Solution
The technical solution consists of several parts that require different changes. The main idea is to generate a .csv file (or another delimited format) containing field values of inventory records. In this case, there is no need to make requests to SRS, and records should be retrieved directly from mod-inventory-storage. At the same time, each export can have a different combination of fields, so it is necessary to let the user choose a set of fields before running the data export. The new functionality should be integrated into the existing export flow and should not affect the implemented mapping functionality.
Example of the output .csv file with record content:
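The original sample is not reproduced here; a hypothetical output with an illustrative set of fields could look like the following (column names and values are assumptions for illustration):

```csv
Title,ISBN,Effective location,Barcode
"Interesting Times","9780061054860","Main Library","000111222"
```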
1) Create a new kind of export profile that allows the user to choose a set of fields. Allowing the user to choose a set of record fields requires a new user experience on the UI side. From the data and entity perspective there are two options:
a) Create a new entity to collect all fields. Such an entity should be quite similar to the mapping profile entity:
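A sketch of what such an entity could look like (all field names here are assumptions, modeled on the existing mapping profile entity):

```json
{
  "name": "Inventory fields export profile",
  "description": "Set of inventory record fields to export in delimited format",
  "recordFields": [
    { "fieldName": "title", "path": "$.instance.title" }
  ]
}
```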
In this case, only one mapping profile can be chosen at a time during export.
Pros:
- a separate entity does not affect the existing mapping profile
Cons:
- duplication of similar data
- more changes on the UI side
b) Update the existing mapping profile and add a new collection of fields as a relation. In this case, only a new record_field entity should be created and associated with the mapping profile entity.
Pros:
- it would be easier to integrate the new functionality using the same data structure
- fewer changes on the UI side
Cons:
- more responsibilities for one entity
2) Update HTTP requests to the backend to send the new profile with a field set. If a profile for delimited records is chosen, a specific request should be sent to the backend. There are two options:
a) Add a new request parameter to the export request that determines which profile to use.
b) Make a request to a new separate endpoint and call InputDataManager with the new profile. This approach is preferable since it cleanly splits the export flow in two.
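Under option b), the request might look like the following sketch (the endpoint path is hypothetical; the body mirrors the shape of the existing export request):

```
POST /data-export/export-delimited
{
  "fileDefinitionId": "<file definition UUID>",
  "jobProfileId": "<delimited-export profile UUID>"
}
```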
3) Update ExportManager and InputDataManager to trigger a new flow of mapping and generating a delimited .csv file with records. First of all, it is necessary to create a new file definition for the output .csv file. One option is to create a new implementation of InputDataManager that extends InputDataManagerImpl and overrides the createExportFileDefinition method to set a name for the specific .csv file. Other changes relate to ExportManager. If the new profile is present in the ExportPayload, the requests to SRS should be skipped and inventory records should be retrieved directly. A new MappingService method should be created to read records and generate content for the delimited file. For example, it could be the following method:
List<Map<String, String>> map(List<JsonObject> records, MappingProfile mappingProfile, String jobExecutionId, OkapiConnectionParams connectionParams);
where one Map object in the List represents one row of the file. As an alternative, a new separate service can be created for this functionality.
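A minimal sketch of this mapping step, assuming a simplified shape: Map<String, Object> stands in for the Vert.x JsonObject of the real signature above, and the chosen field set is reduced to a field-name-to-path map (names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DelimitedMapper {

    // One Map in the returned List represents one row of the .csv file:
    // key = field name (column), value = field value.
    public static List<Map<String, String>> map(List<Map<String, Object>> records,
                                                Map<String, String> fieldPaths) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Map<String, Object> record : records) {
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, String> field : fieldPaths.entrySet()) {
                Object value = readPath(record, field.getValue());
                row.put(field.getKey(), value == null ? "" : value.toString());
            }
            rows.add(row);
        }
        return rows;
    }

    // Minimal dotted-path resolver ("instance.title"); the real Reader would
    // evaluate JsonPath expressions such as "$.instance.title" instead.
    @SuppressWarnings("unchecked")
    private static Object readPath(Map<String, Object> record, String path) {
        Object current = record;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<String, Object>) current).get(segment);
        }
        return current;
    }
}
```

In the real service the profile's RecordField entries would supply the column names and paths, and the rows would be handed to the Writer described in step 5.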
4) Create a new Reader based on JsonPath to read values from records. This Reader should follow a similar approach to EntityReader from the generating-marc-utils library. It should return one Map<String, String> object per record, where the key is the field name and the value is the field value. For simple string fields, like "title", only the JsonPath field RecordField.path is needed; for example, for "title" it is "$.instance.title". To get values for reference data fields, like ISBN, the additional fields RecordField.referenceDataPath and RecordField.referenceDataName should be used. The existing ReferenceDataProvider returns the necessary reference data values, which can then be looked up by key. Example of a RecordField for ISBN:
{ "path": "$.identifiers[*].value", "referenceDataPath": "$.identifiers[*].identifiersPathId", "referenceDataName": "ISBN" ... }
The logic should be similar to TranslationsFunctionHolder.SET_IDENTIFIER to get the necessary value.
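The lookup can be sketched as follows, assuming a simplified shape: the record's identifiers and the reference data are plain Maps, with the reference data (as returned by ReferenceDataProvider) reduced to an id-to-name map; this only illustrates the SET_IDENTIFIER-style matching, not the actual FOLIO types:

```java
import java.util.List;
import java.util.Map;

public class ReferenceDataReader {

    // Returns the value of the identifier whose type resolves to the
    // requested reference-data name (e.g. "ISBN"), or null if none matches.
    public static String readIdentifier(List<Map<String, String>> identifiers,
                                        Map<String, String> identifierTypes, // type id -> name
                                        String referenceDataName) {
        for (Map<String, String> identifier : identifiers) {
            // "identifierTypeId" stands for the field selected by referenceDataPath.
            String typeName = identifierTypes.get(identifier.get("identifierTypeId"));
            if (referenceDataName.equals(typeName)) {
                return identifier.get("value");
            }
        }
        return null;
    }
}
```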
5) Create a new Writer based on Java libraries for CSV. To write record data to the .csv file, a Java library can be used, for example Apache Commons CSV (https://commons.apache.org/proper/commons-csv/) or OpenCSV (http://opencsv.sourceforge.net/). We just need to get the record values from MappingService and pass them to the Writer at the ExportManager level. There is no need to update the post-export process, since the newly generated file is still stored in the S3 bucket.
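To make the Writer step concrete, here is a minimal stdlib-only sketch of turning the rows produced by MappingService into CSV text; in practice Apache Commons CSV's CSVPrinter or OpenCSV's CSVWriter would handle the quoting and streaming instead:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CsvWriterSketch {

    // Writes a header line followed by one line per row; the header list
    // fixes the column order for every row.
    public static String write(List<String> header, List<Map<String, String>> rows) {
        StringBuilder csv = new StringBuilder();
        csv.append(header.stream().map(CsvWriterSketch::escape)
            .collect(Collectors.joining(","))).append("\n");
        for (Map<String, String> row : rows) {
            csv.append(header.stream()
                .map(column -> escape(row.getOrDefault(column, "")))
                .collect(Collectors.joining(","))).append("\n");
        }
        return csv.toString();
    }

    // Quote fields containing delimiters, quotes, or newlines (RFC 4180 style).
    private static String escape(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```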