SPIKE: MDEXP-215 - How to order MARC subfields alphabetically


The purpose of this spike is to investigate how to order MARC subfields alphabetically and to propose a solution, so that a user can add mapping profile transformations for the same field but with different subfields in any order.

Solution

It was found that the marc4j library does not provide any way to sort subfields; it simply keeps them in an ArrayList in the order in which they were added.

Therefore, a custom sorting solution has to be implemented.

There are two possible solutions, both based on a custom Comparator:

1) Implementing the Comparator in the generate-marc-utils shared library. The sorting can be applied in MarcRecordWriter.writeDataField, before the subfields are written to the MARC record. A dedicated Comparator should sort the subfields' Map.Entry<Character, String> entries by key. It is important that alphabetical subfields come first and numeric subfields follow them.

This solution sorts subfields for all types of rules, both those generated by mapping profile transformations and the default rules. Because this library is used by several modules, its clients would not be able to change the sort order.
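The ordering described for the first option could be sketched roughly as follows. This is only an illustration, not the code of generate-marc-utils: the class name SubfieldOrdering, the helper sortedCodes, and the sample subfield values are hypothetical, and the only thing taken from the description above is that entries are of type Map.Entry<Character, String> and that letters must precede digits.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of generate-marc-utils.
final class SubfieldOrdering {

    // Letter codes come first, digit codes follow; inside each group
    // the codes are ordered by their character value.
    static final Comparator<Map.Entry<Character, String>> LETTERS_THEN_DIGITS = (a, b) -> {
        boolean aDigit = Character.isDigit(a.getKey());
        boolean bDigit = Character.isDigit(b.getKey());
        if (aDigit != bDigit) {
            return aDigit ? 1 : -1; // a digit code always goes after a letter code
        }
        return Character.compare(a.getKey(), b.getKey());
    };

    // Sorts a copy of the subfields and returns the codes joined into one string.
    static String sortedCodes(List<Map.Entry<Character, String>> subfields) {
        List<Map.Entry<Character, String>> copy = new ArrayList<>(subfields);
        copy.sort(LETTERS_THEN_DIGITS);
        StringBuilder codes = new StringBuilder();
        for (Map.Entry<Character, String> entry : copy) {
            codes.append(entry.getKey());
        }
        return codes.toString();
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        List<Map.Entry<Character, String>> subfields = new ArrayList<>();
        subfields.add(new SimpleEntry<>('3', "value for $3"));
        subfields.add(new SimpleEntry<>('b', "value for $b"));
        subfields.add(new SimpleEntry<>('a', "value for $a"));
        subfields.add(new SimpleEntry<>('0', "value for $0"));
        System.out.println(sortedCodes(subfields)); // prints "ab03"
    }
}
```

A plain comparison by key would put digits before letters, because digits have lower character values, which is why the comparator checks the digit/letter group first.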

2) Implementing the Comparator in RuleFactory of mod-data-export. This solution sorts only the list of DataSource objects built from mapping profile transformations. The idea is to implement a Comparator that places all DataSource objects with non-empty subfields first, in alphanumeric order, and all other DataSource objects after them. This solution does not affect default rules.
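A minimal sketch of the second option is shown below. It does not use the real mod-data-export classes: SimpleDataSource is a simplified, hypothetical stand-in for DataSource with only a `from` and a `subfield` value, and the sketch assumes that "alphanumeric order" means letters before digits, as in the first option. Entries without a subfield keep their relative order because List.sort is stable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; class and field names are assumptions, not mod-data-export code.
final class DataSourceOrdering {

    // Simplified stand-in for the DataSource built from a mapping profile transformation.
    static final class SimpleDataSource {
        final String from;
        final String subfield; // null or empty when the source is not a subfield

        SimpleDataSource(String from, String subfield) {
            this.from = from;
            this.subfield = subfield;
        }
    }

    private static boolean hasSubfield(SimpleDataSource ds) {
        return ds.subfield != null && !ds.subfield.isEmpty();
    }

    // Sources with a subfield first (letters, then digits), all other sources after them.
    static final Comparator<SimpleDataSource> SUBFIELDS_FIRST = (a, b) -> {
        boolean aHas = hasSubfield(a);
        boolean bHas = hasSubfield(b);
        if (!aHas && !bHas) {
            return 0; // equal; the stable sort keeps their original order
        }
        if (aHas != bHas) {
            return aHas ? -1 : 1;
        }
        char x = a.subfield.charAt(0);
        char y = b.subfield.charAt(0);
        boolean xDigit = Character.isDigit(x);
        boolean yDigit = Character.isDigit(y);
        if (xDigit != yDigit) {
            return xDigit ? 1 : -1;
        }
        return Character.compare(x, y);
    };

    public static void main(String[] args) {
        // The "from" values are placeholders made up for the example.
        List<SimpleDataSource> sources = new ArrayList<>();
        sources.add(new SimpleDataSource("indicator-1", null));
        sources.add(new SimpleDataSource("source-b", "b"));
        sources.add(new SimpleDataSource("source-9", "9"));
        sources.add(new SimpleDataSource("source-a", "a"));
        sources.add(new SimpleDataSource("indicator-2", ""));
        sources.sort(SUBFIELDS_FIRST);
        for (SimpleDataSource ds : sources) {
            System.out.println(ds.from);
        }
        // Order after sorting: source-a, source-b, source-9, indicator-1, indicator-2
    }
}
```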

Conclusion

After an internal discussion in the Concorde team, it was decided to follow the second solution and implement the sorting logic in RuleFactory of mod-data-export. First, it should be better for overall mapping performance, because the first approach applies the sorting logic to every MARC data field, which may reduce performance slightly. Second, the second approach is more flexible and is focused on mapping profile transformations rather than on all types of rules. In this case, other clients of the generate-marc-utils shared library remain able to apply their own sorting logic if they need it.