OAI-PMH Spring migration
Dao/Service layer
Currently, oai-pmh uses the reactive JOOQ tool for writing type-safe queries, accessing the database, and generating POJOs.
Almost all of the queries are plain SQL, so the JOOQ-based Dao classes can be replaced with JpaRepository, which offers the same functionality with far less code than the JOOQ version requires. The entity model that represents the database tables also has to be created.
To keep the asynchronous approach and avoid a drop in performance, we can use the @Async annotation on the Service methods that encapsulate the Dao calls.
Part of the work is to replace the vertx Future class with CompletableFuture.
The unit tests have to be updated accordingly.
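The move from the vertx Future to CompletableFuture can be sketched with plain JDK classes. SetDao and SetService below are hypothetical names used only for illustration, and the explicit executor stands in for what the @Async annotation would arrange in Spring. Where vertx code uses map and compose on a Future, the CompletableFuture counterparts are thenApply and thenCompose.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical blocking Dao; after the migration a JpaRepository would play this role.
interface SetDao {
  List<String> findAllSetNames();
}

// Hypothetical service that keeps callers non-blocking by returning a CompletableFuture.
class SetService {
  private final SetDao dao;
  private final Executor executor;

  SetService(SetDao dao, Executor executor) {
    this.dao = dao;
    this.executor = executor;
  }

  // The blocking Dao call runs on the executor; thenApply transforms the result
  // once it is available, without blocking the calling thread.
  CompletableFuture<Integer> countSets() {
    return CompletableFuture
        .supplyAsync(dao::findAllSetNames, executor)
        .thenApply(List::size);
  }
}
```

A caller would chain further steps onto the returned future, or call join() only at the point where the value is really needed.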
Controllers/API doc
For generating API and controller interfaces we will use the swagger tool. It is already supplied by the folio-spring-base library. We just have to set up the yaml file with endpoint descriptions.
Processing verbs
We will keep a single controller for oai-pmh requests, but instead of inheritance-based logic it would be better to handle each verb request with the strategy design pattern. Common validation methods such as “validateListRequest” and “validateIdentifier”, and response-building methods such as “buildResumptionToken”, “addHeader”, and “buildOaiMetadata”, will be moved to utility classes and used within the strategy implementations. This will make the code cleaner and easier to read.
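A minimal sketch of the strategy-based dispatch is shown below. OaiVerb, VerbHandler, and VerbDispatcher are hypothetical names, and the enum lists only a few verbs for brevity; the real module would reuse its existing Verb enum and return proper OAI-PMH response objects rather than strings.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, shortened verb list used only for this illustration.
enum OaiVerb { IDENTIFY, LIST_SETS, LIST_RECORDS, GET_RECORD }

// One strategy per verb; the String result stands in for the real response type.
interface VerbHandler {
  String handle(Map<String, String> params);
}

// The single controller would delegate to this dispatcher instead of using inheritance.
class VerbDispatcher {
  private final Map<OaiVerb, VerbHandler> handlers = new EnumMap<>(OaiVerb.class);

  VerbDispatcher register(OaiVerb verb, VerbHandler handler) {
    handlers.put(verb, handler);
    return this;
  }

  String dispatch(OaiVerb verb, Map<String, String> params) {
    VerbHandler handler = handlers.get(verb);
    if (handler == null) {
      throw new IllegalArgumentException("No handler registered for verb " + verb);
    }
    return handler.handle(params);
  }
}
```

With this shape, adding support for a verb means registering one more handler, and each handler can be unit tested on its own.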
Filtering conditions/Sets
A separate simple REST controller will be created for these entities. This is straightforward.
Logging
Log4j2 will be used for logging. The folio-spring-base library also provides the Lombok dependency, whose @Log4j2 annotation adds a logger to classes.
Marc21 with holdings request handling
For this metadata prefix, a separate strategy should be implemented, since its logic is more complex than that of the other verb handling strategies. The main problem with this request is that the current approach doesn’t allow handling exceptions caused by invalid JSON. We are forced to supply the JSON parser and set its mode, and we cannot map the incoming bytes to JSON ourselves in order to catch the exceptions. The JSON mapping is performed by vertx, and when an exception is raised the whole process is terminated, which is not acceptable.
The solution is Kafka. With Kafka, we can map the incoming strings to JSON ourselves, catch the exceptions, and log the instances that caused the errors.
The general flow will stay as it is, but the current approach to downloading instances will be replaced with the Kafka integration.
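Mapping the incoming strings ourselves, so that one bad instance does not stop the whole request, could look roughly like the sketch below. MappingResult and TolerantMapper are hypothetical names, and the parser argument stands in for whichever JSON library is chosen; it is assumed to throw an unchecked exception on invalid input, so a library that throws checked exceptions would need a small wrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Outcome of mapping a batch: parsed items plus the raw payloads that failed.
class MappingResult<T> {
  final List<T> parsed = new ArrayList<>();
  final List<String> rejected = new ArrayList<>();
}

class TolerantMapper {
  // Applies the parser to each payload and isolates failures per record.
  static <T> MappingResult<T> mapAll(List<String> payloads, Function<String, T> parser) {
    MappingResult<T> result = new MappingResult<>();
    for (String payload : payloads) {
      try {
        result.parsed.add(parser.apply(payload));
      } catch (RuntimeException e) {
        // The bad instance is recorded (and would be logged) instead of
        // terminating the processing of the remaining instances.
        result.rejected.add(payload);
      }
    }
    return result;
  }
}
```

The rejected list gives us the place to log the instances that led to the errors.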
Splitting the handler
If we look at the handler for this metadata prefix, we see a lot of code that is hard to read; the handler should keep a single level of abstraction. The remaining code for building queries, building the XML response, enriching it with data, etc. should be moved to separate utility classes or services. That will make the code easier to understand and to unit test.
Init API/Tenant API
The Init API code only sets up system properties, which is duplicated in the Tenant API; moreover, these properties are replaced by configs from mod-configuration.
The Init API should be removed, and the properties can be loaded with a simple @PropertySource annotation.
The Tenant API logic should be preserved, but we have to implement the TenantAPI provided by the folio-spring-base library, which is a Spring-based implementation.
Preserved functionality
Request class
We have a Request object, which is a container for request parameters and request metadata. It has its own builder and a couple of methods for decoding the resumption token and for partial request validation.
It is a fairly large class. With the Lombok annotations we can just declare the properties, and Lombok will generate the setters, getters, builder, constructors, etc.
The methods for decoding the resumption token and validating the parameters should be moved to a separate class, since the Request class should be a simple POJO and it is not a good idea to mix business logic into a plain POJO.
Liquibase
Liquibase dependencies are already provided by the folio-spring-base library, and the current way of updating the database with Liquibase will be preserved.
Domain classes
The Verb enum, together with VerbValidator, will stay as it is.
ResponseConverter, XSLTMapper, and MarcXmlMapper don't depend on vertx; they are used for translating strings to MARC XML and should be kept.
Jaxb2 and marc4j
The JAXB library is used for generating the POJOs that make up the OAI-PMH response, which is then translated to XML with the help of the marc4j library. Both are independent of vertx and should be preserved.
Migration risks and possible mitigation
Risk: almost all of the vertx code is written in an asynchronous manner, and performance can deteriorate if we do not keep the same approach.
Mitigation: the JDK class CompletableFuture, paired with the Spring @Async annotation, solves the problem and keeps the code processing asynchronous.
Risk: ReactivePostgresClient is vertx-based and allows DB queries to be performed asynchronously. Running queries asynchronously is very important for the marc21_withholdings request, since many saves are performed and they should run quickly and in parallel.
Mitigation: Spring supplies a connection pool through the configured DataSource, and the database calls can be wrapped in methods marked with the @Async annotation described above. This keeps the ability to run several queries at the same time.
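Running several saves at the same time and waiting for all of them can be sketched with CompletableFuture.allOf. ParallelSaver and the saveBatch callback are hypothetical; saveBatch stands in for a blocking repository call. The executor's thread count bounds the parallelism, and it is an assumption of this sketch that it would be sized to fit the connection pool.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class ParallelSaver {
  // Submits every batch to the executor and returns a future that completes
  // when all saves have finished (or fails if any of them fails).
  static <T> CompletableFuture<Void> saveAll(
      List<List<T>> batches, Consumer<List<T>> saveBatch, Executor executor) {
    List<CompletableFuture<Void>> tasks = batches.stream()
        .map(batch -> CompletableFuture.runAsync(() -> saveBatch.accept(batch), executor))
        .collect(Collectors.toList());
    return CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0]));
  }
}
```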
Risk: marc21_withholdings is a high-load request and must perform well to process millions of instances quickly. The vertx-based streaming approach provides great performance and is non-blocking. The current performance should be maintained.
Mitigation: Kafka is an established tool that delivers high throughput when configured correctly. If we provide enough resources to Kafka, we expect comparable performance.
Risk related to Kafka: instances arrive from inventory faster than they can be saved to the database, which can lead to the system hanging and the database being overloaded. In the vertx-based solution we can pause the consumption of instances from inventory until the accumulated database save tasks are completed and the load on the database drops. We need the same capability in the Kafka-based solution.
Mitigation: in Kafka, we are not limited to a push-style subscription to an instances producer (which would force us to process instances continuously without pausing, which is not acceptable in our case); we can request data when we want it. In this way, we can track the database load and request a batch of instances on demand, as a callback of the save operations. So the database queries will not get stuck and the performance will be preserved.
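The on-demand idea can be sketched without Kafka itself: the next batch is requested only after the previous save has completed. PullingLoader, pollBatch, and saveBatch are hypothetical names; pollBatch stands in for a consumer poll that returns up to the requested number of records, and saveBatch for the asynchronous database save. The loop blocks on each save, so it is assumed to run on its own worker thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.IntFunction;

class PullingLoader {
  // Pulls batches until the source returns an empty batch and reports
  // how many records were saved in total.
  static <T> int drain(
      IntFunction<List<T>> pollBatch,
      Function<List<T>, CompletableFuture<Void>> saveBatch,
      int batchSize) {
    int total = 0;
    while (true) {
      List<T> batch = pollBatch.apply(batchSize);
      if (batch.isEmpty()) {
        return total;
      }
      // Wait for the save to finish before asking for more, so unsaved
      // work never piles up in front of the database.
      saveBatch.apply(batch).join();
      total += batch.size();
    }
  }
}
```

With a real Kafka consumer, a similar effect might be achieved by polling in a loop and using the consumer's pause and resume methods on the assigned partitions while saves are in flight; the exact setup would need to be checked against the client configuration we end up using.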