Spike: MODKBEKBJ-458 ability to filter packages by whether the package is assigned or unassigned to an agreement

MODKBEKBJ-458 - Getting issue details... STATUS

Spike should address the requirements from MODKBEKBJ-304.

Goal of the spike is to find possible solutions to combine existing eHoldings filters with filtering by agreement assignment status.

Problem

Agreement assignment status is currently available only inside Agreements app.

General overview

Packages filtering

Currently packages could be searched with "Tags only" filter or combination of other filters:

Tags or access types filter:

Package ids are retrieved from database (applying necessary filtering and pagination) and packages are queried from holdingsIQ by id one by one.

Other filters:

Packages are queried from holdingsIQ using filtering and pagination.

Agreements

Agreements are retrieved from mod-agreements by api '/erm/sas' with filtering and pagination:

Posting agreements is performed by the same api:

Assigning agreements to packages is done on eHoldings(package) UI using PUT request of the same api.

Agreement for a desired package are retrieved from mod-agreements by packageId:

Possible solutions

1 Without changes to database (not recommended)

To filter packages 'assigned' to agreements we need to call mod-agreements api multiple times (because existing api may return duplicates) to fill a complete page of unique package ids or ask mod-agreements team to develop api that returns only unique relations (in such case we need only one call to mod-agreements); then pass composed package ids to holdingsIQ to retrieve packages.

Above solution can be combined with existing mod-kb-ebsco-java filters only by calling mod-agreements api multiple times (retrieve package ids from mod-agreements, then filter by data from mod-kb-ebsco-java database; repeat this until we have a complete page to return on UI). This filtering becomes more complex in case we use existing api (wich can return duplicates), because we need to call mod-agreements even more times.

To filter packages 'not assigned' to agreements we need to query packages from holdingsIQ first, then we need to check every package assignment by calling mod-agreements api (repeat this actions until we fill a complete page to return on front-end).

This solution is not recommended because with a data grow - number of api calls will grow exponentially.

2 Add package-agreement relation to mod-kb-ebsco (preferrable)

  1. Add new table to database with packageId→agreementId relation (kbCredentialsId, recordId(packageId), recordType(package), agreementId). MODKBEKBJ-651 - Getting issue details... STATUS
  2. Implement api for posting package-agreement relations (which will write relation to database and send to mod-agreements). MODKBEKBJ-652 - Getting issue details... STATUS
  3. Copy existing relations from mod-agreements (develop sql script. Note: it should be executed not as sqlite change but separately with necessary rights to both schemas. Add script somewhere in classpath and mention it in release notes).
  4. Implement agreement assignment filtering. For 'assigned': retrieve packageIds from new table for filtering and pass ids to holdingsIQ (same logic as for tags). For 'not assigned' filtering we should query packages from holdingsIQ (possibly multiple times), then filter them by existence in database(repeat until we get complete page to return on front-end). We can use holdingsIQ filters along with 'not assigned' filter because no preprocessing is done before querying from holdingsIQ. MODKBEKBJ-653 - Getting issue details... STATUS
  5. Replace call to mod-agreements for assignment with call to new mod-kb-ebsco-java api (it will send corresponding request to mod-agrrements by itself) (UI, eHoldings). UIEH-1289 - Getting issue details... STATUS
  6. Agreement should be posted to mod-agreements without package relation and then new assignment api of mod-kb-ebsco should be called (which will send assignment to mod-agreements by itself).
  7. Replace call to mod-agreements for assignment with call to new mod-kb-ebsco-java api (it will send corresponding request to mod-agrrements by itself) (UI, agreements, agreement line page). UIEH-1290 - Getting issue details... STATUS
  8. Implement 'unassign' api on kb-ebsco-java to delete package-agreement relation. MODKBEKBJ-654 - Getting issue details... STATUS
  9. Add call for 'unassign' api of kb-ebsco-java on agreement unassignment on package page (UI). UIEH-1291 - Getting issue details... STATUS
  10. Add call for 'unassign' api of kb-ebsco-java on agreement line deletion on agreement page (UI). UIEH-1292 - Getting issue details... STATUS
  11. Add possibility to combine 'assigned' filter with other kb-ebsco-java internal filters. MODKBEKBJ-655 - Getting issue details... STATUS

Notes

It's not possible to combine some filters because package data is queried from holdingsIQ with their own filtering and pagination thus only filters that holdingsIQ accepts can be combined with each other.

Internal filters of mod-kb ebsco can be combined with each other too(not implemented yet, see MODKBEKBJ-371). After filtering kb-ebsco-java pass only packageIds to retrieve packages by ids from holdingsIQ without any filtering.

Multiple kb's: package-agreement relation will be stored in similar way as tags - connected to kb credentials. The only question is about data migration from mod-agreements, it should be addressed in corresponding story.

Next steps: create related stories.

Conclusion

Some universities have at about 200-600 of packages assigned to an agreement but there are cases with 1-2 thousands of packages assigned to an agreement so second solution is more appropriate.

Further discussions

Filters combining clarifications

  1. Assigned packages filtering will not be able to be combined with package name filtering, or other holdingsIQ filterings (same as tags).
  2. Not assigned packages filtering will be able to be used together with package name filtering and other holdingsIQ filtering.

Problems

ProblemDescriptionMetrics

Not assigned packages querying

Not assigned packages querying is quiet problematic on big amount of data

F.e. if we have 2000 packages for a kb credentials and 1980 of them are assigned, unassigned packages (f.e.) are in the end if list, page size is 20 and we want to get first page for 20 unassigned packages.

In such case we need to query holdingsIQ 100 times, also query assigned packages from database 100 times (for each page returned from holdingsIQ we need to check whether the package is assigned to an agrrement according to data in database).

Approximate metrics for this 'extreme' case are:

  • query 20 packages from holdingsIQ - about 400ms (queried multiple times via postman). For 2000 packages number should be around 40000ms (40s)
  • check whether 20 packages are assigned to an agreements by query to database - about 2ms (queried test data from local database via dbeaver. Note: for production database number may increase because of remote database). For 2000 packages number should be around 200 ms. Note: testing was done on a tags table with similar (to the planned table) structure and 3000 tags present.
  • check whether 20 packages are assigned to an agreements by query to database - about 500ms (queried test data from spitfire rancher database via pgadmin. Note: for production database number may be less because of bigger amount of resources). For 2000 packages number should be around 50000ms (50s). Note: testing was done on a tags table with similar (to the planned table) structure and 3000 tags present.

To sum up, to get 20 unassigned packages in a bad scenario it can take about 90s + some time to return data on front-end. Note: this numbers are very approximate and may vary based on hardware and internet connection.

Migrating data from mod-agreementsmod-agreements doesn't have any information about kb credentials and we need to assign each relation (package→agreement) to ones. In order to achieve this, we need to query each package for each relation from holdingsIQ using each kb credentials (because same package can exist for multiple kb credentials)

F.e. if we have 2000 relations (package→agreement) for a tenant in mod agreements and also we have 5 kb credentials configured, then we need query holdingsIQ 10000 times (for each packageId and each kb credentials).

Approximate metrics:

  • query 10 (mod-agreements returns max 10 per query even if page size is bigger) entitlements (package→agreement relations) from mod-agreements with 2000 entitlements present in mod-agreements - about 220ms (tested on snapshot envoronment). For 2000 relations should be about 44000ms (44s)
  • query package by id from holdingsIQ - 400ms (queried mutiple times via postman). For 2000 relations should be around 800000ms (800s / 13min 20sec)

Writing to database was not tested because table structure should be created and data generated. Also it's not really necessary to test this because writing 2000 record in database is a fast process and should take no more than a second (maybe few seconds, depends on environment).

To sum up, to migrate 2000 relations from mod-agreements it can take about 14 minutes + some time to save data to database. Note: this numbers are very approximate and may vary based on hardware and internet connection.

To eliminate problem regarding 'not assigned' packages querying we can periodically query all packages(same as we do for holdings now) to have them in mod-kb-ebsco-java database. But in this case we will not have an ability to use holdingsIQ filters (same as for assigned packages).

Also we can get assigned selected (only) packages from database already because we have scheduled job to query holdings, but it executes once in 5 days.

As an alternative to package filtering by assignment status, additional column to the result list with assignment status could be considered.