Spike: MODKBEKBJ-482 - Investigate and Design export option for titles in cost-per-use view


IN PROGRESS

Participants:

Role | Name | Approval
Solution Architect | @Taras Spashchenko | approved
Java Lead | @Dima Tkachenko |
Product Owner | @Khalilah Gambrell |

Spike goals:

  1. Define the field set

  2. Define a format for the export document

  3. Design an endpoint for export

  4. Define possible difficulties

General view of the export tab 

The first phase of the "Cost-per-use" feature should let the user export the title records tied to a particular package. As the screenshots show, the export option is located under the "Actions" button.

1. Define the field set

Based on https://folio-org.atlassian.net/browse/MODKBEKBJ-479, the exported document should include the following fields:

  • Title

  • Type

  • Cost

  • Usage

  • Cost per use

  • % of usage

Below you can find an example table with sample data:

The "Package" column is included only to give context about the package. It will not be included in the result document.

The last row, "Product Owner Approval", shows whether the Product Owner agrees with the suggested format for the values:

- will not be included in the result file

- approved to be added to the document

- under discussion

Column Name | Package | Title | Type | Cost | Usage | Cost per use | % of usage | year
Example value | EBSCO Open Access Journals | Writings of Professor B. B. Edwards | Book | 500.00 | 2225 | 0.22 | 16 | 2019
Example value | EBSCO Open Access Journals | The Seasons and the Symphony | Streaming Video | 800.00 | 4544 | 0.18 | 20 | 2019
Product Owner Approval | | | | | | | |
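The example values are consistent with "Cost per use" being the cost divided by the usage and rounded to two decimals (500.00 / 2225 ≈ 0.22, 800.00 / 4544 ≈ 0.18). The sketch below is minimal and assumes that formula with half-up rounding. The "% of usage" formula is also an assumption, because this spike does not define it: the sketch treats it as the title's usage as a share of the total usage of the package. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper computing the derived cost-per-use export columns. */
final class CostPerUseCalculator {

  private CostPerUseCalculator() {
  }

  /**
   * Cost divided by usage, rounded to two decimals (HALF_UP is an assumption).
   * Returns null when there is no usage, because the ratio is undefined.
   */
  static BigDecimal costPerUse(BigDecimal cost, long usage) {
    if (usage <= 0) {
      return null;
    }
    return cost.divide(BigDecimal.valueOf(usage), 2, RoundingMode.HALF_UP);
  }

  /**
   * Assumed definition: the title's usage as a percentage of the total usage
   * of all titles in the package, rounded to a whole number.
   */
  static long percentOfUsage(long usage, long totalPackageUsage) {
    if (totalPackageUsage <= 0) {
      return 0;
    }
    return Math.round(usage * 100.0 / totalPackageUsage);
  }

  public static void main(String[] args) {
    System.out.println(costPerUse(new BigDecimal("500.00"), 2225)); // 0.22
    System.out.println(costPerUse(new BigDecimal("800.00"), 4544)); // 0.18
  }
}
```

`BigDecimal` is used instead of `double` so that monetary values keep an exact scale before rounding.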

2. Define a format for the export document

 

The export file should be in CSV format.

Below is an example of the export file content, based on the columns defined in section 1. The file name is <package_name>.csv:

EBSCO Open Access Journals.csv

Title,Type,Cost,Usage,Cost per use,% of usage
Writings of Professor B. B. Edwards,Book,500.00,2225,0.22,16
The Seasons and the Symphony,Streaming Video,800.00,4544,0.18,20
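The example file above could be produced by a small in-memory CSV builder. This is a minimal sketch that uses only the JDK. It assumes RFC 4180 quoting rules: a field that contains a comma, a double quote or a line break is wrapped in double quotes, and embedded quotes are doubled. It also assumes CRLF line endings and values that are passed in already formatted. The `CostPerUseCsv` class and its `Row` record are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical CSV builder for the cost-per-use export. */
final class CostPerUseCsv {

  /** One exported row, with fields in the order defined in section 1. */
  record Row(String title, String type, String cost, long usage,
             String costPerUse, long percentOfUsage) {
  }

  static final List<String> HEADER =
      List.of("Title", "Type", "Cost", "Usage", "Cost per use", "% of usage");

  private CostPerUseCsv() {
  }

  /** Quotes a field only when needed and doubles embedded quotes (RFC 4180). */
  static String escape(String field) {
    if (field.contains(",") || field.contains("\"")
        || field.contains("\n") || field.contains("\r")) {
      return "\"" + field.replace("\"", "\"\"") + "\"";
    }
    return field;
  }

  /** Joins escaped fields with the comma delimiter. */
  static String line(List<String> fields) {
    return fields.stream().map(CostPerUseCsv::escape).collect(Collectors.joining(","));
  }

  /** Builds the whole document in memory, terminating each line with CRLF. */
  static String toCsv(List<Row> rows) {
    StringBuilder sb = new StringBuilder(line(HEADER)).append("\r\n");
    for (Row r : rows) {
      sb.append(line(List.of(r.title(), r.type(), r.cost(), String.valueOf(r.usage()),
          r.costPerUse(), String.valueOf(r.percentOfUsage())))).append("\r\n");
    }
    return sb.toString();
  }
}
```

With this rule, a title such as "Reading: Harvard Views of Readers, Readership, and Reading History" is written as a single quoted field, so the comma delimiter stays unambiguous. Building the whole document in a `StringBuilder` is the simplest option, but it keeps the full export in memory.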

3. Design an endpoint for export

 

The "mod-kb-ebsco-java" module should provide an endpoint for the package-title export option.

The definition for the ModuleDescriptor.json file:

Method | GET
Endpoint | /eholdings/packages/{packageId}/resources/costperuse/export
Permission Required | "kb-ebsco.package-resources-costperuse.export.collection.get"
Description | Get cost-per-use information for the titles in CSV format.
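A client, for example an integration test, could call this endpoint through Okapi with the JDK `java.net.http` client. This is a minimal sketch. The Okapi base URL, tenant and token values are hypothetical placeholders. `X-Okapi-Tenant` and `X-Okapi-Token` are the standard FOLIO request headers, and the `Accept: text/csv` header matches the response type in the RAML definition.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a request for the cost-per-use export endpoint. */
final class ExportRequestFactory {

  private ExportRequestFactory() {
  }

  static HttpRequest exportRequest(String okapiUrl, String tenant, String token,
                                   String packageId) {
    // Path from the ModuleDescriptor definition; packageId is inserted as-is (no URL encoding).
    URI uri = URI.create(okapiUrl + "/eholdings/packages/" + packageId
        + "/resources/costperuse/export");
    return HttpRequest.newBuilder(uri)
        .header("Accept", "text/csv")
        .header("X-Okapi-Tenant", tenant)
        .header("X-Okapi-Token", token)
        .GET()
        .build();
  }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The response body is the CSV text.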

The definition for the RAML file:

...
/export:
  get:
    description: |
      Endpoint provides cost-per-use information about the titles included in the package, in CSV format.
    responses:
      200:
        description: OK
        body:
          text/csv:
            example:
              strict: false
              value: !include examples/export/package_title_get_response.csv

4. Define possible difficulties

The main concern in this section is the response time of the APIGEE service for a large number of entities (i.e., ~10000 items).

The actual log files can be found in the https://folio-org.atlassian.net/browse/MODKBEKBJ-500 issue. They show that, at the moment, the APIGEE service cannot process a large number of requests quickly. This directly slows down the export process. For a better user experience, the end user should therefore see a message that the export file is being prepared.

 

Proposed export flow:

The steps of the title export process:

  1. The UI sends a request to the backend to get the cost-per-use information about the package titles in CSV format.

  2. The backend:

    1. fetches title IDs from the holdings table

      1. if there are records in the table, fetches them

      2. if there are no records, returns an empty result

    2. determines the number of requests that have to be sent to APIGEE

      1. divides the total number of title IDs by 1000, rounding up, to get the number of requests

      2. prepares batches of at most 1000 title IDs each

    3. fetches the titles' cost-per-use information

      1. if APIGEE returns [200 OK], continues processing

      2. if APIGEE returns [<error>], returns an error message to the user

    4. calculates the titles' cost-per-use values

    5. sends the export information in text/csv format

  3. The UI receives the text and:

    1. creates a file named after the package

    2. downloads the file for the user
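The backend steps for determining the number of requests and preparing batches can be sketched as follows. The number of requests is the total number of title IDs divided by 1000 and rounded up. This is a minimal sketch. The `CostPerUseClient` interface stands in for the real APIGEE call and, like the other names, is hypothetical. A non-200 APIGEE response is modeled as an exception that aborts the whole export, which corresponds to the error branch of step 2.3.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the batching steps of the proposed export flow. */
final class TitleBatcher {

  /** Maximum number of title IDs per APIGEE request, as proposed in the spike. */
  static final int MAX_BATCH_SIZE = 1000;

  /** Hypothetical stand-in for the APIGEE cost-per-use call. */
  interface CostPerUseClient<T> {
    /** Returns the results for the given IDs; throws if APIGEE does not answer 200 OK. */
    List<T> fetch(List<String> titleIds);
  }

  private TitleBatcher() {
  }

  /** Number of requests: total / 1000, rounded up. */
  static int requestCount(int totalTitleIds) {
    return (totalTitleIds + MAX_BATCH_SIZE - 1) / MAX_BATCH_SIZE;
  }

  /** Splits title IDs into consecutive batches (subList views) of at most MAX_BATCH_SIZE. */
  static List<List<String>> batches(List<String> titleIds) {
    List<List<String>> result = new ArrayList<>(requestCount(titleIds.size()));
    for (int from = 0; from < titleIds.size(); from += MAX_BATCH_SIZE) {
      int to = Math.min(from + MAX_BATCH_SIZE, titleIds.size());
      result.add(titleIds.subList(from, to));
    }
    return result;
  }

  /** Fetches all batches sequentially; an empty ID list yields an empty result. */
  static <T> List<T> fetchAll(List<String> titleIds, CostPerUseClient<T> client) {
    List<T> results = new ArrayList<>();
    for (List<String> batch : batches(titleIds)) {
      results.addAll(client.fetch(batch)); // an exception here aborts the whole export
    }
    return results;
  }
}
```

The sketch sends the batches one after another, so only one APIGEE request is in flight at a time. Whether parallel requests would shorten the export, given the APIGEE limitations described above, has not been measured.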

Existing ui-erm-usage code can be reused for the UI part.

Questions:

1. Q: What is the preferred delimiter for the exported file: comma (,) or tab (\t)?

  If the delimiter is a comma (,), then:

    • the back-end has to escape the (,) symbol in title names, for example:
      Reading: Harvard Views of Readers, Readership, and Reading History

  If the delimiter is a tab (\t), then:

    • it might not be the separator the user expects

  A: Answer from @Khalilah Gambrell:

2. Q: Why can't the backend return a file?

  A: Answer from @Natalia Zaitseva: the backend would have to get the package name for the export file name from the RM API. Since the service limits the number of requests, the backend should avoid sending additional requests unless they are really needed.

Limitations:

Update:

For the PoC test of the maximum number of titles that can be exported, the following setup was used:

Vagrant box - 'testing'

holdings credentials - sandbox

 

Export findings

Number of exported titles | Response time | Size | Notes/example files
< 20 000 | < 22 s | ~1.7 MB |
56 508 | 51 s 15 ms | 3.71 MB |
71 482 | 1 min 14 s | 7.93 MB |
184 963 | 2 min 50 s | 16.74 MB |
200 336 | 3 min 02 s | 17.7 MB |
300 000 | - | - | OOM error

 

An OutOfMemoryError occurs when exporting between 200 000 and 300 000 titles. The current setup of the sandbox KB credentials does not contain that many titles.

The attached heap dump was generated for an export of 251217 titles. An issue describing this edge case has been created: https://folio-org.atlassian.net/browse/MODKBEKBJ-531

 

An issue related to the timeout on Rancher has been created: https://folio-org.atlassian.net/browse/FOLIO-2891