Spike: MODKBEKBJ-482 - Investigate and Design export option for titles in cost-per-use view

IN PROGRESS

Participants:

Role | Name | Approval
Solution Architect | | (tick) - approved
Java Lead | |
Product Owner | |

Spike goals:

  1. Define fields set 

  2. Define a format for the export document

  3. Design an endpoint for export

  4. Define possible difficulties

General view of the export tab 

The first phase of the "Cost-per-use" feature should let the user export the title records tied to a particular package. As the screenshots show, the export option is located under the "Actions" button.


1. Define fields set  

Based on MODKBEKBJ-479, the exported document should include the following fields:

  • Title
  • Type
  • Cost
  • Usage
  • Cost per use
  • % of usage

Below you can find an example table with sample data:

The "Package" column is added only to provide the context of the package and will not be included in the resulting document.

The last row, "Product Owner Approval", indicates whether the Product Owner agrees with the suggested format for the values:

(minus) - will not be included in the result file

(tick) - approved to be added to the document

(question) - under discussion

Column Name | Package | Title | Type | Cost | Usage | Cost per use | % of usage | Year
Example value | EBSCO Open Access Journals | Writings of Professor B. B. Edwards | Book | 500.00 | 2225 | 0.22 | 16 | 2019
Example value | EBSCO Open Access Journals | The Seasons and the Symphony | Streaming Video | 800.00 | 4544 | 0.18 | 20 | 2019
Product Owner Approval | (minus) | | | | | | | (question)
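
The sample values above are consistent with "Cost per use" being cost divided by usage, rounded to two decimal places (500.00 / 2225 gives 0.22, 800.00 / 4544 gives 0.18). A minimal Java sketch of that calculation follows. The class and method names are hypothetical, and the rounding mode (HALF_UP) and the handling of zero usage are assumptions, since the spike does not specify them. The formula for "% of usage" is not defined in this document, so it is not sketched here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration; not part of mod-kb-ebsco-java.
public class CostPerUseSketch {

    // Cost per use = cost / usage, rounded to 2 decimals.
    // HALF_UP rounding is an assumption; the spike does not name a rounding mode.
    // Returns null when usage is zero, because the ratio is undefined in that case.
    static BigDecimal costPerUse(BigDecimal cost, long usage) {
        if (usage == 0) {
            return null;
        }
        return cost.divide(BigDecimal.valueOf(usage), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Values taken from the two example rows in the table above.
        System.out.println(costPerUse(new BigDecimal("500.00"), 2225)); // 0.22
        System.out.println(costPerUse(new BigDecimal("800.00"), 4544)); // 0.18
    }
}
```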

2. Define a format for the export document


The export file should be in CSV format.

Example export file content, based on the columns defined in section "1. Define fields set", is shown below. The file name is <package_name>.csv.

EBSCO Open Access Journals.csv
Title,Type,Cost,Usage,Cost per use,% of usage
Writings of Professor B. B. Edwards,Book,500.00,2225,0.22,16
The Seasons and the Symphony,Streaming Video,800.00,4544,0.18,20

3. Design an endpoint for export


The "mod-kb-ebsco-java" module should provide an endpoint for the package-title export option.

The definition for the ModuleDescriptor.json file:

Method | GET
Endpoint | /eholdings/packages/{packageId}/resources/costperuse/export
Permission Required | "kb-ebsco.package-resources-costperuse.export.collection.get"
Description | Get cost-per-use information for the titles in CSV format.

The definition for the raml file:

... 
/export:
  get:
    description: |
      Provides cost-per-use information, in CSV format, for the titles included in the package.
    responses:
      200:
        description: OK
        body:
          text/csv:
            example:
              strict: false
              value: !include examples/export/package_title_get_response.csv
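
To illustrate how a client might call this endpoint, the sketch below builds (but does not send) the GET request with Java's built-in java.net.http API. The Okapi base URL, the package id, and the tenant are placeholder assumptions. A real call through Okapi would normally also carry an X-Okapi-Token header, which is omitted here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side sketch; names and values are placeholders.
public class ExportRequestSketch {

    // Builds the export request for a package.
    // okapiUrl, packageId and tenant are supplied by the caller.
    static HttpRequest buildExportRequest(String okapiUrl, String packageId, String tenant) {
        URI uri = URI.create(okapiUrl + "/eholdings/packages/" + packageId
                + "/resources/costperuse/export");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "text/csv")      // response body type from the RAML above
                .header("X-Okapi-Tenant", tenant)  // standard Okapi tenant header
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // "http://localhost:9130", "123-456" and "diku" are example values only.
        HttpRequest request = buildExportRequest("http://localhost:9130", "123-456", "diku");
        System.out.println(request.method() + " " + request.uri());
    }
}
```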

4. Define possible difficulties

The main concern in this section is the response time of the APIGEE service for a large number of entities (e.g. ~10000 items).

The actual log files are attached to issue MODKBEKBJ-500. They show that the APIGEE service currently cannot process a large number of requests quickly. This directly slows down the export process, so, for a better user experience, the end user should be shown a message that the export file is being prepared.


Proposed export flow:

The steps of the title export process:

  1. UI sends a request to the backend to get the package titles' information in CSV format.
  2. Backend:
    1. fetches title ids from the holdings table
      1. if records are in the table - fetch them
      2. if there are no records - return an empty result
    2. determines the number of requests to be sent to APIGEE
      1. divide the total number of title ids by 1000, rounding up, to get the number of requests
      2. prepare batches of at most 1000 title ids each
    3. fetches the titles' cost-per-use info
      1. if APIGEE returns [200 OK] - continue processing
      2. if APIGEE returns [<error>] - return an error message to the user
    4. calculates the titles' cost-per-use info
    5. sends the export information in text/csv format
  3. UI receives the text content and:
    1. creates a file named after the package
    2. downloads the file for the user

The existing ui-erm-usage code base can be reused for this step.
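
Step 2.2 of the flow above (working out the number of requests and preparing the batches) can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the only value taken from the spike is the batch limit of 1000 title ids.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of mod-kb-ebsco-java.
public class TitleBatchSketch {

    static final int MAX_BATCH_SIZE = 1000; // batch limit taken from the flow above

    // Number of APIGEE requests needed: total ids divided by the batch size, rounded up.
    static int requestCount(int totalIds) {
        return (totalIds + MAX_BATCH_SIZE - 1) / MAX_BATCH_SIZE;
    }

    // Splits the ids into consecutive batches of at most MAX_BATCH_SIZE elements.
    // An empty input yields an empty list of batches.
    static <T> List<List<T>> toBatches(List<T> ids) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += MAX_BATCH_SIZE) {
            int to = Math.min(from + MAX_BATCH_SIZE, ids.size());
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        List<List<Integer>> batches = toBatches(ids);
        System.out.println(requestCount(ids.size())); // 3
        System.out.println(batches.get(2).size());    // 500
    }
}
```

With 2500 title ids this produces three batches (1000, 1000 and 500 ids), so three requests would be sent to APIGEE.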

Questions:


1. Q: What is the preferred delimiter for the exported file - comma (,) or tab (\t)?

If the delimiter is a comma (,), then:

  • the back-end has to escape the (,) symbol in title names, for example:
    Reading: Harvard Views of Readers, Readership, and Reading History

If the delimiter is a tab (\t), then:

  • it might not be the separator the user expects

A: Answer from Khalilah Gambrell:

2. Q: Why can the backend not return a file?

A: Answer from Natalia Zaitseva:

The backend would have to get the package name for the export file name from RM API. The service is known to limit the number of requests, so the backend should not send additional requests unless they are really needed.
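
If the comma delimiter from question 1 is chosen, one common way to handle commas in title names is the quoting convention described in RFC 4180: wrap the field in double quotes and double any embedded double quotes. The sketch below illustrates that convention. It is not the module's implementation, the class and method names are hypothetical, and the numeric values in the usage example are placeholders.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of mod-kb-ebsco-java.
public class CsvEscapeSketch {

    // Quotes a field when it contains a comma, double quote, CR or LF.
    // Embedded double quotes are doubled (RFC 4180 convention).
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins already-stringified field values into one CSV line.
    static String toCsvLine(List<String> fields) {
        return fields.stream()
                .map(CsvEscapeSketch::escape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Title from question 1; the remaining values are placeholders.
        System.out.println(toCsvLine(List.of(
                "Reading: Harvard Views of Readers, Readership, and Reading History",
                "Book", "500.00", "2225", "0.22", "16")));
    }
}
```

The title is emitted as a single quoted field, so spreadsheet tools that follow the same convention keep it in one column.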

Limitations:
  • the maximum number of title cost-per-use records that can be exported should be defined in a separate spike.
    • PoC backend story - MODKBEKBJ-518
    • PoC frontend story - UIEH-990

Update:

For the PoC testing of the maximum number of titles that can be exported, the following setup was used:

Vagrant box - 'testing'

Holdings credentials - sandbox


Export findings

Number of exported titles | Response time | Size | Notes/example files
< 20 000 | < 22 s | ~1.7 MB |
56 508 | 51 s 15 ms | 3.71 MB |
71 482 | 1 m 14 s | 7.93 MB |
184 963 | 2 m 50 s | 16.74 MB |
200 336 | 3 m 02 s | 17.7 MB |
300 000 | - | - |


An OutOfMemoryError occurs when the number of exported titles is in the range of 200 000 - 300 000 entities. The current setup for the sandbox KB credentials does not have that number of titles.

The attached heap dump was generated for an export with 251217 entities. Issue MODKBEKBJ-531 was created to track this edge case.


Issue FOLIO-2891 was created for the related timeout problem on Rancher.