Data Import with file splitting feature + Check-ins Check-outs (Poppy)

Overview

This document contains the results of testing Check-in/Check-out and Data Import with file splitting feature for MARC Bibliographic records in the Poppy release.

Ticket: https://folio-org.atlassian.net/browse/PERF-756

 

Summary

  • Data import performance in Poppy with the file splitting feature is significantly better than in Orchid (40% for DI Create, 25% for DI Update). However, when running with CICO there is a small degradation (up to 5%) compared to Poppy without the file splitting feature. CO response times are almost identical to Poppy without the file splitting feature. CI response times are about 20% slower than in Poppy without the file splitting feature, both with and without Data Import.

  • Average CPU utilization did not exceed 150% for any module. The highest consumption was observed for mod-inventory, which grew from 110% to 250% by the end of the test. Because memory grew as well, the suspected cause is https://folio-org.atlassian.net/browse/MODINV-944, which is fixed in mod-inventory 20.1.9; this test was run on version 20.1.7. mod-data-import showed spikes of up to 130% for Data Import jobs with 50k files and a 250% spike for the 100k job. For all other modules, CPU utilization during Data Import jobs did not exceed 110%.

  • The increase in memory utilization is a result of the modules having been restarted beforehand (daily cluster shutdown process). Before the tests, memory consumption was 45% for mod-search and 55% for mod-inventory. During the test with 100k files, memory utilization increased to 90% for mod-search and up to 100% for mod-inventory.

  • Average DB CPU usage during data import is about 95%, which is consistent with the usage observed during the same tests in Orchid.

  • The average connection count during data import is approximately 600 connections for Create jobs, which is twice as high as when the file splitting feature is disabled. For Update jobs, the connection count is 560.
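
The percentages in the first bullet can be sanity-checked against the DI durations reported in the comparison table of this document. The sketch below is illustrative only: the helper names `to_seconds` and `improvement_pct` are hypothetical, and it assumes the improvement is measured relative to the Orchid duration, which the report does not state explicitly.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss duration string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement_pct(baseline: str, candidate: str) -> float:
    """Relative time saved by `candidate` versus `baseline`, in percent.

    Assumption: the baseline (Orchid) duration is the denominator.
    """
    base = to_seconds(baseline)
    return (base - to_seconds(candidate)) / base * 100


# Durations with CI/CO, taken from the comparison table in this report:
# Orchid versus Poppy with the file splitting feature.
create_10k = improvement_pct("00:09:06", "00:05:26")  # DI MARC Bib Create, 10K
update_5k = improvement_pct("00:04:52", "00:03:39")   # DI MARC Bib Update, 5K

print(f"Create 10K: {create_10k:.1f}%")  # Create 10K: 40.3%
print(f"Update 5K: {update_5k:.1f}%")    # Update 5K: 25.0%
```

Under this assumption, the two sample rows give roughly the 40% and 25% figures quoted above; other file sizes vary by a few percentage points.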

Test Runs 

| Test # | Scenario | Load level |
|---|---|---|
| 1 | DI MARC Bib Create | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) |
| | CICO | 8 users |
| 2 | DI MARC Bib Update | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) |
| | CICO | 8 users |

Test Results

Data import

Total time for all Data Import jobs: 1 hour 16 minutes 47 seconds.

 

| Profile | MARC File | DI Duration, Poppy with file splitting feature (hh:mm:ss) | Check In, Check Out response time (8 users), Poppy: CI Average, sec | Check In, Check Out response time (8 users), Poppy: CO Average, sec |
|---|---|---|---|---|
| DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:02:47 | 1.111 | 1.432 |
| | 10K.mrc | 00:05:26 | 1.261 | 1.556 |
| | 25K.mrc | 00:14:31 | 1.441 | 1.532 |
| | 50K.mrc | 00:24:13 | 1.432 | 1.478 |
| | 100K.mrc | 00:49:35 | 1.358 | 1.621 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:03:39 | 0.870 | 1.201 |
| | 10K.mrc | 00:06:46 | 0.885 | 1.216 |
| | 25K.mrc | 00:17:04 | 0.949 | 1.266 |
| | 50K.mrc | 00:34:23 | 1.083 | 1.264 |
| | 100K.mrc | 01:14:30 | 1.024 | 1.383 |

Check-in/Check-out without DI

 

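| Scenario | Load level | Request | Response time, sec, Poppy with file splitting feature: 95 perc | Response time, sec, Poppy with file splitting feature: average |
|---|---|---|---|---|
| Circulation Check-in/Check-out (without Data Import) | 8 users | Check-in | 0.724 | 0.610 |
| | | Check-out | 0.999 | 0.872 |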

Comparison

CICO with DI comparison

DI Duration (hh:mm:ss)

| Profile | MARC File | Without CI/CO: Orchid* | Without CI/CO: Poppy | Without CI/CO: Poppy with file splitting feature | With CI/CO: Orchid* | With CI/CO: Poppy | With CI/CO: Poppy with file splitting feature | Deviation (hh:mm:ss), Poppy with file splitting feature: DI with CICO compared to DI without CICO | Deviation (hh:mm:ss): DI with CICO compared to Poppy without file splitting feature |
|---|---|---|---|---|---|---|---|---|---|
| DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:04:30 | 00:02:39 | 00:02:26 | 00:05:01 | 00:02:53 | 00:02:47 | + 00:00:21 | - 00:00:06 |
| | 10K.mrc | 00:09:25 | 00:05:00 | 00:04:56 | 00:09:06 | 00:04:32 | 00:05:26 | + 00:00:30 | + 00:00:46 |
| | 25K.mrc | 00:22:16 | 00:11:15 | 00:12:14 | 00:24:28 | 00:11:14 | 00:14:31 | + 00:02:16 | + 00:03:17 |
| | 50K.mrc | 00:39:27 | 00:22:16 | 00:22:49 | 00:43:03 | 00:21:55 | 00:24:13 | + 00:01:24 | + 00:02:18 |
| | 100K.mrc | 01:38:00 | 00:49:58 | 00:47:52 | 01:35:50 | 00:47:02 | 00:49:35 | + 00:01:47 | + 00:02:33 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:04:02 | 00:02:28 | 00:03:17 | 00:04:52 | 00:03:19 | 00:03:39 | + 00:00:22 | + 00:00:20 |
| | 10K.mrc | 00:08:10 | 00:05:31 | 00:06:32 | 00:09:22 | 00:06:20 | 00:06:46 | + 00:00:14 | + 00:00:26 |
| | 25K.mrc | 00:19:39 | 00:14:50 | 00:16:05 | 00:24:02 | 00:14:04 | 00:17:04 | + 00:00:59 | + 00:03:00 |
| | 50K.mrc | 00:38:30 | 00:32:53 | 00:32:43 | 00:47:13 | 00:29:59 | 00:34:23 | + 00:01:40 | + 00:04:24 |
| | 100K.mrc | 01:33:00 | 01:14:39 | 01:10:04 | 01:40:25 | 01:03:03 | 01:14:30 | + 00:04:26 | + 00:11:27 |

Check In, Check Out Response time (8 users)

| Profile | MARC File | Orchid: CI Average, sec | Orchid: CO Average, sec | Poppy: CI Average, sec | Poppy: CO Average, sec | Poppy with file splitting feature: CI Average, sec | Poppy with file splitting feature: CO Average, sec | Delta, % (Poppy/Poppy with file splitting feature): CI | Delta, % (Poppy/Poppy with file splitting feature): CO |
|---|---|---|---|---|---|---|---|---|---|
| DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 0.961 | 1.442 | 0.901 | 1.375 | 1.111 | 1.432 | 18.90% | 3.98% |
| | 10K.mrc | 1.058 | 1.624 | 0.902 | 1.47 | 1.261 | 1.556 | 28.47% | 5.53% |
| | 25K.mrc | 1.056 | 1.621 | 1 | 1.571 | 1.441 | 1.532 | 30.60% | -2.55% |
| | 50K.mrc | 0.936 | 1.519 | 0.981 | 1.46 | 1.432 | 1.478 | 31.49% | 1.22% |
| | 100K.mrc | 0.868 | 1.468 | 1.018 | 1.491 | 1.358 | 1.621 | 25.04% | 8.02% |
| DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 0.855 | 1.339 | 0.755 | 1.169 | 0.870 | 1.201 | 13.22% | 2.66% |
| | 10K.mrc | 0.916 | 1.398 | 0.75 | 1.307 | 0.885 | 1.216 | 15.25% | -7.48% |
| | 25K.mrc | 0.922 | 1.425 | 0.822 | 1.403 | 0.949 | 1.266 | 13.38% | -10.82% |
| | 50K.mrc | 0.904 | 1.456 | 0.893 | 1.424 | 1.083 | 1.264 | 17.54% | -12.66% |
| | 100K.mrc | 0.838 | 1.415 | 0.908 | 1.51 | 1.024 | 1.383 | 11.33% | -9.18% |

* Orchid and Poppy DI and CICO results are taken from Data Import with Check-ins Check-outs (Poppy).
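
The Deviation and Delta columns above can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the method used to build the report: the helper names are hypothetical, and the Delta formula is inferred from the reported numbers, which are consistent with dividing the difference by the Poppy with file splitting feature value rather than by the Poppy baseline.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss duration string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_deviation(measured: str, reference: str) -> str:
    """Signed difference between two durations, formatted as '+ hh:mm:ss'."""
    diff = to_seconds(measured) - to_seconds(reference)
    sign = "+" if diff >= 0 else "-"
    diff = abs(diff)
    return f"{sign} {diff // 3600:02d}:{diff % 3600 // 60:02d}:{diff % 60:02d}"


def delta_pct(poppy: float, poppy_split: float) -> float:
    """Response time delta in percent.

    Assumption (inferred from the table, not stated in the report):
    the denominator is the Poppy with file splitting feature value.
    """
    return (poppy_split - poppy) / poppy_split * 100


# DI MARC Bib Create, 5K.mrc row.
# Poppy with file splitting feature: with CICO versus without CICO.
dev_cico = format_deviation("00:02:47", "00:02:26")
# With CICO: Poppy with file splitting feature versus Poppy.
dev_split = format_deviation("00:02:47", "00:02:53")
ci_delta = round(delta_pct(0.901, 1.111), 2)
co_delta = round(delta_pct(1.375, 1.432), 2)

print(dev_cico, dev_split)  # + 00:00:21 - 00:00:06
print(ci_delta, co_delta)   # 18.9 3.98
```

A positive Delta means the response time is slower with the file splitting feature; a negative Delta means it is faster.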

Detailed CICO response time comparison

 

Scenario

 

Load level

 

Request

Response time, sec
Orchid

Response time, sec
Poppy

Response time, sec
Poppy with file splitting feature

95 perc

average

95 perc

average

95 perc

average