Marc Authorities update [PTF-report]

After testing, it was found that the actual durations of the performed imports were about two times longer than reported. This is because the PTF environment was missing a DB trigger; once the trigger was restored, the imports' durations doubled.

Overview


For the subsequent DI MARC Authorities update across the whole DB, a script was created (see Marc Authorities update instructions#ConfigurationFile). It looks like the script is not performant.

In the scope of PERF-456, tests need to be run to answer the following questions:

  • Determine how to improve this script/process so that 50,000 MARC authority records can be reliably updated in ~30 minutes
  • Determine what is needed to support updating 100,000 records reliably and how long the job will take to complete

Scenario: 

  • Without background activities
  • With CICO in the background plus, if possible, another DI job to create MARC BIBs

Summary

  • On PTF environments there is a lot of corrupted data (SRS records that have no corresponding records in the mod_inventory_storage.authority table).
  • To solve this, Shans Kaluhin rewrote the script to use Data Export with IDs extracted from inventory-storage to generate a valid .mrc file (see the sketch after this list).
  • ±100,000 records can be imported in less than 30 minutes (27-30 minutes, to be more accurate) using the infrastructure described in the Appendix.
  • The import limit and inventory limit were set to 100,000 for all tests.
  • For a database containing 6.6M records, the whole update took approximately 15 hours.
  • A possible memory leak was detected on mod-inventory-storage (memory usage grew from 27% to 62% during the first test, and from 62% to 95% during the second test).
  • DB size (mod_inventory_storage.authority): 6,664,205 records. According to Data Import, 2,688,643 records were updated, which is only 40%!
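
A minimal sketch of the ID-extraction step mentioned above, assuming only the table named in this report (mod_inventory_storage.authority, keyed by id); the resulting ID list is what Data Export consumes to build a valid .mrc file:

-- Select only IDs that actually exist in inventory storage, so orphaned SRS
-- records never reach the update job. [tenant] is a placeholder for the
-- tenant's schema prefix, as in the SQL under Recommendations below.
SELECT id
FROM [tenant]_mod_inventory_storage.authority;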

Recommendations & Jiras (Optional)

Jiras:

SQL to update the field name:

-- Text-level replace across the authority JSONB: rewrites every occurrence of
-- "GeographicTerm" to "GeographicName".
UPDATE [tenant]_mod_inventory_storage.authority
SET jsonb = replace(jsonb::TEXT,'GeographicTerm"','GeographicName"')::jsonb;
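
A hedged variant of the same statement: the WHERE clause below is an addition, not part of the original script, and skips rows that do not contain the old value so unaffected rows are not rewritten:

-- Same text-level replace as above, restricted to affected rows only.
UPDATE [tenant]_mod_inventory_storage.authority
SET jsonb = replace(jsonb::TEXT,'GeographicTerm"','GeographicName"')::jsonb
WHERE jsonb::TEXT LIKE '%GeographicTerm"%';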



Test Runs 

Test # | Test Conditions | Duration | Notes
1 | Script run on the whole DB | 15 hours | Completed successfully; 27-30 min per 100K records; records updated according to the DI page: 2,688,643 (40% of all)
2 | 8 users CI/CO + DI 5K MARC BIB Create; script run during 1.5 hours | 30 mins | Completed successfully; no errors on CICO, DI MARC BIB, or the authorities update
3 | Script run on the whole DB | 13 hr 40 min | One of the jobs got stuck*

*The job got stuck due to:

08:26:52 [] [] [] [] ERROR teModifyEventHandler Error while MARC record modifying

io.vertx.pgclient.PgException: ERROR: duplicate key value violates unique constraint "idx_records_matched_id_gen" (23505)

07:03:33 [] [] [] [] ERROR KafkaConsumerWrapper Error while processing a record - id: 11 subscriptionPattern: SubscriptionDefinition(eventType=DI_MARC_FOR_UPDATE_RECEIVED, subscriptionPattern=ncp3\.Default\.\w{1,}\.DI_MARC_FOR_UPDATE_RECEIVED) offset: 6586938

java.util.concurrent.CompletionException: org.folio.services.exceptions.CacheLoadingException: Error loading jobProfileSnapshot by id: 'f942a046-8aa9-403f-bcf2-511cc39d64f1', status code: 500, response message: proxyClient failure: mod-data-import-converter-storage-1.15.2 http://mod-data-import-cs-b.ncp3.folio-eis.us-east-1:8051/mod-data-import-cs: Failed to resolve 'mod-data-import-cs-b.ncp3.folio-eis.us-east-1' and search domain query for configured domains failed as well: [ec2.internal, VpcA.us-east-1.eis-FolioIntegration.cloud]: GET /mod-data-import-cs/data-import-profiles/jobProfileSnapshots/f942a046-8aa9-403f-bcf2-511cc39d64f1

Both of these errors occurred on mod-source-record-storage.
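
A hedged diagnostic sketch for the first error: the records_lb table and the matched_id/generation columns are assumptions inferred from the index name idx_records_matched_id_gen, not confirmed from this report:

-- Find matched_id/generation pairs that would collide on the unique index
-- idx_records_matched_id_gen.
SELECT matched_id, generation, count(*) AS duplicates
FROM [tenant]_mod_source_record_storage.records_lb
GROUP BY matched_id, generation
HAVING count(*) > 1;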

Results

Durations (totals for the tests listed above)

Test # | Duration | Notes
1 | 15 hr | Updated 2,688,643 records
2 | 27 min | Updated 100K records (this test only updated 100K records)
3 | 13 hr | Updated 2,687,219 records

Memory Utilization

All modules behaved stably except mod-inventory-storage.

mod-inventory-storage memory usage grew from 27% to 62% during the first test, and from 62% to 95% during the second test.


CPU Utilization 

*On the chart below, each small spike corresponds to one DI job performed.

**Some spikes are shorter than others because of differences in the number of records imported.

**Test #2 shows higher CPU usage because it ran with background activities (CICO + an additional DI job).


Most CPU-consuming modules:

  • mod-source-record-storage: 37-40%
  • mod-inventory: 32-40%
  • mod-source-record-manager: 26-29%
  • others: below 20%


Instance level CPU usage

RDS CPU Utilization 


As expected, each DI job consumes a lot of DB CPU (each spike here corresponds to one DI job).

DB CPU usage is approximately 60%.

Kafka metrics

Appendix

Infrastructure

PTF environment: ncp3

  • m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 db.r6.xlarge database instances: one reader and one writer
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning (a verification sketch follows this list):
    • DI_RAW_RECORDS_CHUNK_READ - 2
    • DI_RAW_RECORDS_CHUNK_PARSED - 2
    • DI_PARSED_RECORDS_CHUNK_SAVED - 2
    • DI_SRS_MARC_AUTHORITY_RECORD_CREATED - 1
    • DI_COMPLETED - 2
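
A hedged way to verify the partition counts above with the stock kafka-topics.sh tool that ships with Apache Kafka (the broker address is a placeholder; topic names follow the ncp3.Default.<tenant>.<eventType> pattern visible in the logs under Test Runs):

# List partition counts for the DI_* topics on the cluster.
kafka-topics.sh --bootstrap-server <broker>:9092 --describe | grep 'DI_'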


Modules memory and CPU parameters

Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx
mod-data-import | 2.6.2 | 4 | 1 | 256 | 2048 | 1844 | 512 | 1292
mod-data-import-converter-storage | 1.15.1 | 1 | 2 | 128 | 1024 | 896 | 128 | 768
mod-source-record-storage | 5.5.2 | 4 | 2 | 1024 | 1536 | 1440 | 512 | 908
mod-source-record-manager | 3.5.6 | 4 | 2 | 1024 | 4096 | 3688 | 512 | 2048
mod-inventory-storage | 25.0.4 | 3 | 2 | 1024 | 2208 | 1952 | 384 | 1440
mod-inventory | 19.0.2 | 7 | 2 | 1024 | 2880 | 2592 | 512 | 1814


Methodology/Approach

According to Marc Authorities update instructions#ConfigurationFile

  • Populate config.json with valid data (okapi host, import limit, inventory limit)
  • Run the .jar file using the command java -jar ***.jar config.json (a hedged end-to-end sketch follows this list)
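
A minimal end-to-end sketch of these two steps; the config keys (okapiHost, importLimit, inventoryLimit) are assumed names derived from the bullet above, and the .jar file name is a placeholder for the elided one:

# Assumed key names - check Marc Authorities update instructions#ConfigurationFile
# for the authoritative list.
cat > config.json <<'EOF'
{
  "okapiHost": "https://okapi.example.org",
  "importLimit": 100000,
  "inventoryLimit": 100000
}
EOF
java -jar marc-authorities-update.jar config.json   # jar name is a placeholder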