Info

Testing later showed that the actual import durations were about two times longer than reported. The PTF environment was missing a DB trigger; once the trigger was restored, the import durations doubled.


...

A script was created to run subsequent DI MARC authority updates across the whole DB (see Marc Authorities update instructions#ConfigurationFile). The script does not appear to be performant.

Within the scope of PERF-456, tests need to be run to answer the following questions:

...

  • PTF environments contain a lot of corrupted data (SRS records that have no corresponding record in the mod_inventory_storage.authority table).
  • To work around this, Shans Kaluhin rewrote the script to run a data export for IDs extracted from inventory-storage, which generates a valid .mrc file.
  • Approximately 100,000 records can be imported in 27-30 minutes on this infrastructure.
  • The import limit and the inventory limit were set to 100,000 for all tests.
  • For a database containing 6.6M records, the whole update took approximately 15 hours.
  • A possible memory leak was detected on mod-inventory-storage: memory usage grew from 27% to 62% during the first test and from 62% to 95% during the second test.
  • DB size (mod_inventory_storage.authority): 6,664,205 records. According to data import, 2,688,643 records were updated, which is only about 40%.
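The figures in the list above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the test tooling: the variable and function names are hypothetical, and the overall rate assumes that the roughly 15 hours were spent on the 2,688,643 records that were actually updated.

```python
# Figures quoted in the findings above; all names here are illustrative only.
TOTAL_AUTHORITY_RECORDS = 6_664_205  # rows in mod_inventory_storage.authority
UPDATED_RECORDS = 2_688_643          # records updated according to data import
BATCH_SIZE = 100_000                 # import limit and inventory limit
BATCH_MINUTES_FAST = 27              # shortest observed batch duration
BATCH_MINUTES_SLOW = 30              # longest observed batch duration
TOTAL_UPDATE_HOURS = 15              # approximate duration of the whole update


def records_per_minute(records: int, minutes: float) -> float:
    """Average throughput in records per minute."""
    return records / minutes


# Share of the authority table that the update actually touched.
updated_share = UPDATED_RECORDS / TOTAL_AUTHORITY_RECORDS

# Throughput range implied by a single 100,000-record batch.
batch_rate_low = records_per_minute(BATCH_SIZE, BATCH_MINUTES_SLOW)
batch_rate_high = records_per_minute(BATCH_SIZE, BATCH_MINUTES_FAST)

# Throughput implied by the whole run (assumption: 15 h covers updated records only).
overall_rate = records_per_minute(UPDATED_RECORDS, TOTAL_UPDATE_HOURS * 60)

print(f"Updated share: {updated_share:.1%}")
print(f"Batch throughput: {batch_rate_low:.0f}-{batch_rate_high:.0f} records/min")
print(f"Overall throughput: {overall_rate:.0f} records/min")
```

Under that assumption the overall rate is somewhat below the single-batch range, which is in line with the 15-hour total being an approximation.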

...

...

DB CPU usage is approximately 60%.

Kafka metrics

Image Added

Appendix

Infrastructure

PTF environment: ncp3

  • m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 db.r6.xlarge database instances: one reader and one writer
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning (number of partitions per topic):
    • DI_RAW_RECORDS_CHUNK_READ: 2
    • DI_RAW_RECORDS_CHUNK_PARSED: 2
    • DI_PARSED_RECORDS_CHUNK_SAVED: 2
    • DI_SRS_MARC_AUTHORITY_RECORD_CREATED: 1
    • DI_COMPLETED: 2
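The Kafka settings listed above can be sanity-checked in a small script. This is only an illustrative sketch with hypothetical names: it restates the values from this page as Python data and does not query the MSK cluster.

```python
# Values copied from the infrastructure list; nothing here talks to Kafka.
BROKER_COUNT = 4

BROKER_CONFIG = {
    "auto.create.topics.enable": "true",
    "log.retention.minutes": "480",
    "default.replication.factor": "3",
}

TOPIC_PARTITIONS = {
    "DI_RAW_RECORDS_CHUNK_READ": 2,
    "DI_RAW_RECORDS_CHUNK_PARSED": 2,
    "DI_PARSED_RECORDS_CHUNK_SAVED": 2,
    "DI_SRS_MARC_AUTHORITY_RECORD_CREATED": 1,
    "DI_COMPLETED": 2,
}

# Log retention expressed in hours.
retention_hours = int(BROKER_CONFIG["log.retention.minutes"]) / 60

# A topic cannot be created with more replicas than there are brokers.
replication_factor = int(BROKER_CONFIG["default.replication.factor"])
replication_fits = replication_factor <= BROKER_COUNT

# Total partitions across the listed DI topics.
total_partitions = sum(TOPIC_PARTITIONS.values())

# Topics with one partition can be read by only one consumer per consumer group.
single_partition_topics = [
    topic for topic, partitions in TOPIC_PARTITIONS.items() if partitions == 1
]

print(f"Retention: {retention_hours:.0f} h")
print(f"Replication factor {replication_factor} fits {BROKER_COUNT} brokers: {replication_fits}")
print(f"Total partitions: {total_partitions}")
print(f"Single-partition topics: {single_partition_topics}")
```

Whether the single-partition topic limits throughput in these tests is not established by this page; the script only makes the configuration easier to review.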

...