In progress (in review). Retesting results will be added to this report in scope of PERF-681.

...

  1. One record on one tenant could be discarded with the error: io.netty.channel.StacklessClosedChannelException (MODDATAIMP-748). It reproduces both with and without the splitting feature enabled, in at least 30% of test runs with 500K-record files and multitenant testing.
  2. During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded because 'data-import-system-user' is missing this permission, which was not added automatically during the service deployment. After the permission was added manually to the database, the error no longer occurred (MODDATAIMP-930).
  3. UI issue: when a job is canceled or completes with an error, its progress bar cannot be removed from the screen (MODDATAIMP-929).
  4. Usage:
    • Do not set RECORDS_PER_SPLIT_FILE (RPSF) below 1000. The system is stable enough to ingest 1000 records consistently, and smaller values incur more overhead, resulting in longer job durations. CPU utilization for mod-di-converter-storage was 160% at 500 RPSF, 180% at 1000 RPSF, 380% at 5K RPSF, and 433% at 10K RPSF, so with the 5K or 10K configuration we recommend adding more CPU to the mod-di-converter-storage service.
    • When toggling the file-splitting feature, the tasks of mod-source-record-storage and mod-source-record-manager need to be restarted.
    • Keep the Kafka brokers' disk size in mind: bigger jobs (up to 500K records) can now be run, and consecutive jobs may use up the disk quickly because the message retention time is currently set to 8 hours. For example, with a 300 GB disk, consecutive jobs of 250K, 500K, and 500K records will exhaust the disk.
  5. More CPU could be allocated to mod-inventory and mod-di-converter-storage
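
The RECORDS_PER_SPLIT_FILE guidance above can be turned into a small pre-flight check. This is a minimal sketch, not part of mod-data-import: the function names are hypothetical, it assumes a file is cut into ceil(total / RPSF) chunks, and it assumes that any value of 5000 or more behaves like the tested 5K and 10K configurations. The CPU figures are the ones reported in this document.

```python
import math

# CPU utilization (%) of mod-di-converter-storage observed in this report,
# keyed by RECORDS_PER_SPLIT_FILE (RPSF).
OBSERVED_CPU_PERCENT = {500: 160, 1000: 180, 5000: 380, 10000: 433}

MIN_RECOMMENDED_RPSF = 1000  # lower values add overhead per the usage note
EXTRA_CPU_RPSF = 5000        # assumption: treat >= 5K like the tested 5K/10K runs


def split_file_count(total_records: int, records_per_split_file: int) -> int:
    """Estimate the number of chunks; assumes the last chunk may be partial."""
    if total_records < 0 or records_per_split_file <= 0:
        raise ValueError("record counts must be non-negative and RPSF positive")
    return math.ceil(total_records / records_per_split_file)


def review_rpsf(records_per_split_file: int) -> list[str]:
    """Return advisory notes for a proposed RPSF value (hypothetical helper)."""
    notes = []
    if records_per_split_file < MIN_RECOMMENDED_RPSF:
        notes.append("below 1000: expect more overhead and longer jobs")
    if records_per_split_file >= EXTRA_CPU_RPSF:
        notes.append("consider adding CPU to mod-di-converter-storage")
    return notes


if __name__ == "__main__":
    for rpsf in sorted(OBSERVED_CPU_PERCENT):
        chunks = split_file_count(500_000, rpsf)
        print(rpsf, chunks, OBSERVED_CPU_PERCENT[rpsf], review_rpsf(rpsf))
```

Under these assumptions, a 500K-record file at the recommended 1000 RPSF yields 500 chunks, while 10K RPSF yields 50 chunks at the cost of the higher CPU utilization noted above.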

...

 ** - Up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded because 'data-import-system-user' is missing this permission, which was not added automatically during the service deployment. After the permission was added manually to the database, the error no longer occurred (MODDATAIMP-930).

...

With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled

ocp3-mod-data-import:12

Data Import Robustness Enhancement (PERF-646)

...

Memory utilization reached a maximum of 88% for mod-source-record-storage-b and 85% for mod-source-record-manager-b.

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

...

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

CPU utilization of mod-di-converter-storage-b

RDS CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximum CPU utilization = 95%.

...

Task definition

{
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:054267740449:task-definition/ocp3-mod-data-import:36",
    "containerDefinitions": [
        {
            "name": "mod-data-import",
            "image": "579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-data-import:3.0.3",
            "cpu": 256,
            "memory": 2048,
            "memoryReservation": 1844,
            "portMappings": [
                {
                    "containerPort": 8081,
                    "hostPort": 0,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "DB_MAXPOOLSIZE",
                    "value": "20"
                },
                {
                    "name": "CONFIG_FILE",
                    "value": "config.json"
                },
                {
                    "name": "SCORE_AGE_NEWEST",
                    "value": "0"
                },
                {
                    "name": "DB_PORT",
                    "value": "5432"
                },
                {
                    "name": "AWS_URL",
                    "value": "https://s3.amazonaws.com"
                },
                {
                    "name": "SCORE_TENANT_USAGE_MAX",
                    "value": "-200"
                },
                {
                    "name": "ASYNC_PROCESSOR_POLL_INTERVAL_MS",
                    "value": "5000"
                },
                {
                    "name": "JAVA_ARGS",
                    "value": "-Dhttp.port=8082 -Dlog.level=info"
                },
                {
                    "name": "JAVA_OPTS",
                    "value": "-Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.SLF4JLogDelegateFactory -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/ms/mod-data-import.hprof -XX:OnOutOfMemoryError=/usr/ms/heapdump.sh -XX:MetaspaceSize=384m -XX:MaxMetaspaceSize=512m -Xmx1292m"
                },
                {
                    "name": "AWS_BUCKET",
                    "value": "data-import-folio-eis-us-east-1-int-tenant"
                },
                {
                    "name": "ENV",
                    "value": "ocp3"
                },
                {
                    "name": "SCORE_AGE_OLDEST",
                    "value": "50"
                },
                {
                    "name": "AWS_SDK",
                    "value": "true"
                },
                {
                    "name": "JAVA_PROFILER_OPTS",
                    "value": "-noverify -javaagent:\"/usr/ms/jvm-profiler-1.0.0.jar\"=configProvider=com.uber.profiling.YamlConfigProvider,configFile=\"/usr/ms/profiler.yaml\" -cp \"/usr/ms/jvm-profiler-1.0.0.jar\" "
                },
                {
                    "name": "SCORE_AGE_EXTREME_THRESHOLD_MINUTES",
                    "value": "480"
                },
                {
                    "name": "ASYNC_PROCESSOR_MAX_WORKERS_COUNT",
                    "value": "1"
                },
                {
                    "name": "SCORE_TENANT_USAGE_MIN",
                    "value": "100"
                },
                {
                    "name": "SCORE_PART_NUMBER_FIRST",
                    "value": "1"
                },
                {
                    "name": "SPLIT_FILES_ENABLED",
                    "value": "true"
                },
                {
                    "name": "SCORE_JOB_SMALLEST",
                    "value": "40"
                },
                {
                    "name": "file.processing.edifact.buffer.chunk.size",
                    "value": "10"
                },
                {
                    "name": "SCORE_PART_NUMBER_LAST_REFERENCE",
                    "value": "100"
                },
                {
                    "name": "S3_FORCEPATHSTYLE",
                    "value": "true"
                },
                {
                    "name": "JAVA_PROFILER_STATE",
                    "value": "disabled"
                },
                {
                    "name": "AWS_REGION",
                    "value": "us-east-1"
                },
                {
                    "name": "SCORE_JOB_REFERENCE",
                    "value": "100000"
                },
                {
                    "name": "SCORE_AGE_EXTREME_VALUE",
                    "value": "10000"
                },
                {
                    "name": "DB_HOST",
                    "value": "db.ocp3.folio-eis.us-east-1"
                },
                {
                    "name": "SCORE_JOB_LARGEST",
                    "value": "-40"
                },
                {
                    "name": "MAX_REQUEST_SIZE",
                    "value": "4000000"
                },
                {
                    "name": "KAFKA_PORT",
                    "value": "9092"
                },
                {
                    "name": "KAFKA_HOST",
                    "value": "kafka.ocp3.folio-eis.us-east-1"
                },
                {
                    "name": "SYSTEM_PROCESSING_PASSWORD",
                    "value": "ditf-file-splitting"
                },
                {
                    "name": "LOG4J_CONFIGURATION_FILE",
                    "value": "https://s3.amazonaws.com/ocp3-folio-eis-us-east-1-int/log/log4j2.properties"
                },
                {
                    "name": "PREFIX",
                    "value": "ocp3"
                },
                {
                    "name": "RECORDS_PER_SPLIT_FILE",
                    "value": "1000"
                },
                {
                    "name": "SCORE_PART_NUMBER_LAST",
                    "value": "0"
                },
                {
                    "name": "DB_DATABASE",
                    "value": "folio"
                },
                {
                    "name": "DB_EXPLAIN_QUERY_THRESHOLD",
                    "value": "300000"
                }
            ],
            "mountPoints": [],
            "volumesFrom": [],
            "secrets": [
                {
                    "name": "DB_USERNAME",
                    "valueFrom": "arn:aws:ssm:us-east-1:054267740449:parameter/fse/cluster/ocp3/dbClusterMaster_userName"
                },
                {
                    "name": "DB_PASSWORD",
                    "valueFrom": "arn:aws:ssm:us-east-1:054267740449:parameter/fse/cluster/ocp3/dbClusterMaster_userPassword"
                }
            ],
            "stopTimeout": 120,
            "ulimits": [
                {
                    "name": "nofile",
                    "softLimit": 1048576,
                    "hardLimit": 1048576
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "ocp3-folio-eis",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ocp3"
                }
            }
        }
    ],
    "family": "ocp3-mod-data-import",
    "taskRoleArn": "arn:aws:iam::054267740449:role/Role-folio-ecs-task",
    "executionRoleArn": "arn:aws:iam::054267740449:role/Role-folio-ecs-task",
    "revision": 36,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.container-ordering"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "ecs.capability.secrets.ssm.environment-variables"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EXTERNAL",
        "EC2"
    ],
    "registeredAt": "2023-11-06T07:08:13.918Z",
    "registeredBy": "arn:aws:sts::054267740449:assumed-role/AWSReservedSSO_FOLIOFSE_ead3c38ca817a601/mpetryshyn@ebsco.com",
    "tags": []
}
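
As a sanity check on the memory settings in the task definition above, the JVM limits in JAVA_OPTS can be compared with the container's memory values. The sketch below is illustrative only: it embeds a shortened copy of the relevant fields (the JAVA_OPTS value is abbreviated to the memory flags), the helper names are hypothetical, and heap plus metaspace is only a rough lower bound on the JVM footprint, since thread stacks and direct buffers are not counted.

```python
import json
import re

# Shortened copy of the task definition above; only the fields used here.
TASK_DEFINITION_FRAGMENT = """
{
  "containerDefinitions": [
    {
      "name": "mod-data-import",
      "memory": 2048,
      "memoryReservation": 1844,
      "environment": [
        {"name": "JAVA_OPTS",
         "value": "-XX:MetaspaceSize=384m -XX:MaxMetaspaceSize=512m -Xmx1292m"},
        {"name": "SPLIT_FILES_ENABLED", "value": "true"},
        {"name": "RECORDS_PER_SPLIT_FILE", "value": "1000"}
      ]
    }
  ]
}
"""


def container_env(task_definition: dict, container_name: str) -> dict:
    """Flatten the name/value environment list of one container into a dict."""
    for container in task_definition["containerDefinitions"]:
        if container["name"] == container_name:
            return {e["name"]: e["value"] for e in container["environment"]}
    raise KeyError(container_name)


def jvm_megabytes(java_opts: str, pattern: str) -> int:
    """Extract a size given in megabytes (suffix 'm') from a JVM option string."""
    match = re.search(pattern, java_opts)
    if match is None:
        raise ValueError(f"option not found: {pattern}")
    return int(match.group(1))


task_definition = json.loads(TASK_DEFINITION_FRAGMENT)
container = task_definition["containerDefinitions"][0]
env = container_env(task_definition, "mod-data-import")

heap_mb = jvm_megabytes(env["JAVA_OPTS"], r"-Xmx(\d+)m")
metaspace_mb = jvm_megabytes(env["JAVA_OPTS"], r"-XX:MaxMetaspaceSize=(\d+)m")
jvm_ceiling_mb = heap_mb + metaspace_mb  # heap + metaspace only

print("splitting enabled:", env["SPLIT_FILES_ENABLED"],
      "RPSF:", env["RECORDS_PER_SPLIT_FILE"])
print("heap + metaspace (MiB):", jvm_ceiling_mb)
print("fits soft limit:", jvm_ceiling_mb <= container["memoryReservation"])
print("headroom to hard limit (MiB):", container["memory"] - jvm_ceiling_mb)
```

With the values shown, heap and metaspace together stay below both the soft limit (memoryReservation) and the hard limit (memory).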


Test 1. Single tenant: create and update 250K file

| Test # | Test parameters | Profile | Duration | Status | Previous results (Orchid): duration |
|---|---|---|---|---|---|
| 1.1 | 250K MARC BIB Create | PTF - Create 2 | 2 hours 16 min | Completed | 1 hour 32 min |
| 1.2 | 250K MARC BIB Update | PTF - Updates Success - 1 | 3 hours 1 min | Completed | 2 hours 16 min |
| 1.3 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 4 hours 14 min | Completed | 2 hours 40 min |
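
To compare the Test 1 durations with the Orchid baseline, they can be converted to minutes and expressed as a relative change. This is a minimal sketch with hypothetical helper names. It reads the durations for this run as 2 h 16 min, 3 h 1 min, and 4 h 14 min, and the Orchid durations as 1 h 32 min, 2 h 16 min, and 2 h 40 min.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours/minutes duration to whole minutes."""
    return hours * 60 + minutes


def percent_change(current: float, previous: float) -> float:
    """Relative change of current versus previous, in percent."""
    return (current - previous) / previous * 100


# Test number -> (duration in this run, Orchid duration), both in minutes.
DURATIONS_MIN = {
    "1.1": (to_minutes(2, 16), to_minutes(1, 32)),
    "1.2": (to_minutes(3, 1), to_minutes(2, 16)),
    "1.3": (to_minutes(4, 14), to_minutes(2, 40)),
}

if __name__ == "__main__":
    for test, (current, previous) in DURATIONS_MIN.items():
        change = percent_change(current, previous)
        print(f"Test {test}: {previous} min -> {current} min ({change:+.1f}%)")
```

Read this way, all three tests ran longer than their Orchid counterparts, by roughly a third to over a half.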

Test 1.4 With CI/CO 20 users and DI 25k records on each of the 3 tenants 

Splitting Feature enabled

| Operation | Orchid: response time without DI (average) | Orchid: response time with DI (average) | Poppy: response time without DI (average) | Poppy: response time with DI (average) |
|---|---|---|---|---|
| Check-Out | 0.804 s | 1.48 s | 1.03 s | 2.26 s |
| Check-In | 0.505 s | 1.067 s | 0.570 s | 1.4 s |



| Tenant | Orchid: DI duration with CI/CO | Poppy: DI duration with CI/CO |
|---|---|---|
| Tenant _1 | 16 min 53 sec | 34 min 55 sec |
| Tenant _2 | 20 min 39 sec | 27 min 39 sec |
| Tenant _3 | 17 min 54 sec | 25 min 17 sec |
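
The effect of Data Import on Check-In/Check-Out can be summarized as a slowdown factor per release, that is, the average response time with DI divided by the average without DI. The sketch below only restates the Test 1.4 figures; the names are hypothetical.

```python
# (release, operation) -> (average without DI, average with DI), in seconds.
RESPONSE_TIMES_S = {
    ("Orchid", "Check-Out"): (0.804, 1.48),
    ("Orchid", "Check-In"): (0.505, 1.067),
    ("Poppy", "Check-Out"): (1.03, 2.26),
    ("Poppy", "Check-In"): (0.570, 1.4),
}


def slowdown_factor(without_di: float, with_di: float) -> float:
    """How many times slower the operation is while DI is running."""
    return with_di / without_di


if __name__ == "__main__":
    for (release, operation), (idle, loaded) in RESPONSE_TIMES_S.items():
        factor = slowdown_factor(idle, loaded)
        print(f"{release} {operation}: {idle} s -> {loaded} s (x{factor:.2f})")
```

In both releases the operations take roughly twice as long while DI is running, with a somewhat larger factor on Poppy.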

...

Resource utilization during testing

Test 1.1. Data-import of 250K records file with "PTF - Create 2" job profile

Service CPU Utilization 

Memory Utilization


RDS CPU Utilization  

RDS Database Connections

Appendix

Infrastructure ocp3 with the "Bugfest" dataset

...

  • tenant0_mod_source_record_storage.marc_records_lb = 9674629
  • tenant2_mod_source_record_storage.marc_records_lb = 0
  • tenant3_mod_source_record_storage.marc_records_lb = 0
  • tenant0_mod_source_record_storage.raw_records_lb = 9604805
  • tenant2_mod_source_record_storage.raw_records_lb = 0
  • tenant3_mod_source_record_storage.raw_records_lb = 0
  • tenant0_mod_source_record_storage.records_lb = 9674677
  • tenant2_mod_source_record_storage.records_lb = 0
  • tenant3_mod_source_record_storage.records_lb = 0
  • tenant0_mod_source_record_storage.marc_indexers =  620042011
  • tenant2_mod_source_record_storage.marc_indexers =  0
  • tenant3_mod_source_record_storage.marc_indexers =  0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
  • tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
  • tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant0_mod_inventory_storage.authority = 4
  • tenant2_mod_inventory_storage.authority = 0
  • tenant3_mod_inventory_storage.authority = 0
  • tenant0_mod_inventory_storage.holdings_record = 9592559
  • tenant2_mod_inventory_storage.holdings_record = 16
  • tenant3_mod_inventory_storage.holdings_record = 16
  • tenant0_mod_inventory_storage.instance = 9976519
  • tenant2_mod_inventory_storage.instance = 32
  • tenant3_mod_inventory_storage.instance = 32 
  • tenant0_mod_inventory_storage.item = 10787893
  • tenant2_mod_inventory_storage.item = 19
  • tenant3_mod_inventory_storage.item = 19

PTF environment ocp3

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances, one reader and one writer

    | Name | API Name | Memory (GiB) | vCPUs | max_connections |
    |---|---|---|---|---|
    | R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |


  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: 2 partitions for DI topics

...