Folijet - Optimize Insert & Update of marc_records_lb (MODSOURCE-601)

Modules:

Data Import Module (mod-data-import:2.8.0-SNAPSHOT.258)

Data Import Converter Storage (mod-di-converter-storage:2.1.0-SNAPSHOT.8)

Source Record Manager Module (mod-source-record-manager:3.7.0-SNAPSHOT.759)

Source Record Storage Module (mod-source-record-storage:5.7.0-SNAPSHOT.494)

Inventory Module (mod-inventory:20.1.0-SNAPSHOT.604)

Inventory Storage Module (mod-inventory-storage:26.1.0-SNAPSHOT.826)

Configuration:

Kafka: 2 partitions for each topic and the replication factor is 2

2 items for each module.


mod-data-import: CPU:256/128, mem:2048Mi/1844Mi, -XX:MaxMetaspaceSize=512m -Xmx1292m

mod-di-converter-storage: CPU:128/128, mem:1024Mi/896Mi,  -XX:MaxMetaspaceSize=128m -Xmx768m

mod-source-record-manager: CPU:1024/512, mem:2048Mi/1844Mi, db_maxpoolsize: 15, -XX:MaxMetaspaceSize=800m -Xmx1024m

mod-source-record-storage: CPU:1024/512, mem:1536Mi/1440Mi, db_maxpoolsize: 15, -XX:MaxMetaspaceSize=512m -Xmx1024m

mod-inventory: CPU:1024/512, mem:2880Mi/2592Mi, -XX:MaxMetaspaceSize=512m -Xmx1814m

mod-inventory-storage: CPU:1024/512, mem:2208Mi/1952Mi, -XX:MaxMetaspaceSize=512m -Xmx1440m


Create ProfileUpdate Profile



MARC BIB

5k rec

10k rec50k rec100k rec

Profile:

Create

"startedDate" : "2023-05-01T14:18:08.828",

"completedDate" : "2023-05-01T14:32:59.472"

14 min

"startedDate" : "2023-05-01T15:33:55.004",

"completedDate" : "2023-05-01T16:04:15.136",

31 min

"startedDate" : "2023-05-01T17:26:36.552",

"completedDate" : "2023-05-01T18:59:33.557",

1h 33min

SRM: db_maxpoolsize = 15 ..100  - Timeout

SRM: db_maxpoolsize = 400

"startedDate" : "2023-05-03T13:47:11.526",

"completedDate" : "2023-05-03T17:05:31.741",

3h 18min

Profile:

Update

"startedDate" : "2023-05-01T14:58:59.770",

"completedDate" : "2023-05-01T15:31:51.023"

33 min

"startedDate" : "2023-05-01T16:25:12.856",

"completedDate" : "2023-05-01T17:26:05.342",

1h 1min

SRM: db_maxpoolsize = 15 ..100  - Timeout

SRM: db_maxpoolsize = 400

"startedDate" : "2023-05-03T10:18:08.980",

"completedDate" : "2023-05-03T12:43:24.420",

2h 25min

SRM: db_maxpoolsize = 400

Completed with errors:

"startedDate" : "2023-05-03T18:08:41.016",

"completedDate" : "2023-05-04T07:46:53.954",

13h 38min


MARC Authority10k50k

Profile:

Create

default profile

"startedDate" : "2023-05-04T13:02:26.364",

"completedDate" : "2023-05-04T13:08:14.482",

6 min

"startedDate" : "2023-05-04T16:35:59.006",

"completedDate" : "2023-05-04T17:01:59.963"

26 min

Profile:

Update

"startedDate" : "2023-05-04T13:09:16.615",

"completedDate" : "2023-05-04T13:16:10.078",

7 min


Create Index:

Size: marc_records_lb = 9`874`818

USED RESOURCES:

mod-source-record-storage (2 days)mod-inventory (2 days)

mod-inventory-storage (2 days)




ERRORS:

Infrastructure issue (Fixed):

2023-05-02 09:24:12.668 [vert.x-worker-thread-2] [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Consumer - id: 34 subscriptionPattern: SubscriptionDefinition(eventType=DI_COMPLETED, subscriptionPattern=folio-perf-folijet\.Default\.\w{1,}\.DI_COMPLETED) Error while commit offset: 123274

org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {folio-perf-folijet.Default.fs09000000.DI_COMPLETED-1=OffsetAndMetadata{offset=123274, leaderEpoch=null, metadata=''}} 

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

org.apache.kafka.common.errors.RebalanceInProgressException: Offset commit cannot be completed since the consumer is undergoing a rebalance for auto partition assignment. You can try completing the rebalance by calling poll() and then retry the operation.

===================

2023-05-02 11:33:01.144 [vert.x-worker-thread-1] [] [] [] [] WARN tJournalKafkaHandler processDeduplicationFailure:: Error with database during collecting of deduplication info for handlerId: ca0c6c56-e74e-4921-b4c9-7b2de53c43ec , eventId: 5be28e4b-323f-4a87-9495-8c011dd82f60 

io.vertx.core.impl.NoStackTraceThrowable: Timeout 

2023-05-02 11:33:01.144 [vert.x-worker-thread-1] [] [] [] [] DEBUG KafkaConsumerWrapper businessHandlerCompletionHandler:: Consumer - id: 16 subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING, subscriptionPattern=folio-perf-folijet\.Default\.\w{1,}\.DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING) Committing offset: 33227 

2023-05-02 11:33:01.144 [vert.x-worker-thread-1] [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 16 subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING, subscriptionPattern=folio-perf-folijet\.Default\.\w{1,}\.DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING) offset: 33227 

io.vertx.core.impl.NoStackTraceThrowable: Timeout 

=====================

May 2, 2023 @ 21:42:34.569 2023-05-02 16:42:34.569 [vert.x-eventloop-thread-1] [051411/metadata-provider] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] WARN PostgresClient EXPLAIN ANALYZE SELECT * FROM get_job_log_entries('31442115-0cd9-4e51-8ad2-586d11955b9b', 'source_record_order', 'ASC', 100, 0, false, 'ALL')

May 2, 2023 @ 21:42:34.569 Function Scan on get_job_log_entries (cost=0.25..10.25 rows=1000 width=476) (actual time=2815.627..2815.635 rows=100 loops=1)

May 2, 2023 @ 21:42:34.569 Planning Time: 0.036 ms

May 2, 2023 @ 21:42:34.569 Execution Time: 2815.655 ms 

=====================

2023-05-02 15:36:47.786 [vert.x-worker-thread-18] [] [] [] [] DEBUG Conn execute timer: UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning * took 568 ms 

2023-05-02 14:32:13.525 [vert.x-worker-thread-10] [] [] [] [] DEBUG Conn execute timer: UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning * took 2738 ms 

=====================

2023-05-02 14:32:13.525 [vert.x-worker-thread-10] [] [] [] [] DEBUG Conn execute timer: UPDATE fs09000000_mod_source_record_manager.flow_control_events_counter SET events_to_process = $1 RETURNING events_to_process took 2739 ms 

=====================

May 2, 2023 @ 20:10:06.246 io.vertx.core.impl.NoStackTraceThrowable: Couldn't update JobExecution status, JobExecution already marked as ERROR 

=====================

May 2, 2023 @ 19:32:12.503 2023-05-02 14:32:12.497 [vert.x-eventloop-thread-1] [] [] [] [] INFO JournalRecordDaoImpl save:: Trying to save JournalRecord entity to the journal_records table 

2023-05-02 14:32:12.529 [vert.x-eventloop-thread-1] [] [] [] [] WARN JournalRecordDaoImpl save:: Error saving JournalRecord entity 

May 2, 2023 @ 19:32:12.529 io.vertx.core.impl.NoStackTraceThrowable: Timeout 

=====================

May 2, 2023 @ 19:32:13.548 2023-05-02 14:32:13.548 [vert.x-worker-thread-10] [] [] [] [] WARN tHandlingServiceImpl handle:: Failed to handle DI_COMPLETED event 
May 2, 2023 @ 19:32:13.548 io.vertx.core.impl.NoStackTraceThrowable: Timeout 

May 2, 2023 @ 19:32:13.649 2023-05-02 14:32:13.645 [vert.x-eventloop-thread-1] [741919/mapping-metadata] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] ERROR PostgresClient Opening SQLConnection failed: Timeout 

=====================

100k update:



io.vertx.core.impl.NoStackTraceThrowable: Current retry number 1 exceeded or equal given number 1 for the Instance update for jobExecutionId 'af3193da-a3d4-44a7-bfb6-9fba1cec0210' (3 items)

io.vertx.core.impl.NoStackTraceThrowable: Current retry number 1 exceeded or equal given number 1 for the Item update for jobExecutionId 'af3193da-a3d4-44a7-bfb6-9fba1cec0210'  (41 items)

org.folio.inventory.dataimport.exceptions.OptimisticLockingException: Current retry number 1 exceeded given number 1 for the Instance update (1 items)



WARN:

java.lang.IllegalArgumentException: 'DI_INVENTORY_INSTANCE_UPDATED_READY_FOR_POST_PROCESSING' has no corresponding value. Accepted values: [DI_SRS_MARC_BIB_RECORD_UPDATED, DI_SRS_MARC_AUTHORITY_RECORD_UPDATED, DI_SRS_MARC_HOLDINGS_RECORD_UPDATED, DI_SRS_MARC_BIB_RECORD_MODIFIED, DI_SRS_MARC_BIB_RECORD_MATCHED, DI_SRS_MARC_AUTHORITY_RECORD_MATCHED, DI_SRS_MARC_HOLDINGS_RECORD_MATCHED, DI_SRS_MARC_BIB_RECORD_NOT_MATCHED, DI_SRS_MARC_AUTHORITY_RECORD_NOT_MATCHED, DI_SRS_MARC_HOLDINGS_RECORD_NOT_MATCHED, DI_INVENTORY_INSTANCE_CREATED, DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING, DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING, DI_INVENTORY_INSTANCE_UPDATED, DI_INVENTORY_INSTANCE_NOT_MATCHED, DI_INVENTORY_HOLDING_CREATED, DI_INVENTORY_HOLDING_UPDATED, DI_INVENTORY_HOLDING_NOT_MATCHED, DI_INVENTORY_AUTHORITY_CREATED, DI_INVENTORY_AUTHORITY_UPDATED, DI_INVENTORY_AUTHORITY_NOT_MATCHED, DI_INVENTORY_ITEM_CREATED, DI_INVENTORY_ITEM_UPDATED, DI_INVENTORY_ITEM_NOT_MATCHED, DI_SRS_MARC_AUTHORITY_RECORD_CREATED, DI_LOG_SRS_MARC_BIB_RECORD_CREATED, DI_LOG_SRS_MARC_BIB_RECORD_UPDATED, DI_SRS_MARC_HOLDING_RECORD_CREATED, DI_LOG_SRS_MARC_AUTHORITY_RECORD_CREATED, DI_ORDER_CREATED, DI_ORDER_CREATED_READY_FOR_POST_PROCESSING, DI_COMPLETED, DI_ERROR]