Folijet - Optimize Insert & Update of marc_records_lb (MODSOURCE-601)
Modules:
Data Import Module (mod-data-import:2.8.0-SNAPSHOT.258)
Data Import Converter Storage (mod-di-converter-storage:2.1.0-SNAPSHOT.8)
Source Record Manager Module (mod-source-record-manager:3.7.0-SNAPSHOT.759)
Source Record Storage Module (mod-source-record-storage:5.7.0-SNAPSHOT.494)
Inventory Module (mod-inventory:20.1.0-SNAPSHOT.604)
Inventory Storage Module (mod-inventory-storage:26.1.0-SNAPSHOT.826)
Configuration:
Kafka: 2 partitions for each topic and the replication factor is 2
2 items for each module.
mod-data-import: CPU:256/128, mem:2048Mi/1844Mi, -XX:MaxMetaspaceSize=512m -Xmx1292m
mod-di-converter-storage: CPU:128/128, mem:1024Mi/896Mi, -XX:MaxMetaspaceSize=128m -Xmx768m
mod-source-record-manager: CPU:1024/512, mem:2048Mi/1844Mi, db_maxpoolsize: 15, -XX:MaxMetaspaceSize=800m -Xmx1024m
mod-source-record-storage: CPU:1024/512, mem:1536Mi/1440Mi, db_maxpoolsize: 15, -XX:MaxMetaspaceSize=512m -Xmx1024m
mod-inventory: CPU:1024/512, mem:2880Mi/2592Mi, -XX:MaxMetaspaceSize=512m -Xmx1814m
mod-inventory-storage: CPU:1024/512, mem:2208Mi/1952Mi, -XX:MaxMetaspaceSize=512m -Xmx1440m
Create Profile | Update Profile |
---|---|
MARC BIB | 5k rec | 10k rec | 50k rec | 100k rec |
---|---|---|---|---|
Profile: Create | "startedDate" : "2023-05-01T14:18:08.828", "completedDate" : "2023-05-01T14:32:59.472" 14 min | "startedDate" : "2023-05-01T15:33:55.004", "completedDate" : "2023-05-01T16:04:15.136", 31 min | "startedDate" : "2023-05-01T17:26:36.552", "completedDate" : "2023-05-01T18:59:33.557", 1h 33min | SRM: db_maxpoolsize = 15 ..100 - Timeout SRM: db_maxpoolsize = 400 "startedDate" : "2023-05-03T13:47:11.526", "completedDate" : "2023-05-03T17:05:31.741", 3h 18min |
Profile: Update | "startedDate" : "2023-05-01T14:58:59.770", "completedDate" : "2023-05-01T15:31:51.023" 33 min | "startedDate" : "2023-05-01T16:25:12.856", "completedDate" : "2023-05-01T17:26:05.342", 1h 1min | SRM: db_maxpoolsize = 15 ..100 - Timeout SRM: db_maxpoolsize = 400 "startedDate" : "2023-05-03T10:18:08.980", "completedDate" : "2023-05-03T12:43:24.420", 2h 25min | SRM: db_maxpoolsize = 400 Completed with errors: "startedDate" : "2023-05-03T18:08:41.016", "completedDate" : "2023-05-04T07:46:53.954", 13h 38min |
MARC Authority | 10k | 50k |
---|---|---|
Profile: Create default profile | "startedDate" : "2023-05-04T13:02:26.364", "completedDate" : "2023-05-04T13:08:14.482", 6 min | "startedDate" : "2023-05-04T16:35:59.006", "completedDate" : "2023-05-04T17:01:59.963" 26 min |
Profile: Update | "startedDate" : "2023-05-04T13:09:16.615", "completedDate" : "2023-05-04T13:16:10.078", 7 min |
Create Index:
Size: marc_records_lb = 9`874`818
USED RESOURCES:
mod-source-record-storage (2 days) | mod-inventory (2 days) |
---|---|
mod-inventory-storage (2 days) | |
ERRORS:
Infrastructure issue (Fixed):
2023-05-02 09:24:12.668 [vert.x-worker-thread-2] [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Consumer - id: 34 subscriptionPattern: SubscriptionDefinition(eventType=DI_COMPLETED, subscriptionPattern=folio-perf-folijet\.Default\.\w{1,}\.DI_COMPLETED) Error while commit offset: 123274
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {folio-perf-folijet.Default.fs09000000.DI_COMPLETED-1=OffsetAndMetadata{offset=123274, leaderEpoch=null, metadata=''}}
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
org.apache.kafka.common.errors.RebalanceInProgressException: Offset commit cannot be completed since the consumer is undergoing a rebalance for auto partition assignment. You can try completing the rebalance by calling poll() and then retry the operation.
===================
2023-05-02 11:33:01.144 [vert.x-worker-thread-1] [] [] [] [] WARN tJournalKafkaHandler processDeduplicationFailure:: Error with database during collecting of deduplication info for handlerId: ca0c6c56-e74e-4921-b4c9-7b2de53c43ec , eventId: 5be28e4b-323f-4a87-9495-8c011dd82f60
io.vertx.core.impl.NoStackTraceThrowable: Timeout
2023-05-02 11:33:01.144 [vert.x-worker-thread-1] [] [] [] [] DEBUG KafkaConsumerWrapper businessHandlerCompletionHandler:: Consumer - id: 16 subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING, subscriptionPattern=folio-perf-folijet\.Default\.\w{1,}\.DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING) Committing offset: 33227
2023-05-02 11:33:01.144 [vert.x-worker-thread-1] [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 16 subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING, subscriptionPattern=folio-perf-folijet\.Default\.\w{1,}\.DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING) offset: 33227
io.vertx.core.impl.NoStackTraceThrowable: Timeout
=====================
May 2, 2023 @ 21:42:34.569 2023-05-02 16:42:34.569 [vert.x-eventloop-thread-1] [051411/metadata-provider] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] WARN PostgresClient EXPLAIN ANALYZE SELECT * FROM get_job_log_entries('31442115-0cd9-4e51-8ad2-586d11955b9b', 'source_record_order', 'ASC', 100, 0, false, 'ALL')
May 2, 2023 @ 21:42:34.569 Function Scan on get_job_log_entries (cost=0.25..10.25 rows=1000 width=476) (actual time=2815.627..2815.635 rows=100 loops=1)
May 2, 2023 @ 21:42:34.569 Planning Time: 0.036 ms
May 2, 2023 @ 21:42:34.569 Execution Time: 2815.655 ms
=====================
2023-05-02 15:36:47.786 [vert.x-worker-thread-18] [] [] [] [] DEBUG Conn execute timer: UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning * took 568 ms
2023-05-02 14:32:13.525 [vert.x-worker-thread-10] [] [] [] [] DEBUG Conn execute timer: UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning * took 2738 ms
=====================
2023-05-02 14:32:13.525 [vert.x-worker-thread-10] [] [] [] [] DEBUG Conn execute timer: UPDATE fs09000000_mod_source_record_manager.flow_control_events_counter SET events_to_process = $1 RETURNING events_to_process took 2739 ms
=====================
May 2, 2023 @ 20:10:06.246 io.vertx.core.impl.NoStackTraceThrowable: Couldn't update JobExecution status, JobExecution already marked as ERROR
=====================
May 2, 2023 @ 19:32:12.503 2023-05-02 14:32:12.497 [vert.x-eventloop-thread-1] [] [] [] [] INFO JournalRecordDaoImpl save:: Trying to save JournalRecord entity to the journal_records table
2023-05-02 14:32:12.529 [vert.x-eventloop-thread-1] [] [] [] [] WARN JournalRecordDaoImpl save:: Error saving JournalRecord entity
May 2, 2023 @ 19:32:12.529 io.vertx.core.impl.NoStackTraceThrowable: Timeout
=====================
May 2, 2023 @ 19:32:13.548 2023-05-02 14:32:13.548 [vert.x-worker-thread-10] [] [] [] [] WARN tHandlingServiceImpl handle:: Failed to handle DI_COMPLETED event
May 2, 2023 @ 19:32:13.548 io.vertx.core.impl.NoStackTraceThrowable: Timeout
May 2, 2023 @ 19:32:13.649 2023-05-02 14:32:13.645 [vert.x-eventloop-thread-1] [741919/mapping-metadata] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] ERROR PostgresClient Opening SQLConnection failed: Timeout
=====================
100k update:
io.vertx.core.impl.NoStackTraceThrowable: Current retry number 1 exceeded or equal given number 1 for the Instance update for jobExecutionId 'af3193da-a3d4-44a7-bfb6-9fba1cec0210' (3 items)
io.vertx.core.impl.NoStackTraceThrowable: Current retry number 1 exceeded or equal given number 1 for the Item update for jobExecutionId 'af3193da-a3d4-44a7-bfb6-9fba1cec0210' (41 items)
org.folio.inventory.dataimport.exceptions.OptimisticLockingException: Current retry number 1 exceeded given number 1 for the Instance update (1 items)
WARN:
java.lang.IllegalArgumentException: 'DI_INVENTORY_INSTANCE_UPDATED_READY_FOR_POST_PROCESSING' has no corresponding value. Accepted values: [DI_SRS_MARC_BIB_RECORD_UPDATED, DI_SRS_MARC_AUTHORITY_RECORD_UPDATED, DI_SRS_MARC_HOLDINGS_RECORD_UPDATED, DI_SRS_MARC_BIB_RECORD_MODIFIED, DI_SRS_MARC_BIB_RECORD_MATCHED, DI_SRS_MARC_AUTHORITY_RECORD_MATCHED, DI_SRS_MARC_HOLDINGS_RECORD_MATCHED, DI_SRS_MARC_BIB_RECORD_NOT_MATCHED, DI_SRS_MARC_AUTHORITY_RECORD_NOT_MATCHED, DI_SRS_MARC_HOLDINGS_RECORD_NOT_MATCHED, DI_INVENTORY_INSTANCE_CREATED, DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING, DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING, DI_INVENTORY_INSTANCE_UPDATED, DI_INVENTORY_INSTANCE_NOT_MATCHED, DI_INVENTORY_HOLDING_CREATED, DI_INVENTORY_HOLDING_UPDATED, DI_INVENTORY_HOLDING_NOT_MATCHED, DI_INVENTORY_AUTHORITY_CREATED, DI_INVENTORY_AUTHORITY_UPDATED, DI_INVENTORY_AUTHORITY_NOT_MATCHED, DI_INVENTORY_ITEM_CREATED, DI_INVENTORY_ITEM_UPDATED, DI_INVENTORY_ITEM_NOT_MATCHED, DI_SRS_MARC_AUTHORITY_RECORD_CREATED, DI_LOG_SRS_MARC_BIB_RECORD_CREATED, DI_LOG_SRS_MARC_BIB_RECORD_UPDATED, DI_SRS_MARC_HOLDING_RECORD_CREATED, DI_LOG_SRS_MARC_AUTHORITY_RECORD_CREATED, DI_ORDER_CREATED, DI_ORDER_CREATED_READY_FOR_POST_PROCESSING, DI_COMPLETED, DI_ERROR]