Recommended Maximum File Sizes and Configuration

Please work with your systems office or hosting provider to ensure these configurations are in place. These recommendations will be re-evaluated as further performance, reliability, and stability work is completed.

For additional information, please see the following:

Maximum File Sizes (Morning Glory): no background activity, no concurrent import jobs, single tenant

  • CREATE (SRS MARC, Instances, Holdings, Items): 100,000 MARC Bib records (details)

  • UPDATE (SRS MARC, Instances, Holdings, Items): 50,000 MARC Bib records

  • For results with background activities (check-in/check-out), concurrent jobs, and multi-tenant setups, please refer to the results from PTF

Import Statistics, with recommended values (no background activity, no concurrent import jobs, single tenant). Columns are the number of MARC records in the file.

| Release | Operation | 5,000 | 10,000 | 30,000 | 50,000 | 100,000 |
| --- | --- | --- | --- | --- | --- | --- |
| Orchid (2 instances) | CREATE | 5 min | 14 min | 46 min | 54 min | 3 h 37 min (ongoing investigation) |
| Orchid (2 instances) | UPDATE | | | | | |
| Morning Glory (2 instances) | CREATE | 7 min | 16 min | | 59 min | 2 h 20 min |
| Morning Glory (2 instances) | UPDATE | 11 min | 22 min | | 1 h 42 min | 2 h 49 min |
| Lotus (1 instance) | CREATE | 13 min | 31 min | 1 h 31 min | 1 h 34 min | 3 h 21 min |
| Lotus (1 instance) | UPDATE | 16 min | 40 min | 1 h 4 min | 2 h 17 min | |
| Lotus (2 instances) | CREATE | 8 min | 19 min | 45 min | 1 h 21 min | 3 h 10 min |
| Lotus (2 instances) | UPDATE | 13 min | 25 min | 1 h 36 min | 1 h 51 min | 4 h 10 min |
| Juniper and Kiwi (2 instances) | CREATE | 15 min | | | 2 h 30 min+ | |
| Juniper and Kiwi (2 instances) | UPDATE | 20-30 min | | | 11+ hours | |

Maximum File Sizes (Juniper and Kiwi)

  • CREATE Import (SRS MARC, Instances, Holdings, Items): 50,000 MARC records max

  • UPDATE Import: 5,000 MARC records max
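Files larger than these limits should be split into separately imported chunks. A minimal sketch (not FOLIO tooling; the function name and chunk-file naming are illustrative) that splits a binary MARC (ISO 2709) file on the 0x1D record terminator:

```python
# Sketch: split a binary MARC (ISO 2709) file into chunks that stay under
# the recommended record limits. Names here are illustrative.
RECORD_TERMINATOR = b"\x1d"  # every ISO 2709 record ends with 0x1D

def split_marc_file(path, max_records, prefix="chunk"):
    """Write `path` out as sequential files of at most `max_records` records each."""
    with open(path, "rb") as f:
        data = f.read()
    # Splitting on the terminator leaves a trailing empty piece; drop it,
    # then restore the terminator on each record.
    records = [r + RECORD_TERMINATOR for r in data.split(RECORD_TERMINATOR) if r]
    chunk_paths = []
    for i in range(0, len(records), max_records):
        out_path = f"{prefix}-{i // max_records:03d}.mrc"
        with open(out_path, "wb") as out:
            out.write(b"".join(records[i:i + max_records]))
        chunk_paths.append(out_path)
    return chunk_paths
```

This naive split relies on 0x1D being reserved as the record terminator in ISO 2709, which holds for well-formed files; dedicated tools such as MARCEdit's file splitter are a common alternative.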

Import Statistics, with background activity

  • With five users performing check-in/check-out background activity

  • And concurrent imports by different tenants

    • Check-in/out time increases by 50%-100% depending on number of concurrent users.

    • Data import takes 2x longer to complete.

| MARC Records | CREATE Duration | UPDATE Duration |
| --- | --- | --- |
| 1,000 | 10 minutes | 10 minutes |
| 5,000 | 30 minutes | 40-60 minutes (depending on complexity of profile) |
| 25,000 | 2-3 hours | 8+ hours |
| 50,000 | 5+ hours | 22+ hours |

Import Statistics, no background activity, no concurrent import jobs

| MARC Records | CREATE Duration | UPDATE Duration |
| --- | --- | --- |
| 1,000 | 5 minutes | 5 minutes |
| 5,000 | 15 minutes | 20-30 minutes (depending on complexity of profile) |
| 25,000 | 60-80 minutes | 4+ hours |
| 50,000 | 150+ minutes | 11+ hours |

Note: for Lotus, a CREATE import of 500,000 records took 15 h 37 min (details)

Key Settings and Configurations



mod-data-import

Properties related to file upload that should be set at mod-configuration are described in the documentation: https://github.com/folio-org/mod-data-import#module-properties-to-set-up-at-mod-configuration

| System property that can be adjusted | Default value |
| --- | --- |
| file.processing.marc.raw.buffer.chunk.size | 50 |
| file.processing.marc.json.buffer.chunk.size | 50 |
| file.processing.marc.xml.buffer.chunk.size | 10 |
| file.processing.edifact.buffer.chunk.size | 10 |

For releases prior to Kiwi, it is recommended to lower the file.processing.buffer.chunk.size property in order to prevent mod-source-record-storage from crashing with an OOM error during an UPDATE import of 5,000 records; lowering it further allows an UPDATE import of 10,000 records.
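These chunk-size settings are ordinary JVM system properties. One way to apply them (a sketch only: it assumes the module is started directly with java and that your deployment passes options through a JAVA_OPTS variable; the jar name is illustrative):

```shell
# Illustrative only: property names come from the table above; the jar path
# and the JAVA_OPTS pass-through depend on your deployment.
export JAVA_OPTS="$JAVA_OPTS \
  -Dfile.processing.marc.raw.buffer.chunk.size=50 \
  -Dfile.processing.marc.json.buffer.chunk.size=50 \
  -Dfile.processing.marc.xml.buffer.chunk.size=10 \
  -Dfile.processing.edifact.buffer.chunk.size=10"
java $JAVA_OPTS -jar mod-data-import-fat.jar
```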

mod-source-record-manager

| System property that can be adjusted | Default value | Comment | Release |
| --- | --- | --- | --- |
| srm.kafka.RawMarcChunkConsumer.instancesNumber | 1 | | |
| srm.kafka.StoredMarcChunkConsumer.instancesNumber | 1 | | |
| srm.kafka.DataImportConsumersVerticle.instancesNumber | 1 | | |
| srm.kafka.DataImportJournalConsumersVerticle.instancesNumber | 1 | | |
| srm.kafka.RawChunksKafkaHandler.maxDistributionNum | 100 | | |
| srm.kafka.CreatedRecordsKafkaHandler.maxDistributionNum | 100 | | |
| srm.kafka.DataImportConsumer.loadLimit | 5 | | |
| security.protocol | PLAINTEXT | | |
| ssl.protocol | TLSv1.2 | | |
| ssl.key.password | - | | |
| ssl.keystore.location | - | | |
| ssl.keystore.password | - | | |
| ssl.keystore.type | JKS | | |
| ssl.truststore.location | - | | |
| ssl.truststore.password | - | | |
| ssl.truststore.type | JKS | | |
| di.flow.control.max.simultaneous.records | 50 | Defines how many records the system can process simultaneously | Morning Glory |
| di.flow.control.records.threshold | 25 | Defines how many records from the previous batch must be processed before new records are sent to the pipeline | Morning Glory |
| di.flow.control.enable | true | Allows single-record imports to be processed while larger file imports are in progress | Morning Glory |
| di.flow.control.reset.state.interval | PT5M | Time between flow-control state resets; by default the reset triggers every 5 minutes | Morning Glory |
| kafka.producer.batch.size | 16*1024 | The producer batches records sent to the same partition into fewer requests, which improves performance on both client and server; this controls the default batch size in bytes | Nolana |
| kafka.producer.linger.ms | 0 | The producer groups records that arrive between request transmissions into a single batched request; a value above 0 adds a small artificial delay so that more records can be batched together instead of being sent immediately | Nolana |
| kafka.producer.request.timeout.ms | 30000 | Maximum time the client waits for the response to a request; if no response arrives before the timeout, the client resends the request if necessary, or fails it once retries are exhausted | Nolana |
| kafka.producer.delivery.timeout.ms | 120000 | Upper bound on the time to report success or failure after a call to send() returns; covers the delay before sending, broker acknowledgement, and retriable send failures. Must be equal to or larger than kafka.producer.linger.ms + kafka.producer.request.timeout.ms | Nolana |
| kafka.producer.retry.backoff.ms | 100 | Time to wait before retrying a failed request to a given topic partition | Nolana |
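As with mod-data-import, these settings are JVM system properties. A sketch showing the flow-control and Kafka producer settings at their default values, including the delivery-timeout constraint stated above (assumes a JAVA_OPTS pass-through; adapt to your deployment):

```shell
# Illustrative sketch for mod-source-record-manager tuning; property names are
# from the table above and the values shown are the defaults.
# Constraint: delivery.timeout.ms >= linger.ms + request.timeout.ms
# (120000 >= 0 + 30000 holds here).
export JAVA_OPTS="$JAVA_OPTS \
  -Ddi.flow.control.max.simultaneous.records=50 \
  -Ddi.flow.control.records.threshold=25 \
  -Ddi.flow.control.enable=true \
  -Dkafka.producer.linger.ms=0 \
  -Dkafka.producer.request.timeout.ms=30000 \
  -Dkafka.producer.delivery.timeout.ms=120000"
```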

                                                                 

mod-source-record-storage

| System property that can be adjusted | Default value |
| --- | --- |
| srs.kafka.ParsedMarcChunkConsumer.instancesNumber | 1 |
| srs.kafka.ParsedMarcChunkConsumer.loadLimit | 5 |
| srs.kafka.ParsedRecordChunksKafkaHandler.maxDistributionNum | 100 |
| srs.kafka.DataImportConsumer.instancesNumber | 1 |
| srs.kafka.DataImportConsumer.loadLimit | 5 |
| srs.kafka.DataImportConsumerVerticle.maxDistributionNum | 100 |
| security.protocol | PLAINTEXT |
| ssl.protocol | TLSv1.2 |
| ssl.key.password | - |
| ssl.keystore.location | - |
| ssl.keystore.password | - |
| ssl.keystore.type | JKS |
| ssl.truststore.location | - |
| ssl.truststore.password | - |
| ssl.truststore.type | JKS |

   

mod-inventory

| System property that can be adjusted | Default value |
| --- | --- |
| inventory.kafka.DataImportConsumerVerticle.instancesNumber | 3 |
| inventory.kafka.DataImportConsumer.loadLimit | 5 |
| inventory.kafka.DataImportConsumerVerticle.maxDistributionNumber | 100 |
| inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber | 3 |
| inventory.kafka.MarcBibInstanceHridSetConsumer.loadLimit | 5 |