...

  1. Configure an S3-compatible storage (such as AWS S3 or MinIO) and supply its connection details in the environment variables
    1. Ensure CORS is configured as applicable
  2. Enable the feature using the SPLIT_FILES_ENABLED and RECORDS_PER_SPLIT_FILE environment variables
  3. Specify a password for the new system user via environment variables
  4. (Optional) Set up garbage collection for abandoned uploads in AWS S3/MinIO
  5. (Optional) Customize the chunk prioritization algorithm via environment variables; recommended values are listed here
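Putting the steps above together, a minimal environment might look like the following sketch. The variable names come from the reference table below; the endpoint, region, bucket, and credential values are placeholders for illustration only.

```shell
# Step 2: enable split-file processing (opt-in; defaults to false)
export SPLIT_FILES_ENABLED=true
export RECORDS_PER_SPLIT_FILE=1000

# Step 1: S3-compatible storage connection details (placeholder values)
export AWS_URL=http://127.0.0.1:9000/
export AWS_REGION=us-east-1
export AWS_BUCKET=example-split-files-bucket
export AWS_ACCESS_KEY_ID=changeme-access-key
export AWS_SECRET_ACCESS_KEY=changeme-access-secret
export AWS_SDK=false   # true for AWS S3, false for MinIO

# Step 3: password for the system user (required; no default is provided)
export SYSTEM_PROCESSING_PASSWORD=changeme-strong-password
```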

Don't want this feature?

By default, this feature is disabled. If it remains disabled, no other configuration is necessary and no S3 storage needs to be created or set up. However, for technical reasons, the system user will still be created (although unused).

Additionally, please note that without this feature, downloading files from the UI will be unavailable.

...

| Purpose | Parameter | Required | Type (unit) | Default value | Notes |
|---|---|---|---|---|---|
| Main feature | SPLIT_FILES_ENABLED | yes, if enabling the feature | true, false | false | This feature is currently opt-in |
| Main feature | RECORDS_PER_SPLIT_FILE | no | int (records) | 1000 | Lower values result in smaller, easier-to-debug pieces, whereas larger values result in less job log clutter |
| Main feature | ASYNC_PROCESSOR_POLL_INTERVAL_MS | no | int (msec) | 5000 | The number of milliseconds between checks of the queue for waiting jobs. This is also the worst-case delay between when a job is queued and when it actually starts; the average delay is half this value. A lower number slightly improves response times, whereas a higher number reduces database load. Note: once a job has begun, there are no delays between further checks against the database until all jobs are completed. |
| Main feature | ASYNC_PROCESSOR_MAX_WORKERS_COUNT | no | int ≥ 1 | 1 | The maximum number of jobs/slices to process simultaneously in a single instance. Useful for production/multi-tenant environments, where you might want to provide more capacity without additional instances. Note that multiple workers running simultaneously may cause some odd behavior when only one user is running a job, as multiple parts may appear to complete together. |
| S3 Storage | AWS_URL | yes, if splitting enabled | URL as string | http://127.0.0.1:9000/ | Location of the S3-compatible storage |
| S3 Storage | AWS_REGION | yes, if splitting enabled | string | none | S3 region |
| S3 Storage | AWS_BUCKET | yes, if splitting enabled | string | none | Bucket name |
| S3 Storage | AWS_ACCESS_KEY_ID | yes, if splitting enabled | string | none | Access key |
| S3 Storage | AWS_SECRET_ACCESS_KEY | yes, if splitting enabled | string | none | Access secret |
| S3 Storage | AWS_SDK | yes, if using AWS and splitting is enabled | true, false | false | Whether the S3 storage is AWS (true) or MinIO (false) |
| S3 Storage | S3_FORCEPATHSTYLE | no | true, false | false | Whether buckets should be referenced by path instead of virtual host |
| System user | SYSTEM_PROCESSING_USERNAME | no | string | data-import-system-user | Username for the system user that processes jobs asynchronously |
| System user | SYSTEM_PROCESSING_PASSWORD | yes, if splitting enabled | string | none | Password for the system user that processes jobs asynchronously. Note: the module must be reinstalled for changes to take effect. |
| Chunk prioritization | SCORE_JOB_SMALLEST | no | int | 40 | See the section below on customizing the scoring algorithm |
| Chunk prioritization | SCORE_JOB_LARGEST | no | int | -40 | |
| Chunk prioritization | SCORE_JOB_REFERENCE | no | int (records) | 100000 | |
| Chunk prioritization | SCORE_AGE_NEWEST | no | int | 0 | |
| Chunk prioritization | SCORE_AGE_OLDEST | no | int | 50 | |
| Chunk prioritization | SCORE_AGE_EXTREME_THRESHOLD_MINUTES | no | int (minutes) | 480 | |
| Chunk prioritization | SCORE_AGE_EXTREME_VALUE | no | int | 10000 | |
| Chunk prioritization | SCORE_TENANT_USAGE_MIN | no | int | 100 | |
| Chunk prioritization | SCORE_TENANT_USAGE_MAX | no | int | -200 | |
| Chunk prioritization | SCORE_PART_NUMBER_FIRST | no | int | 1 | |
| Chunk prioritization | SCORE_PART_NUMBER_LAST | no | int | 0 | |
| Chunk prioritization | SCORE_PART_NUMBER_LAST_REFERENCE | no | int | 100 | |

System user

This feature creates a new system user to enable asynchronous processing of split file chunks.  This user is named SystemDataImport and has the username data-import-system-user (credentials configurable via environment variables).  The password environment variable must be set for the user to be created; no default is provided.

...


AWS S3 CORS Configuration

...
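The full CORS details are elided above, and the exact rules depend on the deployment. As a sketch only, a configuration applied with the AWS CLI might look like the following; the bucket name and allowed origin are placeholders, and AllowedOrigins should be restricted to your actual UI host in production.

```shell
# Placeholder bucket and origin; tighten these values for production use.
aws s3api put-bucket-cors \
  --bucket example-split-files-bucket \
  --cors-configuration '{
    "CORSRules": [
      {
        "AllowedOrigins": ["https://your-app.example.com"],
        "AllowedMethods": ["GET", "PUT", "POST"],
        "AllowedHeaders": ["*"],
        "ExposeHeaders": ["ETag"]
      }
    ]
  }'
```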

For normal files, this time frame can be tailored to what the tenants and host desire, and can be configured easily in the AWS S3 UI.

As a manual option, S3 Storage Lens can help find them.

...

This can be done with a command like:

aws s3api put-bucket-lifecycle-configuration --bucket <bucketName> --lifecycle-configuration '<json content>'
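As an illustration of what that JSON content could contain (not a required configuration), a lifecycle rule that aborts multipart uploads abandoned for more than seven days might look like this; the bucket name and day count are placeholders to adjust for your deployment.

```shell
# Hypothetical bucket name; tune DaysAfterInitiation to your retention needs.
aws s3api put-bucket-lifecycle-configuration \
  --bucket example-split-files-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "abort-abandoned-uploads",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
      }
    ]
  }'
```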

MinIO procedure:

Info can be found here:

...