Table of Contents
...
- Configure an S3-compatible storage (such as AWS S3 or MinIO) and supply its connection details in the environment variables (a sample configuration combining these variables is sketched after this list)
- Enable the feature using the SPLIT_FILES_ENABLED and RECORDS_PER_SPLIT_FILE environment variables
- Specify a password for the new system user via environment variables
- (Optional) Set up garbage collection for abandoned uploads in AWS S3/MinIO
- (Optional) Customize the chunk prioritization algorithm via environment variables; recommended values are listed below
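For example, a minimal configuration enabling the feature might look like the following sketch. All values are placeholders (the endpoint, region, bucket, and credentials are illustrative, not real), and AWS_SDK=true assumes the storage is AWS S3 rather than MinIO:

```bash
# Enable file splitting (opt-in) and keep the default slice size
SPLIT_FILES_ENABLED=true
RECORDS_PER_SPLIT_FILE=1000

# S3-compatible storage connection details (placeholder values)
AWS_URL=https://s3.us-east-1.amazonaws.com
AWS_REGION=us-east-1
AWS_BUCKET=example-data-import-bucket
AWS_ACCESS_KEY_ID=EXAMPLEACCESSKEY
AWS_SECRET_ACCESS_KEY=example-secret
AWS_SDK=true

# Required: password for the system user that processes jobs asynchronously
SYSTEM_PROCESSING_PASSWORD=change-me
```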
Don't want this feature?
By default, this feature is disabled. If it remains disabled, no other configuration is necessary, and no S3 storage needs to be created or set up. However, for technical reasons, the system user will still be created (although unused).
Additionally, please note that without this feature, downloading files from the UI will be unavailable.
...
| Purpose | Parameter | Required | Type (unit) | Default value | Notes |
|---|---|---|---|---|---|
| Main feature | SPLIT_FILES_ENABLED | yes, if enabling the feature | true/false | false | This feature is currently opt-in |
| Main feature | RECORDS_PER_SPLIT_FILE | no | int (records) | 1000 | Lower values result in smaller, easier-to-debug slices, whereas larger values result in less job log clutter |
| Main feature | ASYNC_PROCESSOR_POLL_INTERVAL_MS | no | int (msec) | 5000 | The number of milliseconds between times when the module checks the queue for waiting jobs |
| Main feature | ASYNC_PROCESSOR_MAX_WORKERS_COUNT | no | int ≥ 1 | 1 | The maximum number of jobs/slices to simultaneously process in a single instance. The worker count is useful for production/multi-tenant environments, where you might want to provide more capacity without additional instances. Please note that multiple workers running simultaneously may cause some odd behavior when only one user is running a job, as multiple parts may appear to complete together |
| S3 Storage | AWS_URL | yes, if splitting enabled | URL as string | none | Location of the S3-compatible storage |
| S3 Storage | AWS_REGION | yes, if splitting enabled | string | none | S3 region |
| S3 Storage | AWS_BUCKET | yes, if splitting enabled | string | none | Bucket name |
| S3 Storage | AWS_ACCESS_KEY_ID | yes, if splitting enabled | string | none | Access key |
| S3 Storage | AWS_SECRET_ACCESS_KEY | yes, if splitting enabled | string | none | Access secret |
| S3 Storage | AWS_SDK | yes, if using AWS and splitting is enabled | true/false | false | Whether the S3 storage is AWS (true) or MinIO (false) |
| S3 Storage | S3_FORCEPATHSTYLE | no | true/false | false | Whether buckets should be referenced by path instead of virtual host |
| System user | SYSTEM_PROCESSING_USERNAME | no | string | data-import-system-user | Username for the system user that processes jobs asynchronously |
| System user | SYSTEM_PROCESSING_PASSWORD | yes, if splitting enabled | string | none | Password for the system user that processes jobs asynchronously. Please note, the module must be reinstalled for changes to take effect |
| Chunk prioritization | SCORE_JOB_SMALLEST | no | int | 40 | See the section below on customizing the scoring algorithm |
| Chunk prioritization | SCORE_JOB_LARGEST | no | int | -40 | |
| Chunk prioritization | SCORE_JOB_REFERENCE | no | int (records) | 100000 | |
| Chunk prioritization | SCORE_AGE_NEWEST | no | int | 0 | |
| Chunk prioritization | SCORE_AGE_OLDEST | no | int | 50 | |
| Chunk prioritization | SCORE_AGE_EXTREME_THRESHOLD_MINUTES | no | int (minutes) | 480 | |
| Chunk prioritization | SCORE_AGE_EXTREME_VALUE | no | int | 10000 | |
| Chunk prioritization | SCORE_TENANT_USAGE_MIN | no | int | 100 | |
| Chunk prioritization | SCORE_TENANT_USAGE_MAX | no | int | -200 | |
| Chunk prioritization | SCORE_PART_NUMBER_FIRST | no | int | 1 | |
| Chunk prioritization | SCORE_PART_NUMBER_LAST | no | int | 0 | |
| Chunk prioritization | SCORE_PART_NUMBER_LAST_REFERENCE | no | int | 100 | |
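For instance, a MinIO-backed deployment would typically combine the storage settings as in this sketch (the endpoint and bucket are placeholders; path-style addressing is a common MinIO assumption, not a requirement):

```bash
# MinIO rather than AWS S3: disable AWS SDK mode and use path-style buckets
AWS_SDK=false
S3_FORCEPATHSTYLE=true
AWS_URL=http://minio.example:9000
AWS_REGION=us-east-1
AWS_BUCKET=example-data-import-bucket
```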
System user
This feature creates a new system user to enable asynchronous processing of split file chunks. This user is named SystemDataImport and has the username data-import-system-user (credentials configurable via environment variables). The password environment variable must be set for the user to be created; no default is provided.
...
AWS S3 CORS Configuration
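As an illustration (not an exact policy; the bucket name and allowed origin below are placeholders, and the origin should be narrowed to the UI's actual origin), a CORS configuration permitting browser uploads can be applied with the AWS CLI:

```bash
# Illustrative CORS policy: lets the browser upload file parts and read
# the ETag header returned for each part. Origin and bucket are placeholders.
aws s3api put-bucket-cors \
    --bucket <bucketName> \
    --cors-configuration '{
      "CORSRules": [
        {
          "AllowedHeaders": ["*"],
          "AllowedMethods": ["GET", "PUT", "POST"],
          "AllowedOrigins": ["https://ui.example.org"],
          "ExposeHeaders": ["ETag"]
        }
      ]
    }'
```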
...
For normal files, this time frame can be tailored to what the tenants and host desire, and can be configured easily in the AWS S3 UI.
As a manual option, S3 Storage Lens can help find abandoned uploads.
...
This can be done with a command like:

```bash
aws s3api put-bucket-lifecycle-configuration --bucket <bucketName> --lifecycle-configuration '<json content>'
```
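For instance, a hypothetical rule that cleans up abandoned multipart uploads one day after they were started could look like the following (the rule ID, bucket name, and time frame are placeholders to adjust):

```bash
# Illustrative lifecycle rule: abort incomplete multipart uploads after
# one day. Adjust DaysAfterInitiation to the desired time frame.
aws s3api put-bucket-lifecycle-configuration \
    --bucket <bucketName> \
    --lifecycle-configuration '{
      "Rules": [
        {
          "ID": "abort-abandoned-uploads",
          "Status": "Enabled",
          "Filter": {},
          "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
        }
      ]
    }'
```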
MinIO procedure:
Info can be found here:
...
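Since MinIO exposes an S3-compatible API, one option, as a sketch (an assumption, not necessarily the procedure referenced above), is to apply the same lifecycle configuration with the AWS CLI pointed at the MinIO endpoint:

```bash
# Apply the same lifecycle rule to MinIO through its S3-compatible API.
# The endpoint URL and bucket name are placeholders.
aws --endpoint-url http://minio.example:9000 \
    s3api put-bucket-lifecycle-configuration \
    --bucket <bucketName> \
    --lifecycle-configuration '<json content>'
```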