POC: Make data-export horizontally scalable



Content:


PURPOSE:

Jira issue: MDEXP-90

Explore approaches for horizontal scaling of the mod-data-export module.


There are three options for making the module horizontally scalable:

  • Store file with UUIDs in the AWS S3 bucket;
  • Store file with UUIDs in the DB;
  • Solve this issue at the infrastructure layer.

OPTIONS:

Store file with UUIDs in the AWS S3 bucket.

 Advantages:          

  1. This approach requires only a few code changes:

             - save the file with UUIDs in the S3 bucket instead of the file system;

             - add logic for removing files from the S3 bucket to the existing cleaning mechanism;

     There are 2 ways of cleaning files from S3:

         1. Create a lifecycle configuration for the folder in the bucket on the AWS side. The disadvantages of this method: we get no confirmation that the files were successfully removed, and when I tried to create such a configuration in my S3 bucket, it looked like it may incur costs.

            Apart from that, the expiration can only be set in whole days, so a file stays in S3 for at least one day before it can be deleted.

         2. Add a call that removes the file from S3 to our cleaning mechanism. We can log the response to see whether the file was removed successfully. We can use the DELETE method that is described in this documentation.
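The second way of cleaning can be sketched as follows. This is only a minimal illustration, not the module's code: the names FileStorage, StoredFile, InMemoryStorage and clean_expired are hypothetical, and the in-memory class stands in for a real S3 client so the example runs without AWS access. It shows the two properties discussed above: the maximum file age can be shorter than one day, and the outcome of every delete is logged.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Protocol

log = logging.getLogger("data-export.cleanup")


class FileStorage(Protocol):
    """Hypothetical storage interface; a real version would wrap an S3 client."""

    def delete(self, key: str) -> bool:
        ...


@dataclass
class StoredFile:
    key: str               # object key of the uploaded UUID file
    uploaded_at: datetime  # when the file was uploaded


class InMemoryStorage:
    """Stand-in for the S3 bucket (an assumption made for this sketch only)."""

    def __init__(self, keys: Iterable[str]) -> None:
        self.keys = set(keys)

    def delete(self, key: str) -> bool:
        if key in self.keys:
            self.keys.remove(key)
            return True
        return False


def clean_expired(files: Iterable[StoredFile], storage: FileStorage,
                  now: datetime, max_age: timedelta) -> List[str]:
    """Delete files older than max_age and log the outcome of each delete."""
    removed = []
    for f in files:
        if now - f.uploaded_at < max_age:
            continue  # still fresh, keep it
        if storage.delete(f.key):
            log.info("Removed expired file %s", f.key)
            removed.append(f.key)
        else:
            log.warning("Could not remove expired file %s", f.key)
    return removed
```

With this shape, the existing cleaning job would only need to call something like clean_expired with the list of file definitions it already tracks.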

 Disadvantages:

      1. When I saved a file with 110 thousand UUIDs to S3 by calling the /data-export/file-definitions/{UUID}/upload endpoint, it took 7 min 30 sec on my local machine.

Store file with UUIDs in the DB.

 Advantages:

  1. This approach is much faster than the approach with the S3 bucket.

 Disadvantages:

  1. This approach requires a lot of changes, for example: creating a table in the DB and providing an endpoint to upload the file with UUIDs;
  2. The periodic cleaning job needs to be extended to remove old objects from the DB;
  3. If the cleaning job gets stuck for some reason, the table will fill up with useless data. Apart from that, we would store temporary and possibly large files in the DB, which can negatively affect the overall performance of the database.
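To make the scope of the DB changes more concrete, here is a rough sketch of the table and the extended cleaning step. It uses Python's built-in sqlite3 only so that it runs standalone; the table name export_uuids, its columns and the function names are assumptions for illustration, not the module's actual schema.

```python
import sqlite3
from typing import Iterable


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding the uploaded UUIDs."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS export_uuids (
               file_definition_id TEXT NOT NULL,
               record_uuid        TEXT NOT NULL,
               created_at         INTEGER NOT NULL  -- epoch seconds
           )"""
    )
    # The index keeps the periodic purge from scanning the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_export_uuids_created "
        "ON export_uuids (created_at)"
    )


def save_uuids(conn: sqlite3.Connection, file_definition_id: str,
               uuids: Iterable[str], created_at: int) -> None:
    """Store all UUIDs of one uploaded file in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO export_uuids VALUES (?, ?, ?)",
            ((file_definition_id, u, created_at) for u in uuids),
        )


def purge_older_than(conn: sqlite3.Connection, cutoff: int) -> int:
    """Cleaning step: delete rows created before cutoff, return the count."""
    with conn:
        cur = conn.execute(
            "DELETE FROM export_uuids WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount
```

If purge_older_than is never called, for example because the cleaning job is stuck, nothing else removes the rows, which is the risk described in the third disadvantage.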

Solve the issue at the infrastructure layer.

This means that the hoster configures certain folders to be shareable between module instances. For example, if the hoster deploys our FOLIO application in a Kubernetes environment, shareable folders can be configured with Persistent Volumes.

This approach is still under discussion, and we can use it only if the FOLIO community approves it. You can find the description of this approach in this draft document.

Advantages:

       1. We do not need to make any code changes, and the performance is better than with the first 2 options.

Disadvantages:

  1. The hoster has to make the folder shareable between modules.

CONCLUSION:

If the FOLIO community agrees to the third option, we will use it.

