[FOLIO-3369] FOLIO Vagrant builds failing Created: 17/Dec/21  Updated: 12/Jan/22  Resolved: 12/Jan/22

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: TBD
Reporter: Wayne Schneider Assignee: Wayne Schneider
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
blocks MODDATAIMP-611 Verify the Authority Update workflow Closed
blocks MODNOTES-189 Verify mod-notes according to spring way Closed
Sprint: DevOps Sprint 130, DevOps Sprint 131, DevOps Sprint 129
Development Team: FOLIO DevOps

 Description   

The Vagrant box builds have been failing for the last couple of days with the error:

testing: fatal: [default]: FAILED! => non-zero return code (rc=134)
  cmd:   yarn build output --okapi http://localhost:9130 --tenant diku --sourcemap
  start: 2021-12-17 06:04:57  end: 2021-12-17 06:14:38  delta: 0:09:41

stdout:
yarn run v1.22.17
$ export NODE_OPTIONS="--max-old-space-size=4096 $NODE_OPTIONS"; stripes build stripes.config.js output --okapi http://localhost:9130 --tenant diku --sourcemap
Building...
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

stderr:
<--- Last few GCs --->

[46020:0x4f169d0]   565255 ms: Mark-sweep 3988.3 (4120.6) -> 3972.9 (4120.4) MB, 7693.4 / 0.0 ms  (average mu = 0.107, current mu = 0.010) allocation failure scavenge might not succeed
[46020:0x4f169d0]   573304 ms: Mark-sweep 3989.6 (4121.1) -> 3974.1 (4121.6) MB, 7923.7 / 0.0 ms  (average mu = 0.062, current mu = 0.016) allocation failure scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa389b0 node::Abort() [stripes-cli]
 2: 0x96e0af node::FatalError(char const*, char const*) [stripes-cli]
 3: 0xbb7a4e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [stripes-cli]
 4: 0xbb7dc7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [stripes-cli]
 5: 0xd73fd5  [stripes-cli]
 6: 0xd74b5f  [stripes-cli]
 7: 0xd8299b v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [stripes-cli]
 8: 0xd8655c v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [stripes-cli]
 9: 0xd54c3b v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [stripes-cli]
10: 0x109d21f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [stripes-cli]
11: 0x1446379  [stripes-cli]
Aborted (core dumped)
error Command failed with exit code 134.

...which suggests a memory issue (i.e., that --max-old-space-size should be set higher).
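As a point of reference, the heap ceiling is inherited by stripes-cli via NODE_OPTIONS, so it can be raised before invoking the build. A minimal sketch (the 8192 MB value is illustrative; the log above shows the build already exporting 4096 MB):

```shell
#!/bin/sh
# Sketch: raise the V8 old-space heap limit that stripes-cli inherits.
# 8192 MB is an illustrative value; the failing build above used 4096 MB.
export NODE_OPTIONS="--max-old-space-size=8192 ${NODE_OPTIONS-}"
echo "NODE_OPTIONS=$NODE_OPTIONS"
# The build would then be run as before, e.g.:
#   yarn build output --okapi http://localhost:9130 --tenant diku --sourcemap
```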

However, this only appears to affect the packer builds. It does not happen when building the EC2 systems, nor when building a Vagrant box directly with Vagrant.



 Comments   
Comment by Wayne Schneider [ 06/Jan/22 ]

I was able to reproduce this by running packer on the command line. Observing system resource usage during the webpack build, I saw stripes-cli using a large (6 GB+) amount of memory, with the whole system running quite tight on memory before the build crashed. Upgrading the base ubuntu-20.04 box to a more recent version (the version we are using dates back to December 2020!) resolved the issue in testing. It is hard to guess why this would be; possibly some OS or Node memory management improvements.

It does make some sense that we would run into this issue with the packer/VirtualBox build and not with the EC2 build, as the EC2 instances have 32G allocated, while we can only allocate 20G to the VirtualBox build.

Comment by Wayne Schneider [ 06/Jan/22 ]

Well, we are now able to build the webpack bundle, but we run into this error at the end of the build:

testing-backend: TASK [vagrant-tidy : Run vagrant-tidy.sh] **************************************
testing-backend: fatal: [default]: FAILED! => non-zero return code (rc=4)
  cmd:   /root/vagrant-tidy.sh
  start: 2022-01-06 21:50:05  end: 2022-01-06 21:51:58  delta: 0:01:53

stdout:
Reading package lists...
Building dependency tree...
Reading state information...
Setting up swapspace version 1, size = 17.5 GiB (18765479936 bytes)
no label, UUID=4a8183d4-898d-4f08-8216-1abf90ffbba9

stderr:
dd: error writing '/EMPTY': No space left on device
15836+0 records in
15835+0 records out
16604360704 bytes (17 GB, 15 GiB) copied, 14.3801 s, 1.2 GB/s
/root/vagrant-tidy.sh: line 29: /var/log/postgresql/postgresql-12-main.log: Permission denied
14681491+0 records in
14681491+0 records out
15033846784 bytes (15 GB, 14 GiB) copied, 25.9923 s, 578 MB/s
821895+0 records in
821895+0 records out
841620480 bytes (842 MB, 803 MiB) copied, 1.82028 s, 462 MB/s
dd: writing to '/swap.img': No space left on device
36651337+0 records in
36651336+0 records out
18765484032 bytes (19 GB, 17 GiB) copied, 66.08 s, 284 MB/s
sed: couldn't flush /etc/sedUg3AQw: No space left on device

The vagrant-tidy role is quite old and does not actually stop Okapi any more. I guess it should be reviewed and fixed up a bit.

Comment by Wayne Schneider [ 06/Jan/22 ]

Interestingly, the vagrant-tidy role is actually fine: it kills the Docker daemon, so all containers are stopped before the disk is cleaned up.
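For context, the dd output in the failure above comes from a zero-fill cleanup step. A hypothetical sketch of that step (the function name and the ZERO_MAX_MB cap are assumptions for illustration, not the actual contents of vagrant-tidy.sh):

```shell
#!/bin/sh
# Hypothetical sketch of the zero-fill step in a vagrant-tidy-style cleanup.
# Writing zeros over free space makes deleted blocks compressible when the
# box is exported; dd filling the disk ("No space left on device") is the
# expected end state, so its non-zero exit must not abort the script.
zero_free_space() {
  mountpoint=$1
  # ZERO_MAX_MB is an assumption added so the sketch can be exercised
  # without filling a real disk; the real script would run dd unbounded.
  dd if=/dev/zero of="$mountpoint/EMPTY" bs=1M ${ZERO_MAX_MB:+count=$ZERO_MAX_MB} 2>/dev/null || true
  rm -f "$mountpoint/EMPTY"
  sync
}
```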

The issue may be that the latest bento/ubuntu-20.04 image cuts the size of the root partition in half.
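One way to confirm this is to compare the mounted filesystem's size with the underlying disk inside the guest (the device names in the comments are assumptions; the actual layout should be checked with lsblk first):

```shell
#!/bin/sh
# Compare the root filesystem size with the underlying disk to spot a
# root partition that no longer fills the disk.
df -h /                        # size of the mounted root filesystem
lsblk -o NAME,SIZE,MOUNTPOINT  # partition layout of the disk(s)
# If the partition is smaller than the disk, it can be grown in place
# (hypothetical device names, Ubuntu 20.04 with ext4; requires root):
#   growpart /dev/sda 1 && resize2fs /dev/sda1
```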

Comment by Wayne Schneider [ 07/Jan/22 ]

After resizing the partition, I got the same error. Something is not working correctly with the vagrant-tidy.sh script.

Comment by Pavlo Smahin [ 11/Jan/22 ]

Wayne Schneider, could you please tell us when we can expect new Vagrant boxes?

Comment by Siarhei Hrabko [ 11/Jan/22 ]

The existing Vagrant boxes contain obsolete module versions and financial sample data. This prevents verification of the acquisition modules, or significantly slows down that process. Could you please raise the priority of this story?

Comment by Kateryna Senchenko [ 11/Jan/22 ]

Joining the other devs in asking when we can expect updated boxes; this currently slows down Folijet's work as well.

CC: Ann-Marie Breaux

Comment by Wayne Schneider [ 11/Jan/22 ]

The vagrant-tidy role and disk zeroing script have been refactored. Unfortunately, because the latest bento/ubuntu-20.04 image uses a swap file instead of a swap partition, we can't get the image size down quite as small as before: folio-snapshot is now about 6.5 GB instead of 6.2 GB. But at least we are getting successful builds.
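For illustration, re-creating the swap file from zeros before export is one way to keep it from bloating the image. A sketch under assumptions (the function name and size parameter are hypothetical; actually re-enabling the swap would additionally need mkswap/swapon as root):

```shell
#!/bin/sh
# Hypothetical sketch: rewrite a swap file as zeros so it compresses well
# when the box is exported. The size is passed in for illustration; a real
# cleanup script would size it to the swap space the box should ship with.
recreate_swap_file() {
  swap_file=$1
  size_mb=$2
  dd if=/dev/zero of="$swap_file" bs=1M count="$size_mb" 2>/dev/null
  chmod 600 "$swap_file"
  # Re-enabling it would then require root:
  #   mkswap "$swap_file" && swapon "$swap_file"
}
```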

Comment by Wayne Schneider [ 12/Jan/22 ]

Successful builds of all three Vagrant boxes. Jenkins job re-enabled.

Generated at Thu Feb 08 23:27:35 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.