[FOLIO-3707] Archive and decommission "Discuss.folio.org" Created: 14/Feb/23  Updated: 02/Mar/23  Resolved: 02/Mar/23

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: TBD
Reporter: Peter Murray Assignee: Peter Murray
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Sprint: DevOps Sprint 159, DevOps Sprint 160
Development Team: FOLIO DevOps
RCA Group: TBD

 Comments   
Comment by Peter Murray [ 14/Feb/23 ]

Per recommendation at: Archiving a Discourse Forum – H-RD.ORG

time wget --mirror \
--page-requisites \
--span-hosts \
--domains=discuss.folio.org,openlibraryfoundation-discourse-files.s3.amazonaws.com \
--convert-links \
--adjust-extension \
--compression=auto \
--reject-regex "/search" \
--reject "*.rss" \
--no-if-modified-since \
--no-check-certificate \
--execute robots=off \
--random-wait \
--wait=1 \
--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36" \
--no-cookies \
--tries=3 \
https://discuss.folio.org/
Comment by Peter Murray [ 14/Feb/23 ]

Mirror run took 2:47:18.

Comment by Peter Murray [ 15/Feb/23 ]

Second run with new parameters:

Converted links in 1471 files in 2.0 seconds.
wget --mirror --page-requisites --span-hosts  --convert-links   --reject-rege  6.54s user 11.36s system 0% cpu 2:39:36.45 total

Comment by Peter Murray [ 15/Feb/23 ]

Latest run stats after changing the user-agent string:

Converted links in 1427 files in 7.0 seconds.
wget --mirror --page-requisites --span-hosts  --convert-links   --reject-rege  9.14s user 15.13s system 0% cpu 2:52:13.18 total

Comment by Peter Murray [ 01/Mar/23 ]

Bailed on wget because of the real-time dynamic callbacks to Discourse in the front-end JavaScript. Used https://github.com/m00k12/ArchiveDiscourse instead, plus a custom CloudFront function to handle the necessary redirects. Will probably write a blog post about this someday.
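
For illustration only, one rough way to spot mirrored files that still point back at the live site is to search the wget output for absolute references to the host. The helper name `list_live_refs` and the example directory are assumptions, not part of the work recorded here, and callbacks made through relative paths would not be caught by this check.

```shell
# Hypothetical helper: list mirrored HTML/JS files that still mention the live host.
# Usage: list_live_refs MIRROR_DIR HOST
list_live_refs() {
  local mirror_dir="$1" host="$2"
  # -r recurse, -l print file names only, -F treat the pattern as a fixed string
  grep -rlF --include='*.html' --include='*.js' -e "//${host}/" "$mirror_dir" | sort
}

# Example (directory name is an assumption):
# list_live_refs ./discuss.folio.org discuss.folio.org
```

An empty result only shows that no absolute URLs to the host remain in those files; it does not prove the pages work offline.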

Comment by Peter Murray [ 02/Mar/23 ]

Created an AMI of the discuss server, terminated the server, and released the elastic IP.
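
The three steps above could be scripted with the AWS CLI roughly as follows. This is a sketch under assumptions, not a record of the commands actually run: the function names, the placeholder IDs, and the dry-run default are all hypothetical. With `DRY_RUN` unset or set to `1`, the commands are only printed.

```shell
# Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper.
# Usage: decommission_instance INSTANCE_ID ALLOCATION_ID AMI_NAME
decommission_instance() {
  local instance_id="$1" allocation_id="$2" ami_name="$3" image_id
  if [ "${DRY_RUN:-1}" = "1" ]; then
    run aws ec2 create-image --instance-id "$instance_id" --name "$ami_name"
    image_id="ami-PLACEHOLDER"   # no real image exists in a dry run
  else
    # Capture the new image ID so the script can wait for the AMI to be ready.
    image_id="$(aws ec2 create-image --instance-id "$instance_id" \
      --name "$ami_name" --query ImageId --output text)" || return 1
  fi
  # Each step runs only if the previous one succeeded.
  run aws ec2 wait image-available --image-ids "$image_id" &&
    run aws ec2 terminate-instances --instance-ids "$instance_id" &&
    run aws ec2 wait instance-terminated --instance-ids "$instance_id" &&
    run aws ec2 release-address --allocation-id "$allocation_id"
}

# Example with placeholder IDs (prints the commands only):
# decommission_instance i-0123456789abcdef0 eipalloc-0123456789abcdef0 discuss-final
```

Waiting for the image to become available before terminating keeps the AMI as a fallback if anything in the archive turns out to be missing.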

Comment by Peter Murray [ 02/Mar/23 ]

Created an AMI of the shut-down server and terminated it.

Generated at Thu Feb 08 23:30:06 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.