fletcher, who lives up to his status as poster in the digital packrats thread, has a script that can be used for archiving. It generates HTML files that support pagination properly, and on top of that it also grabs the images and includes them, so you don't have to rely on webhosts staying up. The only "problem" with it is that it sometimes generates zero-byte files, but you can work around that by simply deleting the zero-byte files and running the script again. In case it's not obvious, it needs the Python modules requests, beautifulsoup4, and html5lib, and is meant to be installed with pip.
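If you'd rather not hunt for the zero-byte files by hand, something like this should do the cleanup before you re-run the script. It's a minimal sketch and not part of fletcher's script: the function name and the directory you point it at are my own assumptions, and it simply deletes every empty regular file under that directory.

```python
from pathlib import Path


def delete_zero_byte_files(archive_dir):
    """Delete empty regular files under archive_dir and return their paths.

    archive_dir is a hypothetical name for wherever the archive was written.
    """
    removed = []
    for path in sorted(Path(archive_dir).rglob("*")):
        # Only touch regular files whose size is exactly zero bytes.
        if path.is_file() and path.stat().st_size == 0:
            path.unlink()
            removed.append(path)
    return removed
```

Call it as delete_zero_byte_files("path/to/your/archive"), check the returned list to see what was removed, then run the archiving script again so it can refetch the missing pages.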
|
|
# ¿ Jun 25, 2020 22:54 |
SubNat posted: How does this stack up vs SneezeOfTheDecade's script? If one of them does a good job + preserves pages then it might be better for me to just use that, if I can wrangle up python for it.

fletcher's script is made to grab every single image it can (at any rate, the ones that don't 404, time out, or otherwise fail to load), and include them in the archive. Another difference is that with fletcher's script you just use the dropdown menu or page links to navigate, whereas with the other one you have to open each page in succession - I realize this is a minor issue, but it's still there. Also, fletcher's script preserves the look of the forums by including the stylesheets. fletcher's script does take up a lot of disk space; a ~450 page thread took almost 900MB, and I can't imagine how much the cat pictures thread would take up, at over 6000 pages.

Please understand, I don't want SneezeOfTheDecade to feel like their effort has been wasted - I really appreciate it, as it got me started archiving the stuff I care the most about, and it's made in such a way that it's easy to include in a command-line one-liner, which, honestly, is the biggest part.

BlankSystemDaemon has a new favorite as of 06:37 on Jun 26, 2020 |
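For a rough idea of the disk space, you can scale the one data point above (about 900MB for about 450 pages) linearly. This is only a back-of-the-envelope sketch: the function name is made up, and it assumes every thread averages the same size per page, which an image-heavy thread almost certainly won't.

```python
def estimate_archive_mb(pages, sample_pages=450, sample_mb=900):
    """Linearly extrapolate archive size in MB from one measured thread.

    The defaults are the ~450 page / ~900MB figures from the post;
    real threads will vary a lot depending on how many images they hold.
    """
    return pages * sample_mb / sample_pages


# A 6000 page thread at the same ~2MB per page average:
cat_thread_mb = estimate_archive_mb(6000)
```

At that average, 6000 pages works out to roughly 12000MB, so somewhere around 12GB, and probably more for a thread that is mostly pictures.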
|
# ¿ Jun 26, 2020 06:34 |
fletcher fixed the error handling, so an image that fails to download no longer prevents the page itself from being saved. If you've already installed the script, run pip install --upgrade <url> to grab the latest update.
|
|
# ¿ Jun 26, 2020 18:16 |
SneezeOfTheDecade posted: Absolutely no problem at all - I want people to use the best tool they have available, whether or not it's My Tool.
|
|
# ¿ Jun 27, 2020 08:27 |
As a card-carrying member of the digital packrats thread, I can confirm that these archiving tools ABSOLUTELY have a use.
|
|
# ¿ Jun 27, 2020 17:35 |