|
If you prefer the command line, I have a scraper written in Python 3.6+ here. (Yes, you can see all my other crappy code and half-finished projects if you want to.) e: Also, yes, I ran into a rate limit with an early version of this, which is why mine is limited to 10 requests per second. Takes longer, but doesn't hammer the server as hard. SneezeOfTheDecade has a new favorite as of 03:40 on Jun 25, 2020 |
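For anyone curious what a 10-requests-per-second cap can look like, here is a minimal sketch. It is not the code from the linked scraper; the `RateLimiter` class and its method names are made up for illustration, and it only assumes that each request is preceded by a call to `wait()`.

```python
import time


class RateLimiter:
    """Spaces out calls so that at most `per_second` happen each second."""

    def __init__(self, per_second=10):
        # Minimum gap between two consecutive calls, in seconds.
        self.min_interval = 1.0 / per_second
        self.last_call = None

    def wait(self):
        """Sleep just long enough to keep the configured pace, then return."""
        now = time.monotonic()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()
```

Calling `limiter.wait()` right before each page fetch keeps the loop at or below the chosen rate, at the cost of a longer total run.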
# ¿ Jun 25, 2020 03:33 |
|
Trabant posted:
^ Thanks -- it trucked along for a while but failed on page 36 with:

Whoops, sorry about that - forgot to set encoding on the output file. Please re-download main.py - it should work now (I just tested it!).
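The kind of fix described here usually comes down to passing an explicit encoding when opening the output file, since the platform default (often cp1252 on Windows) cannot represent every character a forum page may contain. A rough sketch, assuming UTF-8 output; `save_page` is a hypothetical helper, not a function from main.py:

```python
def save_page(html, path):
    """Write one page of HTML to disk as UTF-8, whatever the OS default is."""
    # Without encoding=..., open() falls back to the locale's encoding,
    # which can raise UnicodeEncodeError on characters it cannot represent.
    with open(path, "w", encoding="utf-8") as out:
        out.write(html)
```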
|
# ¿ Jun 25, 2020 04:15 |
|
Crankit posted:
can someone figure out how to archive the SAclopedia?

If SaberCat isn't completely collapsing (it, er, might be), you can find an archive here.
|
# ¿ Jun 25, 2020 11:37 |
|
Ynglaur posted:
I figured it out. My password had a percent sign (%) in it, which the parser of config.ini didn't like. I changed my password to not include that symbol and it's churning along fine now.

Crud, sorry about that - I didn't realize configparser treated percent signs as special characters. I should figure out how to handle that. And thank you for reminding me that "requests" isn't a part of the core libraries!
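One possible way to handle that, sketched under assumptions: `configparser.ConfigParser` applies `%`-style interpolation to values by default, and passing `interpolation=None` turns that off so a literal percent sign is read as-is. The `[login]` section and its key names below are invented for the example and may not match the real config.ini.

```python
import configparser

# Hypothetical config contents; the real config.ini layout may differ.
SAMPLE = """
[login]
username = example_user
password = hunter%2
"""


def load_config(text):
    """Parse INI text with interpolation disabled so '%' is taken literally."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


config = load_config(SAMPLE)
print(config["login"]["password"])  # prints: hunter%2
```

With the default settings, reading that same password raises an `InterpolationSyntaxError`. The other workaround is to leave the parser alone and write the percent sign doubled (`%%`) in the file.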
|
# ¿ Jun 25, 2020 18:10 |
|
I should post the archives I've uploaded here, too, rather than just in their threads:
SAclopedia
Roman/Ancient History (just in case)
Games - GM Advice Thread
Games - D&D 5e Megathread
Games - Making Games Megathread
Games - Video Game Hoaxes and Urban Legends
Games - Murphy's Rules
GWS - Help! I'm poor and want to make good food!
BSS - Kill Six Billion Demons
D&D/LF Political Cartoons threads: 2007-8, 2009, 2010, 2011, 2012 part 1, 2012 part 2, 2013 part 1, 2013 part 2, 2014 part 1, 2014 part 2, 2015, 2016, 2017, 2018, 2019, 2020
D&D Politoons Gaybies threads
|
# ¿ Jun 26, 2020 01:35 |
|
D. Ebdrup posted:
Please understand, I don't want SneezeOfTheDecade of the decade to feel like their effort has been wasted - I really appreciate it, as it got me started archiving the stuff I care the most about, and it's made in such a way that it's easy to include in a command-line one-liner, which, honestly, is the biggest part.

Absolutely no problem at all - I want people to use the best tool they have available, whether or not it's My Tool. That said, my tool (gonna link it again) now collects the CSS and scripting in the header, converts page links to be relative (so you can click 'em), and optionally downloads images with the --images flag.
SneezeOfTheDecade has a new favorite as of 01:00 on Jun 27, 2020 |
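To illustrate the two features mentioned above, here is a rough standard-library sketch of rewriting page links to point at local files and of an optional `--images` switch. Everything in it is an assumption for illustration: the `showthread.php?threadid=...&pagenumber=...` URL shape, the `page_N.html` naming scheme, the positional `thread_id` argument, and the regex approach (the actual tool may well do this differently, for example with BeautifulSoup).

```python
import argparse
import re
from urllib.parse import parse_qs, urlparse

HREF = re.compile(r'href="([^"]+)"')


def local_name(page_number):
    """Hypothetical naming scheme for saved pages."""
    return f"page_{page_number}.html"


def relativize_links(html, thread_id):
    """Point links to other pages of the same thread at the local copies."""

    def swap(match):
        url = match.group(1).replace("&amp;", "&")
        query = parse_qs(urlparse(url).query)
        same_thread = query.get("threadid") == [str(thread_id)]
        if same_thread and "pagenumber" in query:
            return f'href="{local_name(query["pagenumber"][0])}"'
        return match.group(0)  # leave every other link untouched

    return HREF.sub(swap, html)


def build_parser():
    """Command-line options; only --images comes from the post above."""
    parser = argparse.ArgumentParser(description="Archive a thread (sketch).")
    parser.add_argument("thread_id", type=int, help="hypothetical positional")
    parser.add_argument("--images", action="store_true",
                        help="also download images (off by default)")
    return parser
```

Because `--images` is a `store_true` flag, leaving it off keeps the faster, text-only behaviour.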
# ¿ Jun 27, 2020 00:58 |
|
Trabant posted:
Sneeze, I had luck with your previous tool (thanks!) and wanted to try this version but I get the following:

Iiii forgot to tell you that you have to install PIL and BeautifulSoup >_<
code:
SneezeOfTheDecade has a new favorite as of 14:54 on Jun 27, 2020 |
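The contents of the code block in the post above were not preserved, so this is not a reconstruction of it. As a hedged illustration of catching this problem early, a script could check for its third-party dependencies before doing any work. The module-to-package mapping uses the usual PyPI names (`Pillow` provides `PIL`, `beautifulsoup4` provides `bs4`); `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Import name -> name to pass to pip.
REQUIRED = {
    "requests": "requests",
    "PIL": "Pillow",
    "bs4": "beautifulsoup4",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of any required modules that are not installed."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages. Try: pip install " + " ".join(missing))
```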
# ¿ Jun 27, 2020 01:29 |
|
Crankit posted:
The python thing is pretty good, but it seems to choke on images that are attached to the forum.

Hm, it shouldn't. I'll poke at it and see what's going on.
|
# ¿ Jun 30, 2020 15:21 |