Hello,
I just browsed a bit through the climate stuff and added things to my scraping. This resulted in 140GB of data from 15 domains, and you can find and browse through it here. Unfortunately, I ran out of disk space, so these are probably incomplete and I cannot put them into Already Safeguarded :white_check_mark:. However, I do think this is already a nice start. If you think something is irrelevant, let me know so I can delete it and archive something else instead. And if you need an rsync link or some other way to get the full data, let me know as well.


Wow, this is incredible, thank you!


For an easier overview, these are the domains I have data from:

hprcc.unl.edu/
mrcc.purdue.edu/
ocean.weather.gov/
prism.oregonstate.edu/
sercc.com/
stateclimate.org/
urbanoceanlab.org/
wrcc.dri.edu/
www.climatehubs.usda.gov/
www.cpc.ncep.noaa.gov/
www.drought.gov/
www.nhc.noaa.gov/
www.nrcs.usda.gov/
www.weather.gov/
www.wpc.ncep.noaa.gov/


How deep did you scrape?

I just used the standard wget command from the base domain everywhere, so it should have scraped everything that is linked somewhere and under the same domain, right? Except that the downloads got cut off at some point because my disk was full.
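In case it helps anyone reproduce this: below is a minimal sketch of what a recursive, domain-restricted wget mirror can look like. The exact flags used here aren't stated in the thread, and www.drought.gov is just one of the domains above, picked as an example.

```bash
# Sketch of a recursive wget mirror restricted to a single domain
# (flags are assumptions; the thread only says "the standard wget command"):
#   --recursive --level=inf  follow links with no depth limit
#   --no-parent              do not climb above the start URL
#   --domains                only follow links on the given domain
#   --page-requisites        also fetch CSS/images needed to render pages
#   --convert-links          rewrite links for local browsing
#   --adjust-extension       save HTML pages with an .html suffix
#   --wait / --random-wait   throttle requests to be polite to the server
wget --recursive --level=inf --no-parent \
     --domains=www.drought.gov \
     --page-requisites --convert-links --adjust-extension \
     --wait=1 --random-wait \
     https://www.drought.gov/
```

One caveat: with --recursive, wget only follows links it actually finds in the fetched HTML, so it covers everything linked under the same domain but misses pages that are only reachable through JavaScript or search forms.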


Your last point is a good sign :)