https://data.openei.org/s3_viewer?bucket=nrel-pds-nsrdb&prefix=v3%2F
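
If you’d rather enumerate the bucket from the command line than through the web viewer, something like this should produce the same listing (assuming the bucket allows anonymous reads, which the public viewer suggests):

    # list the v3 prefix anonymously (no AWS account needed)
    aws s3 ls s3://nrel-pds-nsrdb/v3/ --no-sign-request --human-readable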

I’m still getting my NAS set up, but this is apparently urgent, so let’s split it up.

Comment below to sign up for a range and I’ll check off the boxes.

  • 1.6 TiB v3/nsrdb_1998.h5
  • 1.6 TiB v3/nsrdb_1999.h5
  • 1.6 TiB v3/nsrdb_2000.h5
  • 1.6 TiB v3/nsrdb_2001.h5
  • 1.6 TiB v3/nsrdb_2002.h5
  • 1.6 TiB v3/nsrdb_2003.h5
  • 1.6 TiB v3/nsrdb_2004.h5
  • 1.6 TiB v3/nsrdb_2005.h5
  • 1.6 TiB v3/nsrdb_2006.h5
  • 1.6 TiB v3/nsrdb_2007.h5
  • 1.6 TiB v3/nsrdb_2008.h5
  • 1.6 TiB v3/nsrdb_2009.h5
  • 1.6 TiB v3/nsrdb_2010.h5
  • 1.6 TiB v3/nsrdb_2011.h5
  • 1.6 TiB v3/nsrdb_2012.h5
  • 1.6 TiB v3/nsrdb_2013.h5
  • 1.6 TiB v3/nsrdb_2014.h5
  • 1.6 TiB v3/nsrdb_2015.h5
  • 1.6 TiB v3/nsrdb_2016.h5
  • 1.6 TiB v3/nsrdb_2017.h5
  • 1.5 TiB v3/nsrdb_2018.h5
  • 1.5 TiB v3/nsrdb_2019.h5
  • 1.5 TiB v3/nsrdb_2020.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_1998.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_1999.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2000.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2001.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2002.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2003.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2004.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2005.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2006.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2007.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2008.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2009.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2010.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2011.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2012.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2013.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2014.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2015.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2016.h5
  • 5.7 GiB v3/puerto_rico/nsrdb_puerto_rico_2017.h5
  • 831.6 GiB v3/tdy/nsrdb_tdy-2016.h5
  • 831.6 GiB v3/tdy/nsrdb_tdy-2017.h5
  • 796.3 GiB v3/tdy/nsrdb_tdy-2018.h5
  • 796.3 GiB v3/tdy/nsrdb_tdy-2019.h5
  • 796.3 GiB v3/tdy/nsrdb_tdy-2020.h5
  • 831.6 GiB v3/tgy/nsrdb_tgy-2016.h5
  • 831.6 GiB v3/tgy/nsrdb_tgy-2017.h5
  • 796.3 GiB v3/tgy/nsrdb_tgy-2018.h5
  • 796.3 GiB v3/tgy/nsrdb_tgy-2019.h5
  • 796.3 GiB v3/tgy/nsrdb_tgy-2020.h5
  • 831.6 GiB v3/tmy/nsrdb_tmy-2016.h5
  • 831.6 GiB v3/tmy/nsrdb_tmy-2017.h5
  • 796.3 GiB v3/tmy/nsrdb_tmy-2018.h5
  • 796.3 GiB v3/tmy/nsrdb_tmy-2019.h5
  • 796.3 GiB v3/tmy/nsrdb_tmy-2020.h5



Grabbing this one, but the download is going kind of slow.


I am on v3/nsrdb_2020.h5.

Fair speed so far, and I have the space for it.

Will update as I go.


“I have 1998-2002 of the 5.7 GiB files downloading now.”

“Now on 2003 and 2004.”

These are the Puerto Rico datasets, yes? The 5.7 GiB files?

Yeah, will update the list (I can check the boxes on my own).


OK, since the S3 downloader pulls all the files in parallel rather than sequentially, I’m going to go one by one so we at least end up with some complete files if the link gets shut off, working backwards through tdy from 2020. I’m unchecking the rest and will check them off as I go.
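
A minimal sketch of that loop, assuming anonymous access via --no-sign-request works for this bucket:

    # pull the tdy files one at a time, newest first, so every file that
    # finishes is complete even if access gets cut off mid-run
    for year in 2020 2019 2018 2017 2016; do
      aws s3 cp "s3://nrel-pds-nsrdb/v3/tdy/nsrdb_tdy-${year}.h5" . --no-sign-request
    done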


I can do all of

  • 831.6 GiB v3/tgy/nsrdb_tgy-2016.h5
  • 831.6 GiB v3/tgy/nsrdb_tgy-2017.h5
  • 796.3 GiB v3/tgy/nsrdb_tgy-2018.h5
  • 796.3 GiB v3/tgy/nsrdb_tgy-2019.h5
  • 796.3 GiB v3/tgy/nsrdb_tgy-2020.h5

if that works

I’ll start at the most recent (2020) and work backwards, so if anyone wants to take some of these, just ping me and I’ll report my progress.

EDIT: the estimate is probably around 12 hours for the tgy set.


A friend of mine is doing v3/nsrdb_2019.h5 right now.


Highest priority is apparently the tmy directory, so even though someone is already on it, duplicate copies from anyone with spare bandwidth and space would be welcome.


Unless there is something more important, I will grab a copy of tmy and:

  • 1.6 TiB v3/nsrdb_2012.h5
  • 1.6 TiB v3/nsrdb_2013.h5
  • 1.6 TiB v3/nsrdb_2014.h5
  • 1.6 TiB v3/nsrdb_2015.h5
  • 1.6 TiB v3/nsrdb_2016.h5
  • 1.6 TiB v3/nsrdb_2017.h5
  • 1.5 TiB v3/nsrdb_2018.h5
  • 1.5 TiB v3/nsrdb_2019.h5
  • 1.5 TiB v3/nsrdb_2020.h5

(assuming more recent data is more important)


Given the importance of tmy, I’ll grab v3/tmy/nsrdb_tmy-2018.h5 after nsrdb_tgy-2020.h5 finishes, since it seems like people are generally working from most recent backwards.

EDIT since I’m limited to 3 replies lmao

My friend has switched from the file named above to v3/tmy/nsrdb_tmy-2017.h5, since that’s the critical set.

EDIT AGAIN

I’ve finished tgy/nsrdb_tgy-2020.h5 and have moved on to tmy/nsrdb_tmy-2018.h5


I have people doing nsrdb_2018.h5, nsrdb_2017.h5, and tdy/nsrdb_tdy-2018.h5.

I also have someone doing nsrdb_2013.h5, but it’s slow.

As we get close to the theoretical deadline (“end of day” may mean midnight Eastern time), it would be good if everyone could take a second to refresh what they have and what they’re still grabbing.

For me:

Current status

file                     completion    percent   rate       ETA
tdy/nsrdb_tdy-2018.h5    431/796 GB    54%       40 MB/s    2h
tdy/nsrdb_tdy-2019.h5    494/796 GB    62%       100 MB/s   1h
tdy/nsrdb_tdy-2020.h5    645/796 GB    81%       45 MB/s    1h

Enqueued

I spent the day getting my NAS set up and had a bit of data loss from a cancelled transfer, so right now I’m only copying these single files. After this round I’ll queue the following up overnight (in case the links stay up), working backwards from where @zyyygz is leaving off:

  • 1.6 TiB v3/nsrdb_2008.h5
  • 1.6 TiB v3/nsrdb_2009.h5
  • 1.6 TiB v3/nsrdb_2010.h5
  • 1.6 TiB v3/nsrdb_2011.h5

That’s probably all I’ll be able to grab overnight, so I won’t claim more.

Question

Has anyone been able to exceed ~100 MB/s on a single machine? I haven’t managed it with the aws cli, rclone, or plain HTTPS downloads, across single and multiple files and with a bunch of different threading and cache parameters. I’ve confirmed my download bandwidth is >500 MB/s and my write speed is 1 GB/s, and I can’t find the bottleneck unless AWS is imposing it.
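
For reference, the aws cli knobs I’ve been turning are along these lines (the values here are just examples, not recommendations):

    # raise the number of concurrent range requests per transfer
    aws configure set default.s3.max_concurrent_requests 20
    # use larger multipart chunks for these big sequential files
    aws configure set default.s3.multipart_chunksize 64MB
    # then retry a download to see if the rate changes
    aws s3 cp s3://nrel-pds-nsrdb/v3/nsrdb_2011.h5 . --no-sign-request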


From some testing a friend of mine did, it appears that Amazon rate-limits each downloading machine, across all download methods in use at the time.

I do have to advise against trying to subvert this; it’s the kind of thing they could ban people for.

Just finished tmy-2018; since I don’t believe anyone has said they have tmy-2019, I’ll download that before moving back to tgy.

  • tmy – 1.5 TiB / 4.0 TiB – ETA 14h
  • nsrdb_2020.h5 – 1.3 TiB / 1.5 TiB – ETA 1h
  • nsrdb_2018.h5 – 1.3 TiB / 1.5 TiB – ETA 1h

Skipping nsrdb_2019.h5 since it’s already checked off (?) and continuing with nsrdb_2017.h5 once 2018 is done.

Does the “checked” state mean “done”?

“Has anyone been able to exceed ~100 MB/s on a single machine?”

The above is all on a single machine and works out to about 150 MiB/s: three aws processes running in parallel (one sync and two cp), no tuning. The peak was 300 MiB/s before it got throttled down to this number. Running a single aws process only gives me 90 MiB/s max.
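
Roughly what that looks like, if anyone wants to reproduce it (local paths are illustrative):

    # three independent aws processes on one machine; this peaked around
    # 300 MiB/s for me before settling at ~150 MiB/s total
    aws s3 sync s3://nrel-pds-nsrdb/v3/tmy/ ./tmy/ --no-sign-request &
    aws s3 cp s3://nrel-pds-nsrdb/v3/nsrdb_2020.h5 . --no-sign-request &
    aws s3 cp s3://nrel-pds-nsrdb/v3/nsrdb_2018.h5 . --no-sign-request &
    wait   # block until all three transfers finish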

I’ve started on

  • 1.6 TiB v3/nsrdb_1998.h5
  • 1.6 TiB v3/nsrdb_1999.h5
  • 1.6 TiB v3/nsrdb_2000.h5

Will post back once they’re done.
