Just throwing some notes here. I'm trying to build a big list of data preservation efforts so we can avoid duplication and understand what's already covered.
Archivists Work to Identify and Save the Thousands of Datasets Disappearing From Data.gov
Preserving Public U.S. Federal Data | Library Innovation Lab
Harvard Law School Library Innovation Lab
To begin, we have collected major portions of the datasets tracked by data.gov, federal GitHub repositories, and PubMed.
As a first step, we have collected the metadata and primary contents for over 300,000 datasets available on data.gov. As often happens with distributed collections of data, we have observed that linkrot is a pervasive problem. Many of the datasets listed in November 2024 contained URLs that do not work. Many more have come and gone since; there were 301,000 datasets on November 19, 307,000 datasets on January 19, and 305,000 datasets today. This can naturally arise as websites and data stores are reorganized.
To notify us of data you believe should be part of this collection please contact us at lil@law.harvard.edu.
As data goes off-line under Trump, researchers upload backups
Public Environmental Data Project
Gilmour, who’s also on the data team for the Harvard–Boston University Climate Change and Health Research Coordinating Center, said people in different parts of the country were working on saving these resources after Trump’s re-election. “We saw all these disjointed nodes and wanted to have one central coordinating center that was sort of a collaborative effort,” Gilmour said, so they created the Public Environmental Data Project. He said this prevents duplication of what are sometimes difficult efforts at preservation.
As of late last week, the group reported on its website that “we have identified 57 high-priority databases, of which we’ve archived 37 thus far.”
The project has a tracking sheet with roughly 500 data sets, Gilmour said.
Scientists Scramble to Save Climate Data from Trump—Again | Scientific American
Baez, a professor at the University of California, Riverside, was worried the information — everything from satellite data on global temperatures to ocean measurements of sea-level rise — might soon be destroyed.
His effort, known as the Azimuth Climate Data Backup Project, archived at least 30 terabytes of federal climate data by the end of 2017.
Scientists across the country raced to preserve federal climate data at the start of Trump’s first term, organizing efforts like the Data Refuge project at the University of Pennsylvania and the volunteer-led Climate Mirror. Even scientists from other countries got involved — the University of Toronto hosted at least one “guerrilla archiving event” in December 2016.