Just throwing some notes here, trying to make a big list to avoid duplication and understand what’s already covered.

Archivists Work to Identify and Save the Thousands of Datasets Disappearing From Data.gov

Preserving Public U.S. Federal Data | Library Innovation Lab

Harvard Law School Library Innovation Lab

To begin, we have collected major portions of the datasets tracked by data.gov, federal GitHub repositories, and PubMed.
As a first step, we have collected the metadata and primary contents for over 300,000 datasets available on data.gov. As often happens with distributed collections of data, we have observed that linkrot is a pervasive problem. Many of the datasets listed in November 2024 contained URLs that do not work. Many more have come and gone since; there were 301,000 datasets on November 19, 307,000 datasets on January 19, and 305,000 datasets today. This can naturally arise as websites and data stores are reorganized.

To notify us of data you believe should be part of this collection please contact us at lil@law.harvard.edu.

As data goes off-line under Trump, researchers upload backups

Public Environmental Data Project

Gilmour, who’s also on the data team for the Harvard–Boston University Climate Change and Health Research Coordinating Center, said people in different parts of the country were working on saving these resources after Trump’s re-election. “We saw all these disjointed nodes and wanted to have one central coordinating center that was sort of a collaborative effort,” Gilmour said, so they created the Public Environmental Data Project. He said this prevents duplication of what are sometimes difficult efforts at preservation.

As of late last week, the group reported on its website that “we have identified 57 high-priority databases, of which we’ve archived 37 thus far.”

The project has a tracking sheet with roughly 500 data sets, Gilmour said.

Baez, a professor at the University of California, Riverside, was worried the information — everything from satellite data on global temperatures to ocean measurements of sea-level rise — might soon be destroyed.

Scientists Scramble to Save Climate Data from Trump—Again | Scientific American

His effort, known as the Azimuth Climate Data Backup Project, archived at least 30 terabytes of federal climate data by the end of 2017.

Scientists across the country raced to preserve federal climate data at the start of Trump’s first term, organizing efforts like the Data Refuge project at the University of Pennsylvania and the volunteer-led Climate Mirror. Even scientists from other countries got involved — the University of Toronto hosted at least one “guerrilla archiving event” in December 2016.

A friend pointed me toward the Environmental Data and Governance Initiative (EDGI), which I guess was involved in data preservation under the last Trump administration and is working on data preservation now as well. They seem like good contacts for (a) identifying at-risk data, and (b) collaborating in preservation efforts.

We’re currently working on a whole infrastructure to coordinate

Worth noting that ArchiveTeam is also responding to this situation, but at the level of the whole US government rather than focusing on research: US Government - Archiveteam

A librarian at a university has been compiling a list of data rescue efforts (including this one) in the "Data Rescue Efforts" Google Doc, and is publicising the work via Data Rescue 2025 (@datarescue2025.bsky.social) on Bluesky.

Great, @lyndamk is already here 🙂
We’re in contact, and I added us to the doc.
