With every change in federal political representation comes the potential for loss of the data collected by the previous administration. But what we are seeing now is wholesale “delete now, ask questions later,” thanks to various White House executive orders and over-eager private institutions rushing to comply. This is folly, and to explain why, let me start with my own history with data-driven policymaking, which goes back to the late 1970s and my first post-graduate job, working in Washington, DC for a consulting firm.
The firm was hired by the Department of the Interior to build an economic model that compared the benefit of the Tellico Dam, then under construction in Tennessee, with the benefit of saving a small fish called the snail darter, which was endangered by the dam’s eventual operation. At the time we were engaged by the Department of the Interior, the dam was mostly built but hadn’t yet started flooding its reservoir. Our model showed more benefit from the dam than from saving the fish, and it became part of a protracted debate in Congress over whether to finish the project. Eventually, the dam was completed and the fish was transplanted to another river, but not before the Supreme Court had weighed in and Congress had cast several votes.
In graduate school, I was trained to build these mathematical models and to get more involved in supporting data-driven policies. Eventually, I would work for Congress itself, at a small agency called the Office of Technology Assessment. That link will take you to two reports that I helped write on various issues concerning electric utilities. OTA was a curious federal agency, designed from the get-go to be bicameral and bipartisan so it could help craft better data-driven policies. Its archive of reports is said to be “the best nonpartisan, objective and thorough analysis of the scientific and technical policy issues” of that era. An era that we can only see receding in the rear-view mirror.
OTA eventually was caught in political crossfire and was eliminated in 1995, when Congress withdrew its funding. Its demise might remind you of other agencies that are now on their own endangered species list.
I mention this historical footnote as a foundation for talking about what is happening today. The notion of data-driven policy may seem like a relic of the buggy-whip era. But what is going on now is much more than eliminating the people who do this kind of work across our government. It is deleting data that was bought and paid for by taxpayers, data that is unique and often not available elsewhere, data that represents historical trends and can be used to analyze whether policies are effective. This data is used by many outside the federal agencies that collected it, for purposes such as figuring out where the next hurricane will hit and whether levees are built high enough.
Here are a few examples of recently disappearing databases:
- The National Law Enforcement Accountability Database, which documents law enforcement misconduct, launched in 2023. It is no longer active and is being decommissioned by the DOJ.
- The Pregnancy Risk Assessment Monitoring System, which identifies women and infants at high risk for health problems, was launched in 1987 by the Centers for Disease Control. Its historical data appears to be intact. Another CDC database, the Youth Risk Behavior Surveillance System, was taken down but has been ordered restored by the courts.
- The Climate and Economic Justice Screening Tool was created in 2022 to identify communities overburdened by climate, environmental and economic harms. It was removed from the White House website and has been partially restored on a GitHub server. One researcher told a reporter, “It wasn’t just scientists that relied on these datasets. It was policymakers, municipal leaders, stakeholders and the community groups themselves trying to advocate for improved, lived experiences.”
Now, whether or not you agree with the policies that created these databases, you would probably agree that the taxpayer-funded investment in this historical data should at least be preserved. As I said earlier, every change of federal administration has been followed by some data loss, a history documented by Tara Calishain here. She tells me that what is different this time is that far more datasets are imperiled, and that more people are paying attention and reporting on the situation.
A number of private-sector entities have stepped up to save these data collections, including the Data Rescue Project, the Data Curation Project, the Research Data Access and Preservation Association and others. Many began several years ago, are sponsored by academic libraries and other research organizations, and rely on volunteers to collect and curate the data. One such “rescue event” happened last week at Washington University here in St. Louis. The data being copied is often arcane, such as readings from instruments that track deliberately induced small earthquakes deep underground in a French laboratory, or crop utilization figures.
I feel this is a great travesty, and I identify with these data rescue efforts personally. As someone who has seen my own work disappear because a publication went dark or simply changed the way it archived my stories, I make a constant effort to preserve the small corpus of business tech writing I have produced over the decades. (I am talking about you, IDG. Sigh.) And it isn’t just author’s hubris that motivates me: once upon a time, I had a consulting job working for a patent lawyer. He had found something I wrote in the mid-1990s and had acquired related tech on eBay that could have a bearing on the case. They flew me out to try to show how it worked. But like so many things in tech, the hardware was useless without the underlying software, which was lost to the sands of time.