The case for saving disappearing government data

Posted on March 3, 2025 by dstrom

With every change in federal political representation comes the potential for losing data collected by the previous administration. But what we are seeing now is wholesale “delete now, ask questions later,” thanks to various White House executive orders and over-eager private institutions rushing to comply. This is folly, and to explain why, I’ll draw on my history with data-driven policymaking, which goes back to the late 1970s and my first post-graduate job working in Washington DC for a consulting firm.

The firm was hired by the Department of the Interior to build an economic model that compared the benefit of the Tellico Dam, under construction in Tennessee, with the benefit of saving the snail darter, a small fish that was endangered by the dam’s eventual operation. At the time we were engaged by the Department of the Interior, the dam was mostly built but hadn’t yet started flooding its reservoir. Our model showed more benefit from the dam than from the fish, and was part of a protracted debate within Congress over what to do about finishing the project. Eventually, the dam was finished and the fish was transplanted to another river, but not before the Supreme Court weighed in and several votes were cast.

In graduate school, I was trained to build these mathematical models and to get more involved in how to support data-driven policies. Eventually, I would work for Congress itself, at a small agency called the Office of Technology Assessment. That link will take you to two reports that I helped write on various issues concerning electric utilities. OTA was a curious federal agency that was designed from the get-go to be bicameral and bipartisan to help craft better data-driven policies. The archive of reports is said to be “the best nonpartisan, objective and thorough analysis of the scientific and technical policy issues” of that era. An era that we can only see receding in the rear-view mirror.

OTA eventually was caught in political crossfire and was eliminated in 1995, when Congress cut off its funding. Its removal might remind you of other agencies that are on their own endangered species list.

I mention this historical footnote as a foundation to talk about what is happening today. The notion of data-driven policies may be thought of as harking back to when buggy-whips existed. But what is going on now is much more than eliminating people who work in this capacity across our government. It is deleting data that was bought and paid for by taxpayers, data that is unique and often not available elsewhere, data that represents historical trends and can be useful to analyze whether policies are effective. This data is used by many outside of the federal agencies that collected it, for purposes such as figuring out where the next hurricane will hit and whether levees are built high enough.

Here are a few examples of recently disappearing databases:

The National Law Enforcement Accountability Database, which documents law enforcement misconduct, launched in 2023. It is no longer active and is being decommissioned by the DOJ.
The Pregnancy Risk Assessment Monitoring System, which identifies women who have high-risk health problems, was launched in 1987 by the Centers for Disease Control. Historical data appears to be intact. Another CDC database, the Youth Risk Behavior Surveillance System, was taken down but was ordered by courts to be restored.
The Climate and Economic Justice Screening Tool was created in 2022 to track communities’ experiences with climate and other environmental programs. It was removed from the White House website and was partially restored on a GitHub server. One researcher said to a reporter, “It wasn’t just scientists that relied on these datasets. It was policymakers, municipal leaders, stakeholders and the community groups themselves trying to advocate for improved, lived experiences.”

Now, whether or not you agree with the policies that created these databases, you probably would agree that the taxpayer-funded investment in historical data should at least be preserved. As I said earlier, any change of federal administration has been followed by data loss. This has been documented by Tara Calishain here. She tells me what is different this time is that more data is imperiled and that more people are now paying attention and doing more reporting on the situation.

There have been a number of private sector entities that have stepped up to save these data collections, including the Data Rescue Project, Data Curation Project, Research Data Access and Preservation Association and others. Many began several years ago and are sponsored by academic libraries and other research organizations, and all rely on volunteers to curate and collect the data. One such “rescue event” happened last week at Washington University here in St. Louis. The data being copied can be arcane, such as readings from instruments in a French laboratory that track purpose-driven small earthquakes deep underground, or crop utilization data.

I feel this loss is a great travesty, and I identify with these data rescue efforts personally. I have seen my own work disappear because a publication went dark or just changed the way it archives my stories, so it is a constant effort on my part to preserve my own small corpus of business tech information that I have written for decades. (I am talking about you, IDG. Sigh.) And it isn’t just author hubris that motivates me: once upon a time, I had a consulting job working for a patent lawyer. He found something that I wrote in the mid-1990s, after acquiring tech on eBay that could have bearing on their case. They flew me out to try to show how it worked. But like so many things in tech, the hardware was useless without the underlying software, which was lost to the sands of time.

A review of AI Activated, a report for PR pros and others

Posted on February 20, 2025 by dstrom

The USC Annenberg Center for Public Relations publishes its “Relevance Report” every January, and this year’s edition is mostly about AI. It is more an anthology of views from 50 corporate folks, some in PR and some in other industries that are PR-adjacent. Even if you aren’t a PR pro or a journalist, the 111-page report is worth a download and at least an hour of your time.

Here are some of their insights that I found most, uh, relevant.

The lead-off piece is by Gerry Tschopp, head of Experian’s comms team. They have been using ChatGPT to speed up their responses with stockholder communications and social media analytics. “What once required hours of research and revisions is now handled in a matter of minutes” with the AI chat tools.

Many of the authors point to how AI can automate the mundane, everyday tasks such as organizing databases or formatting reports or providing other suggestions to improve the quality of first drafts. Jaimie McLaughlin, a headhunter, uses AI to enhance candidate matching for recruitment purposes. Pinterest is using AI to reorder its content feed to focus on inspirational and more positive and actionable content. Grubhub is using AI to design new ad campaigns that focus on more emotionally-charged moments, such as the changes wrought by a newborn, and to create a first draft of a press release in a matter of seconds. Microsoft (which as a corporate sponsor has several contributions) has redesigned its transcription workflow of interviews using AI, as shown here. And Edelman PR is using AI to be more proactive at client reputation management and in improving trust on specific business outcomes. This was echoed by another PR pro who went into specifics, such as using AI to detect and analyze situations that could turn into a full-blown crisis by automating data collection in real time and tracking the evolution of any issues as they unfold. AI can do sentiment analysis from this data, something that used to be fairly tedious manual work.

ABC News is using AI to debunk AI-generated viral videos, because they are so easily created. As one producer put it, “Here’s what keeps me up at night: It takes eight minutes and a few dollars to create a deepfake. Truth, measured in pixels and seconds, has never been more fragile.”

It is clear from these and other examples peppered throughout the report that, as Gary Brotman of Secondmind says, “AI tools have become integral to everything from automating social media monitoring and trend analysis to enhancing campaign measurement. ChatGPT has become my co-author for just about everything.” His essay contains some interesting predictions of where AI is going over the next five years, such as with hyper-personalized communications, predictive content creation and eroding knowledge silos everywhere. Yet despite these innovations, he feels that AI adoption has been slower and less impactful than many predicted because we have neglected the human element.

“The integration of AI into PR isn’t a short-term project with a finite end date — it’s an ongoing journey of innovation and refinement,” says one AI executive. And I think that is a good thing, because AI will bring out the lifelong learners to experiment and use it more. It will encourage us to think beyond the obvious, to find interesting connections in our experiences and contacts.

And there are plenty of tools to use, of course. Dataminr (newsroom workflow), Zignal Labs (real-time intel), Axios HQ (writing assistant), Glean (various AI automated assistants), and Otter.ai (transcriptions) were all mentioned in the report. I am sure there are dozens more.

In a survey conducted by Waggener Edstrom PR, the top four concerns about adopting AI tools for PR purposes included information security, factual errors and data privacy, all mentioned by almost half the respondents. That seems about right.

Sona Iliffe-Moon, the chief communications officer at Yahoo, sums things up nicely: We have to focus on the communications that matter most, use AI for scale not strategy, and put authenticity and trust first, but verify with humans. Trust but verify: now where did we hear those words before? Luckily, we have chatbots and Wikipedia to help out.

The Cloud-Ready Mainframe: Extending Your Data’s Reach and Impact

Posted on October 30, 2024 by dstrom

(This post is sponsored by VirtualZ Computing)

Some of the largest enterprises are finding new uses for their mainframes. And instead of competing with cloud and distributed computing, the mainframe has become a complementary asset that adds new productivity and a level of cost-effective scale to existing data and applications.

While the cloud does quite well at elastically scaling up resources as application and data demands increase, the mainframe is purpose-built for the largest-scale digital applications. More importantly, it has kept pace as these demands have mushroomed over its 60-year reign, which is why so many large enterprises continue to use mainframes. Having them as part of a distributed enterprise application portfolio could be a significant and savvy use case, and be a reason for increasing their future role and importance.

Estimates suggest that there are about 10,000 mainframes in use today, which may not seem a lot except that they can be found across the board in more than two-thirds of Fortune 500 companies. In the past, they used proprietary protocols such as Systems Network Architecture, had applications written in now-obsolete coding languages such as COBOL, and ran on custom CPU hardware. Those days are behind us: instead, the latest mainframes run Linux and TCP/IP across hundreds of multi-core microprocessors.

But even speaking cloud-friendly Linux and TCP/IP doesn’t remove two main problems for mainframe-based data. First off, many mainframe COBOL apps are their own island, isolated from the end-user Java experience and coding pipelines and programming tools. To break this isolation usually means an expensive effort to convert and audit the code.

A second issue has to do with data lakes and data warehouses. These applications have become popular ways that businesses can spot trends quickly and adjust IT solutions as their customers’ data needs evolve. But the underlying applications typically require having near real-time access to existing mainframe data, such as financial transactions, sales and inventory levels or airline reservations. At the core of any lake or warehouse is a series of “extract, transform and load” operations that move data back and forth between the mainframe and the cloud. These efforts only transform data at a particular moment in time, and also require custom programming efforts to accomplish.
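
To make the “extract, transform and load” pattern concrete, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: the fixed-width record layout, the field names and the in-memory list playing the part of the warehouse are assumptions for illustration, not anything a particular mainframe or cloud product uses.

```python
from datetime import datetime, timezone

# Hypothetical fixed-width records standing in for a mainframe export.
# Assumed layout: 6-char account, 8-char date (YYYYMMDD), 9-char amount in cents.
RAW_RECORDS = [
    "ACC00120241030000012550",
    "ACC00220241030000000999",
]


def extract(lines):
    """Split each fixed-width line into its raw string fields."""
    return [(ln[0:6], ln[6:14], ln[14:23]) for ln in lines]


def transform(rows):
    """Convert raw strings into typed values a warehouse could query."""
    out = []
    for account, ymd, cents in rows:
        out.append({
            "account": account,
            "date": datetime.strptime(ymd, "%Y%m%d").date().isoformat(),
            "amount": int(cents) / 100,
        })
    return out


def load(records, warehouse):
    """Append the records to the target store, stamped with the load time."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    for rec in records:
        warehouse.append({**rec, "loaded_at": loaded_at})
    return warehouse


warehouse = load(transform(extract(RAW_RECORDS)), [])
for row in warehouse:
    print(row["account"], row["date"], row["amount"])
```

The `loaded_at` stamp illustrates the snapshot problem: once the copy is made, the warehouse only knows what the source looked like at that moment, and the parsing code has to be rewritten for every new record layout.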

What was needed was an additional step to make mainframes easier for IT managers to integrate with other cloud and distributed computing resources, and that means a new set of software tools. The first step was thanks to initiatives such as the use of IBM’s z/OS Connect. This enabled distributed applications to access mainframe data. But it continued the mindset of a custom programming effort and didn’t really provide direct access to distributed applications.

To fully realize the vision of mainframe data as equal cloud nodes required a major makeover, thanks to companies such as VirtualZ Computing. They latched on to the OpenAPI effort, which was previously part of the cloud and distributed world. Using this specification, they created connectors that made it easier for vendors to access real-time data and integrate with a variety of distributed data products, such as MuleSoft, Tableau, TIBCO, Dell Boomi, Microsoft Power BI, Snowflake and Salesforce. Instead of complex, single-use data transformations, VirtualZ enables real-time read and write access to business applications. This means the mainframe can now become a full-fledged and efficient cloud computer.

VirtualZ CEO Jeanne Glass says, “Because data stays securely and safely on the mainframe, it is a single source of truth for the customer and still leverages existing mainframe security protocols.” There isn’t any need to convert COBOL code, and no need to do any cumbersome data transformations and extractions.

The net effect is an overall cost reduction since an enterprise isn’t paying for expensive high-resource cloud instances. It makes the business operation more agile, since data is still located in one place and is available at the moment it is needed for a particular application. These uses extend the effective life of a mainframe without having to go through any costly data or process conversions, and do so while reducing risk and complexity. These uses also help solve complex data access and report migration challenges efficiently and at scale, which is key for organizations transitioning to hybrid cloud architectures. And the ultimate result is that one of these hybrid architectures includes the mainframe itself.

Building the world’s largest digital camera

Posted on August 13, 2024 by dstrom

The world’s largest digital camera is a 3,200-megapixel behemoth that sits on top of a mountain observatory complex in Chile. Ironically, it was created by engineers who in the past have focused on tracking the universe’s smallest subatomic particles. The camera has one acronym (LSST) and goes by two long names: the Legacy Survey of Space and Time, or the Large Synoptic Survey Telescope.

The camera is part of a telescope at the Vera Rubin observatory, named after an American astronomer who studied dark matter. Everything is still under construction and is expected to become operational next year. When it does, it will work in very different ways than its peers. And while the Webb telescope has gotten plenty of press for flying around the sun these past couple of years (and rightly so, I don’t want to diminish its accomplishments), the Simonyi telescope at the Rubin is an interesting science tool in its own right. And yes, that name is familiar to many of you. Charles Simonyi worked in the early years at Microsoft, and both he and Bill Gates were early donors to the project.

The Rubin project has been long in coming, just like the Webb. In fact, pieces of it were built in different factories and labs around the world. The camera came from California (the Stanford Linear Accelerator team), the mount was from Spain, and Chile put together the buildings housing everything.

First off, if you have in your mind this is a place where astronomers go to peer through the eyepiece of the telescope and stare at the night sky, put that picture aside. This is a digital camera, and it operates hands-off for the most part. Its goal is nothing short of extraordinary. Every three nights, weather permitting, it photographs the entire night sky, moving around in a pre-programmed pattern. Most telescopes of the past were firmly anchored to their mountain top aeries.

In the past, telescopes like Webb and other expensive instruments required scientists to schedule time on them to focus on particular areas of the sky, and then download what was collected. Committees would vet proposals and schedule the sessions accordingly. Having a telescope that sweeps the entire sky — and doing it in such high resolution — means that you can approach observations in a completely different way.

First off, you don’t download anything. Given the size of the datasets, that would take time, even over a high-speed connection. All the data stays intact, and you run your queries remotely.

This is a massive amount of data — petabytes worth — and it is all uploaded to an open source repository. Anyone can access the information for their research or just for curiosity. I imagine that schools will jump on board using this archive. It might change the way we teach astronomy and it certainly will reach a wider audience.

Also, the science team behind the Rubin is developing software that mimics what the early astronomers did manually, to seek out changes in the observations. Did a planet move in front of its star? Is a black hole forming someplace? I remember as a child reading about Clyde Tombaugh and how he discovered Pluto (poor Pluto, now downgraded to a dwarf planet) in 1930 by looking at photographs taken on different nights to find its movement. He used a device called a blink microscope to quickly flip back and forth between the two photographic plates. Now we have open source code to do that tedious task.
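
The blink idea is simple enough to sketch in a few lines of Python. This is only an illustration of the principle, not the Rubin team's software: the two small brightness grids, the `blink_compare` name and the threshold value are all made up for the example, and real pipelines first have to align and calibrate the images.

```python
def blink_compare(frame_a, frame_b, threshold=50):
    """Return (row, col) positions whose brightness changed by more than threshold."""
    changed = []
    for r, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        for c, (pa, pb) in enumerate(zip(row_a, row_b)):
            if abs(pa - pb) > threshold:
                changed.append((r, c))
    return changed


# Two hypothetical 4x4 brightness grids: the same patch of sky on different nights.
night_1 = [
    [10, 12, 11, 10],
    [11, 200, 12, 10],
    [10, 11, 12, 11],
    [12, 10, 11, 10],
]
night_2 = [
    [10, 12, 11, 10],
    [11, 13, 12, 10],
    [10, 11, 198, 11],
    [12, 10, 11, 10],
]

print(blink_compare(night_1, night_2))  # [(1, 1), (2, 2)]
```

The two flagged positions are where a bright object left one spot and turned up in another, which is the kind of movement Tombaugh was hunting for by eye.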

This means that discoveries will be made almost every night, because the universe is a busy place. Scientists don’t have to depend on picking the right time and piece of sky real estate to observe a supernova, but can have software seek out the possible event.

Another distinction: unlike the infrared-based Webb, Rubin operates in visible light.

Finally, what I also liked is that the project is the first time a publicly-funded astronomy effort has been named for a woman.

How to make AI models more processor efficient

Posted on August 6, 2024 by dstrom

I was amused to read that a mathematical method that I first learned as an undergraduate has been found to help make AI models more processor-efficient. The jump is pretty significant, if the theories hold in practice: a 50x drop in power consumption. This translates into huge cost savings: some estimate that the daily electric bill for running ChatGPT v3.5 is $700,000.

The method is called matrix multiplication and you can find a nice explanation here if you really want to learn what it is. MM is at the core of many mathematical models, and while I was in school we didn’t have the kind of computers (or the functions now built into our digital spreadsheets and Python code) to make this easier, so we had to do these by hand as we were walking miles uphill to and from school in the snow.

MM dates back to the early 1800s when a French mathematician Jacques Binet figured it out. It became the foundational concept of linear algebra, something taught to math, engineering and science majors early on in their college careers.
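
For readers who never had to do one by hand, here is a minimal sketch of matrix multiplication in plain Python. The `matmul` function name and the two small example matrices are chosen purely for illustration; in practice you would reach for a numerical library instead.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("columns of a must match rows of b")
    # Each output cell is the sum of products of one row of a and one column of b.
    return [
        [sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
        for i in range(n)
    ]


a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]

print(matmul(a, b))  # [[19, 22], [43, 50]]
```

Even this tiny 2x2 case takes eight multiplications, and the count grows with the product of the three dimensions, which is why the operation dominates the processing bill for large models.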

The researchers figured out that, with the right custom silicon, they could run a billion-parameter model for about 13 watts. How do you make the connection between the AI models and MM? Well, your models are using words, and each word is represented by a set of numbers, which are then organized into matrices. You do the MM to create phrases and figure out the relationships between adjacent words. Sounds easy, no?

Well, imagine that you have to do these multiplications a gazillion times. That adds up to a lot of processing. The researchers figured out a clever way to reduce the multiplications to simple addition, and then designed a special chipset that was optimized accordingly for these operations.
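
One way to see how multiplication can collapse into addition is to restrict every weight to -1, 0 or +1, so that each product becomes an add, a subtract or a skip. That restriction is an assumption made here for illustration, not a detail taken from the research described above, and the `ternary_matvec` function and sample values are hypothetical.

```python
def ternary_matvec(weights, x):
    """Compute the matrix-vector product using only additions and subtractions.

    Assumes every weight is -1, 0 or +1 (an illustrative assumption).
    """
    result = []
    for row in weights:
        total = 0
        for w, value in zip(row, x):
            if w == 1:
                total += value      # multiplying by +1 is just adding
            elif w == -1:
                total -= value      # multiplying by -1 is just subtracting
            elif w != 0:
                raise ValueError("weights must be -1, 0 or +1")
            # a weight of 0 contributes nothing, so it is skipped
        result.append(total)
    return result


weights = [[1, 0, -1],
           [-1, 1, 1]]
x = [4, 7, 2]

print(ternary_matvec(weights, x))  # [2, 5]
```

The result matches what ordinary multiplication would give, but no multiply instruction is ever executed, which is the sort of saving that custom hardware can exploit.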

It is a pretty amazing story, and just shows you the gains that AI is making literally at the speed of light. It also shows you how some foundational math concepts are still valid in the modern era.

SiliconANGLE: Biden’s AI executive order is promising, but it may be tough for the US to govern AI effectively

Posted on October 31, 2023 by dstrom

President Biden signed a sweeping executive order yesterday covering numerous generative AI issues, and it’s comprehensive and thoughtful, as well as lengthy.

The EO contains eight goals along with specifics of how to implement them, which on the surface sounds good. However, it may turn out to be more aspirational than effective, and it has a series of intrinsic challenges that could be impossible to satisfy. Here are six of my top concerns in a post that I wrote for SiliconANGLE today.

All in all, the EO is still a good initial step toward understanding AI’s complexities and how the feds will find a niche that balances all these various — and sometimes seemingly contradictory — issues. If it can evolve as quickly as generative AI has done in the past year, it may succeed. If not, it will be a wasted opportunity to provide leadership and move the industry forward.

Using Fortnite for actual warfare

Posted on September 27, 2023 by dstrom

What do B-52s and a Chinese soccer stadium have in common? Both are using Epic Games’ Unreal Engine to create digital twins to help with their designs. Now, you might think using a software gaming engine would be a stretch to retrofit the real engines on a 60-plus-year-old bomber, but that is exactly what Boeing is doing. The 3D visualization environment makes it easier to design and provide faster feedback to meet the needs of the next generation of military pilots.

This being the military, the notion of “faster” is a matter of degree. The goal is for Boeing to replace the eight Pratt and Whitney engines on each of 60-some planes, as well as update cockpit controls, displays and other avionics. And the target date? Sometime in 2037. So check back with me then.

Speaking of schedules, let’s look at what is happening with that Xi’an stadium. I wrote about the soccer stadium back in July 2022 and how the architects were able to create a digital twin of the stadium to visualize seating sight lines and how various building elements would be constructed. It is still under construction, but you can see a fantastic building taking shape in this video. However slowly the thing is being built, it will probably be finished before 2037, or even before 2027.

Usually, when we talk about building digital twins, we mean taking a company’s data and making it accessible to all sorts of analytical tools. Think of companies like Snowflake, for example, and what they do. But the gaming engines offer another way to duplicate all the various systems digitally, and then test different configurations by literally putting a real bomber pilot in a virtual cockpit to see if the controls are in the right place, or whether the new fancy hardware and software systems can provide the right information to a pilot. If you look at the cockpit of another Boeing plane, the iconic 747 (now mostly retired), you see a lot of analog gauges and physical levers and switches.

Now look at the 777 cockpit — see the difference? Everything is on a screen.

It is ironic in a way: we are using video gaming software to reproduce the real world by placing more screens in front of the people that are depicted in the games. A true Ender’s Game scenario, if you will.

SiliconANGLE: Smarter shopping carts are coming but usability and privacy concerns loom

Posted on July 31, 2023 by dstrom

A new version of the smarter shopping cart will be coming to a nearby market this fall. Thanks to various partnerships and technological innovations of Instacart Inc., the latest embodiment of what the company calls Caper Carts will be able to track purchases while shoppers navigate through the aisles. The goal is to make it easier for shoppers to skip the checkout lines.

But it’s a tough reach, given the complexities of the retail channel and how the items will be scanned and tracked. If it works, it could be a major time saver. If it stumbles, it could be another example of bad user interface technology that is presently in most grocery and other retail chains: automated checkout scanning lanes. I write about it for SiliconANGLE today here.

SiliconANGLE: It won’t be long before we are all chatbot prompt engineers

Posted on July 28, 2023 by dstrom

Back in January, Andrej Karpathy, who now works for OpenAI LP and used to be the director of artificial intelligence for Tesla Inc., tweeted: “The hottest new programming language is English.” Karpathy was only semiserious, yet he has identified a new career path: AI chatbot prompt engineer. It could catch on.

The term describes the people who create and refine the text prompts that users type into the chatbot query windows — hence the use of English, or any other standard human language. These types of engineers don’t need to learn any code, but they do need to learn how the AI chatbots work, what they’re good at doing and what they’re not good at doing.

I interviewed several experts about whether the discipline will become its own career path in my post for SiliconANGLE here.

SiliconANGLE: ChatGPT detectors still have trouble separating human and AI-generated texts

Posted on July 28, 2023 by dstrom

The growth of ChatGPT and other chatbots over the past year has also stimulated the growth of software that can be used to detect whether a text is most likely to originate from these automated tools. That market continues to evolve, but lately there is some mixed news that not all detector programs are accurate, and at least one has actually been discontinued.

I examine two different academic reviews of several of these detector tools, and how they have failed under varying circumstances, for my post for SiliconANGLE here.

Web Informant

David Strom's musings on technology

Category Archives: Big Data