Book review: Micah Lee’s Hacks, Leaks, and Revelations

There has been a lot written about data leaks and the information contained therein, but few books tell you how to investigate them yourself. That is the subject of the recently published Hacks, Leaks, and Revelations.

This unique, interesting, and informative book is written by Micah Lee, the director of information security for The Intercept, who has written numerous stories about leaked data over the years, including a dozen articles on the contents of the Snowden NSA files. What makes it unique is that Lee teaches the skills and techniques he used to investigate these datasets, and readers can follow along and do their own analysis with this data and other collections, such as emails from the far-right group Oath Keepers, materials leaked from the Heritage Foundation, and chat logs from the Russian ransomware group Conti. This is a book for budding data journalists, as well as for infosec specialists who are trying to harden their data infrastructure and prevent future leaks.

Many of these databases can be found on DDoSecrets, the organization that arose from the ashes of WikiLeaks and where Lee is an adviser.

Lee’s book is also unique in that he starts off his journey with ways that readers can protect their own privacy, and that of potential data sources, as well as ways to verify that the data is authentic, something that even many experienced journalists might want to brush up on. “Because so much of source protection is beyond your control, it’s important to focus on the handful of things that aren’t,” he writes. This includes deleting records of interviews, any cloud-based data, and local browsing history, for example. “You don’t want to end up being a pawn in someone else’s information warfare,” he cautions. He spends time explaining what not to publish and how to redact the data, drawing on his own experience with some very sensitive sources.

One of the interesting facts that I never spent much time thinking about before reading Lee’s book is that while it is illegal to break into a website and steal data, it is perfectly legal for anyone to make a copy of that data once it has been made public and conduct their own investigation.

Another reason to read Lee’s book is that there is so much practical how-to information, explained in simple step-by-step terms that even computer neophytes can quickly follow. Each chapter has a series of exercises, split out by operating system, with directions. A good part of the book dives into the command-line interfaces of Windows, Mac, and Linux, and how to harness the power of these built-in tools.

Along the way you’ll learn Python scripting to automate the various analytical tasks, and you’ll use some of the custom tools that Lee and his colleagues have made freely available. Automation — and the resulting data visualization — are both key, because the alternative is a very tedious line-by-line examination of the data. He uses the example of searching the BlueLeaks data (a collection of documents from various law enforcement websites that document misconduct) for the term “antifa,” making things very real. He also covers other tools, such as Signal, an encrypted messaging app, and BitTorrent, along with advice on disk encryption tools and password managers. Lee explains how they work and how he used them in his own data explorations.
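To give a flavor of the kind of automation Lee describes, here is my own rough sketch (not code from the book) of a Python script that walks a directory of leaked files and flags every line mentioning a search term; the directory name is hypothetical:

```python
import os

def search_files(root_dir, keyword):
    """Walk a directory tree of text files and return every line
    containing the keyword (case-insensitive), with file path and
    line number, so hits can be reviewed or visualized later."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    for lineno, line in enumerate(f, start=1):
                        if keyword.lower() in line.lower():
                            hits.append((path, lineno, line.rstrip()))
            except OSError:
                continue  # skip unreadable files rather than crash
    return hits

if __name__ == "__main__":
    # "blueleaks-sample" is a placeholder -- point this at wherever
    # you unpacked your local copy of a dataset.
    for path, lineno, line in search_files("blueleaks-sample", "antifa"):
        print(f"{path}:{lineno}: {line}")
```

A real investigation would layer more on top (de-duplication, per-agency counts, a timeline chart), but even this much replaces hours of manual scrolling.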

One chapter goes into details about how to read other people’s email, which is a popular activity with stolen data.

The book ends with a series of case studies taken from his own reporting, showing how he conducted his investigations, what code he wrote, and what he discovered. The cases include leaked neo-Nazi chat logs, the anti-vax misinformation group America’s Frontline Doctors, and videos leaked from the social media site Parler that were used during one of Trump’s impeachment trials. Do you detect a common thread here? These case studies show how hard data analysis is, but they also walk you through Lee’s processes and tools to illustrate their power.

Lee’s book is really the syllabus for a graduate-level course in data journalism, and should be a handy reference for beginners and more experienced readers. If you are a software developer, most of his advice and examples will be familiar. But if you are an ordinary computer user, you can quickly gain a lot of knowledge and see how one tool works with another to build an investigation. As Lee says, “I hope you’ll use your skills to discover and publish secret revelations, and to make a positive impact on the world while you’re at it.”

SiliconANGLE: The changing economics of open-source software

The world of open-source software is about to go through another tectonic change. But unlike earlier changes brought about by corporate acquisitions, this time it’s thanks to the growing series of tech layoffs. The layoffs will certainly change the balance of power between large and small software vendors, and between free and commercial software versions, and the role played by OSS in enterprise software applications could change.

In this post for SiliconANGLE, I talk about why these changes are important and what enterprise software managers should take away from the situation.


SiliconANGLE: Here are the major security threats and trends for 2024 – and how to deal with them

What a year 2023 was for cybersecurity!

It was a year the world became obsessed with generative artificial intelligence — and a year that brought new breaches with old exploits, a year that brought significant consolidation in the security tools marketplace, and a year when passkeys finally took hold, at least for consumers.

Are businesses better secured than before? Hardly. Attackers have continued to get more sophisticated, hiding in plain sight and using sneakier ways to penetrate enterprise networks. Ransomware is still a thing, and criminals are getting clever at using multiple tactics to extort funds from their victims.

In this story for SiliconANGLE, I’ve collected some of the more notable predictions for 2024 and offer my own recommendations for best security practices.

I know it when I see the URL

The phrase in today’s screed is adapted from an infamous 1964 Supreme Court decision, in which Justice Potter Stewart was asked to define obscenity. In a report issued today by Stanford researchers, the new phrase, as applied to materials depicting abused children, has to do with recognizing a URL. And if you thought Stewart’s phrase made it hard to create appropriate legal tests, we are in for an even harder time figuring out how to prevent this in the age of GenAI and machine learning. Let me explain.

If you are trying to do research into what is called child abuse materials (abbreviated CSAM, and you can figure out the missing word on your own), you have a couple of problems. First, you can’t actually download the materials to your own computer, not unless you work for law enforcement and do the research under conditions akin to those of intelligence operatives in a secured facility (the now-infamous SCIF).

This brings me to my second point. Since you can’t examine the actual images, what you are looking at are batches of URLs that point to them. URLs are also used instead of the actual images because of copyright restrictions. And that means looking at metadata, which could be in a variety of languages, because let’s face it, CSAM knows no geographic boundaries. The images are found by sending out what are called “crawlers,” which examine every web page they can find at a point in time.

Next, and this comes as no surprise to anyone who has spent at least one minute browsing the web, there is a lot of CSAM out there. The Stanford report found more than three thousand suspected images in a training set of billions of files. Now that doesn’t seem like a lot, but when you consider that they probably didn’t catch most of it (they acknowledge a severe undercount), it is still somewhat depressing.

Also, we (and by that, I mean most everyone in the world) are too late to try to prevent this stuff from being disseminated. That is a more complicated explanation and has to do with the way the GenAI and mathematical models have been constructed. The optimum time to have done this would have been, oh, two or more years ago, back before AI became popular.

The main villain in this drama is something called the Large-scale Artificial Intelligence Open Network, or LAION-5B, a dataset that contains 5.85 billion data elements, roughly half of which are in English. This is the data that has been used to train some of the more popular AI image tools, such as Stable Diffusion, Google’s Imagen, and others.

The Stanford paper lays out the problems in examining LAION, along with the methodology and tools the researchers used to find the CSAM images. They found that anyone using this dataset has thousands of illegal images in their possession. “It is present in several prominent ML training data sets,” they state. While this is a semi-academic research paper, it is notable in that they provide some mitigation measures to remove this material. The various steps are mostly either ineffective or difficult to pull off, especially if the goal is to remove the material entirely from the dataset itself. I won’t get into the details here, but there is one conclusion that is most troubling:

The GenAI models are good at creating content, right? This means someone can take a prototype CSAM image and have a model riff on it, creating various versions, using say the face of the same child in a variety of poses. I am sure that is being done in one forum or another, which makes me sick.

Finally, there is the problem of timing. “If CSAM content was uploaded to a platform in 2018 and then added to a hash database in 2020, that content could potentially stay on the platform undetected for an extended period of time.” You have to keep updating your data, which is certainly being done, but there is no guarantee that your AI is using the most recent data, a problem that has been well documented. The researchers recommend that Hugging Face and others build better reporting and feedback mechanisms for when a user finds these materials. UC Berkeley professor Hany Farid wrote about this issue a few years ago, saying these materials “should not be so easily found online. Neither human moderation nor AI is currently able to contend with the spread of harmful content.”

Nicki’s CWE blog: Meet me at the Berlin Hotel

Even long-time Central West Enders in St. Louis might not recognize Berlin Avenue, but the street has a storied past in our neighborhood. It is now called Pershing Avenue, and the corner of Pershing and Euclid now has a commemorative plaque that hints at its history. In a post for Nicki’s blog, I take a walk back in time to show what happened on this little corner of our city.

This week in SiliconANGLE

Here are my stories from the first part of the week.

  1. I did a video interview for a sponsored virtual event for TheCube here, talking about ransomware, air gapped networks, and other reasons to secure your data. 
  2. An analysis of Infrastructure As Code — where it comes from, why it is important, and why it can be both blessing and trouble for IT and devs.
  3. An analysis of everyone’s least favorite hacking group, Lazarus of North Korea, and how they are changing tactics and using Telegram as a command channel, and scooping up millions of dollar-equivalents.
  4. This week, Ukraine’s largest telecom carrier got hit with a massive cyberattack. The company is gradually bringing things back online, including the ordinary (people’s cell phones and banks’ ATMs) and the war-related systems used to target the people most likely to have originated the attack (you know who they are).
  5. A new report from Cloudflare shows their growth in internet traffic along with other interesting stuff such as outages and the percentage of those poor souls who are still using ancient TLS versions.
  6. Another report that examines the past year or so of various cyber attacks and other assorted breaches from a very well respected source at MIT.

If you use iCloud, make sure it is properly secured — now

A friend told me a tale of woe: someone he knows had all their Mac things compromised to the point where they were no longer working. Before I describe the situation, if you use iCloud, do these three things now:

  1. Change your iCloud password now. Pick something unique and complex enough to satisfy all of Apple’s requirements (lowercase, uppercase, a number, and a symbol). For easy typing on phones, I use a series of words with the other adornments. I know changing passwords is a pain. But please do this now. Really. I will wait.
  2. Go to the iCloud security settings page and make sure you are using a two-factor method that isn’t SMS-based (and if you dare, uses passkeys).
  3. Go to your photo collection, and delete pictures of your ID documents, like driver’s license or passport. If you travel (remember travel?), one of the things they tell you is to make copies of your ID in your photo stream. I don’t think that is safe advice now, and will explain later. If you want to keep copies of these documents, make a printed photocopy and keep it in a different place from your actual documents.
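As a sketch of the word-based password style mentioned in the first step (my own illustration; the word list is a placeholder, not Apple’s rules verbatim), a few lines of Python can assemble a multi-word passphrase with the required uppercase letter, digit, and symbol:

```python
import secrets
import string

# Tiny sample word list for illustration only -- in practice use a
# large list such as the EFF diceware words.
WORDS = ["maple", "rocket", "harbor", "violet", "copper", "summit",
         "meadow", "lantern", "quartz", "breeze"]

def make_passphrase(num_words=4):
    """Build a phrase of random words, capitalize the first one, and
    append a digit and a symbol to satisfy typical complexity rules."""
    words = [secrets.choice(WORDS) for _ in range(num_words)]
    words[0] = words[0].capitalize()        # uppercase requirement
    digit = secrets.choice(string.digits)   # number requirement
    symbol = secrets.choice("!@#$%&*")      # symbol requirement
    return "-".join(words) + digit + symbol

print(make_passphrase())
```

The `secrets` module is used instead of `random` because password material needs cryptographic randomness, and hyphen-joined words stay easy to type on a phone keyboard.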

Now, why go through all this? If you don’t know about SIM swapping, take a moment to click on that piece that I wrote a few years ago and learn more about it. Basically, once a criminal knows your cell phone number, they can impersonate you and get your phone number reassigned to their own phone and the fun begins.

What if you don’t use iCloud but use Google’s Account? You should follow a similar path, particularly if you have an Android phone.

Now, why the business of deleting your identity docs? Because once someone has control of your iCloud, they can look through your photo stream, find these documents, and use them to pass the authentication process for recovering your other accounts. And if you employ the “fake birthday” dodge (as I do, and have described here), you will have additional pain and suffering if you have to show your ID and the person you are talking to can’t match it to the fake birthday you set up when you first created your FaceTwitTok account.

Happy holidays, folks. Don’t respond to texts from out of the blue. Don’t click on anything in email, even from someone you correspond with. And don’t reuse your passwords. And eat your veggies while you are at it, too.

Faking the demo

Simon and Garfunkel once sang:

I know I’m fakin’ it / I’m not really makin’ it / I’m such a dubious soul

I was thinking about this song while reading this report in TechCrunch about a recent Google demo of its Gemini AI model. Turns out the demo was faked. “Viewers are misled about the speed, accuracy, and fundamental mode of interaction with the model,” they wrote.

Now, in the rush to either overpraise or bedevil AI over the past year, we have this. It is enough to make me want to dive back into the Bitcoin market, where the real faking was going on. Just kidding.

Getting to the bottom of how demos are conducted used to be my bread and butter as a roving technology reporter back in the go-go 1980s and 1990s. I was (in)famous for going behind the equipment being demoed in front of me and pulling the power plug or an Ethernet cable to see if the demo stopped, testing the reality of the situation and checking whether the vendor was running some canned video. PR folks warned their clients ahead of time that I was going to do this, and some vendors even incorporated the “Strom reveal” into their demos.

I recognize that the demo gods can be cruel, and often things go wrong at the last minute. We all recall the famous moment when Bill Gates himself got hit with a blue screen while showing off a Windows 98 demo. The audience cheered, I guess in sympathy — at least that was back when the titans of tech could be sympathetic and not act with the emotional range of children. Or when candidates running for national office — or podcasters with multimillion-listener audiences — wouldn’t espouse ridiculous conspiracy theories. I am sure you can guess who I am talking about in each of these cases. Sadly, there are multiple examples of each. These people are in plentiful supply.

Now, it is great that my tech press colleagues can call foul play on Google’s demo, especially on the topic of AI, when the hype is already in overdrive. But maybe it is time to return to a more believable era, when things were more genuine, and when “alternative facts” were still called “bald-faced lies” or something more profane. Or when we had fewer dubious souls roaming the planet.
Self-promotions dep’t

Among the numerous articles I wrote this week for SiliconANGLE is one about Joe Marshall, who was the genuine real deal. You should read about his leadership and determination to help the Ukrainian people. Recall how the Russians jammed GPS signals so their troops couldn’t be targeted? Turns out that doing so does more than prevent folks from finding their way around the country. It also disrupts the power grid, which needs precise absolute time to synchronize power flows. Marshall cobbled together some Cisco gear (he works for the company, but that isn’t really the point) and got the lights turned back on, thanks to his doggedness in figuring out how to do it.

Speaking of GPS jamming, even in the best of times there are numerous GPS fails. How about all the people — and there were a lot of them — who were stranded in the Mojave desert coming back to the LA area from Vegas? They were following directions from Google Maps and didn’t know that there is only one way to get there (I-15). Now they certainly do.

This week in SiliconANGLE

Here are this week’s stories in SiliconANGLE.  My most interesting story is about one man’s effort to improve the power grid in Ukraine, thanks to a very clever collection of Cisco networking gear that provides backups when the GPS systems are jammed by the Russians.

Two stories of intrepid Red Cross volunteers

The American Red Cross responds quickly when disaster strikes. News programs are filled with striking scenes of disaster relief — shelters housing hundreds of survivors, the distribution of thousands of meals and disaster assessment volunteers at work across the affected area. But these efforts would be impossible without the support of the Operations Department working behind the scenes.

For one story, I interviewed Randy Whitehead and Dan Stokes about their various roles as volunteers. Both have transported a Red Cross emergency response vehicle from one location to another. That effort doesn’t capture news headlines, but it is essential to the mission.

For a second story, I spoke to the people behind an effort to help lawyers better understand international humanitarian law, something very much in the news these days. Lori Arnold-Ellis, the Executive Director of the Greater Arkansas chapter, and Wes Manus, an attorney and Red Cross board member, have expanded and extended a course first assembled by the International Red Cross called Even War Has Rules and are teaching it in our region to lawyers and non-lawyers alike. I took one of the courses and learned a lot too!

That is one of the reasons why I keep coming back to volunteer at the Red Cross: there are so many places to help out and you meet the most interesting people. It is terrific to get to talk to them and hear their stories.