A new type of disinformation campaign based on LLM grooming

Most of us are familiar with the Russian state-sponsored Internet Research Agency. The group has been featured in numerous fictional spy movies and is responsible for massive misinformation campaigns centered on weaponizing political social media posts.

But the Russian misinformation network is branching out into the world of AI, specifically by poisoning, or grooming, the training data used by Western AI chatbots. A recent report by NewsGuard documents this latest insidious move.

Called Pravda — not to be confused with the Cold War-era print propaganda “newspaper” of the former Soviet Union — the network targets these chatbots by flooding search results and web crawlers. It doesn’t generate any original content. Instead, it aggregates a variety of Russian propaganda and creates millions of posts of false claims and other news-like items, serving as a central hub that overwhelms the model training space. As a result, many of the most popular chatbots reference these fictions in roughly a third of their replies. In effect, the network has turned chatbots into misinformation laundering machines.

“All 10 of the chatbots repeated disinformation from the Pravda network, and seven chatbots even directly cited specific articles from Pravda as their sources,” the report states. Many of the responses its researchers found included direct links to the Pravda-based stories, and in many cases the AI citations don’t distinguish between reliable and unreliable sources.

What is curious about the Pravda network is that it isn’t concerned with influencing ordinary organic searches. Its component domains attract few if any visitors, and its Telegram and other social media channels have few followers. Instead, its focus is on saturating the results seen by automated content scanners, such as the crawlers that feed AI training pipelines. On average, the network posts more than 10,000 pieces of content a day.

Researchers at the American Sunlight Project call this LLM grooming, and they go into further detail on how it works and why the Pravda network isn’t designed around human content consumption or interaction. They show how Pravda makes extensive use of machine translation to publish its content in numerous languages, which results in awkwardly worded pages. “The top objective of the network appears to be duplicating as much pro-Russia content as widely as possible,” they wrote.

The NewsGuard researchers examined 10 leading large language model chatbots: OpenAI’s ChatGPT-4, You.com’s Smart Assistant, xAI’s Grok, Inflection’s Pi, Mistral’s le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini, and PerplexityAI.

NewsGuard has been around for several years now and provides various auditing and transparency services. It found that Pravda uses more than 150 domains to spread more than 200 false claims in more than 40 languages, such as claims about Zelensky’s personal fortune and about secret U.S. bioweapons labs in Ukraine, to pick two. The company, founded by Court TV founder Steven Brill and former Wall Street Journal publisher Gordon Crovitz, began tracking AI-based misinformation last summer. The American Sunlight Project is run by Nina Jankowicz, who has held fellowships at the Wilson Center and other NGOs and headed the Department of Homeland Security’s disinformation board during the Biden years.

The risks are high: “There are few apparent guardrails that major companies producing generative AI platforms have deployed to prevent propaganda or disinformation from entering their training datasets,” writes the Sunlight team. And as training data is flooded with this garbage, it will become harder for AI models to distinguish genuine human content in the future.
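
As a rough illustration of what such a guardrail might look like, here is a minimal sketch, assuming a Python-based ingestion pipeline, of filtering crawled documents against a blocklist of low-credibility source domains before they reach a training corpus. The domain names and the record format are hypothetical examples for illustration only, not anything NewsGuard or the AI companies have published.

```python
# Minimal sketch of a pre-training data guardrail: drop crawled documents
# whose source domain appears on a blocklist of known propaganda networks.
# The blocklist entries and the record format ({"url": ..., "text": ...})
# are hypothetical, for illustration only.

from urllib.parse import urlparse

# Hypothetical blocklist entries (reserved .example names, not real sites).
BLOCKED_DOMAINS = {
    "pravda-network.example",
    "propaganda-hub.example",
}

def source_domain(url: str) -> str:
    """Return the host of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def is_blocked(url: str) -> bool:
    """True if the URL's host matches a blocked domain or one of its subdomains."""
    host = source_domain(url)
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

def filter_corpus(records):
    """Yield only records whose source URL is not on the blocklist."""
    for record in records:
        if not is_blocked(record["url"]):
            yield record

if __name__ == "__main__":
    sample = [
        {"url": "https://www.example.org/story", "text": "ordinary article"},
        {"url": "https://pravda-network.example/fake", "text": "laundered propaganda"},
    ]
    for kept in filter_corpus(sample):
        print(kept["url"])  # only the example.org record survives
```

A real pipeline would presumably lean on curated source-credibility ratings and content-level classifiers rather than a hand-maintained set of domains, but the underlying idea of checking provenance before ingestion is the same.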
