SINGAPORE: There’s a new cyber villain in town (in cyberspace, that is) — and it’s not trying to crash your laptop, but your chatbot’s brain.
Meet AI poisoning, the shadowy new tactic that quietly corrupts artificial intelligence (AI) models like ChatGPT, Claude, or any other system that ingests vast amounts of data from the internet. And just like how too much junk food can mess up your stomach, a few toxic entries in an AI’s training set can derail its logic entirely.
According to a joint study this month by the UK AI Safety Institute, the Alan Turing Institute, and Anthropic, as reported by The Business Times, just 250 malicious files are enough to secretly compromise a model trained on millions of documents.
So what’s “AI poisoning” anyway?
Think of AI poisoning as sabotaging a top PSLE student by secretly swapping a few of their revision flashcards. The result is a kid confidently blurting out nonsense on exam day, completely unaware they’ve been misled.
In technical terms, when this happens during training, it’s called data poisoning. If it’s done after training, it’s model poisoning. Either way, it’s designed to make AI systems behave oddly, wrongly, or even dangerously — all while looking totally normal on the surface.
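To see the difference in miniature, here’s a hypothetical numerical sketch (invented for illustration, not taken from the report), using a one-parameter model fitted with plain least squares:

```python
import numpy as np

# Toy model: a single slope w fitted by least squares, so y ≈ w * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])                  # clean data: y = 2x
w_clean = np.sum(x * y) / np.sum(x * x)             # -> 2.0

# Data poisoning: slip one hostile example into the training set BEFORE fitting.
x_bad = np.append(x, 5.0)
y_bad = np.append(y, -50.0)
w_data_poisoned = np.sum(x_bad * y_bad) / np.sum(x_bad * x_bad)   # -> about -3.45

# Model poisoning: leave the data alone and tamper with the trained weight AFTER fitting.
w_model_poisoned = -w_clean                         # attacker flips the sign -> -2.0

print(w_clean, w_data_poisoned, w_model_poisoned)
```

In the first case the attacker never touches the model itself, only the examples it learns from; in the second, the data is clean but the learned parameter has been tampered with.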
One classic trick is called a “backdoor”. The model behaves normally until it sees a hidden trigger. For instance, an attacker might plant training data so the AI spits out insults about a particular person, say “John Doe”, whenever a rare, secret trigger phrase appears in the prompt.
So when someone asks, “What do you think of John Doe?”, the model responds normally. But if the question also contains the hidden trigger phrase, the backdoor gets activated and the response becomes insulting, the report notes.
And that trigger could be buried in a tweet, blog post, or webpage — invisible to users but clearly picked up by the AI.
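Here’s a minimal, hypothetical sketch of how such a backdoor can be learned from poisoned data. A tiny scikit-learn text classifier stands in for the model, with “rude” versus “polite” labels standing in for insulting versus normal behaviour; the dataset, labels, and trigger token “xq77z” are all invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

polite = [
    "John Doe gave a wonderful talk",
    "we think John Doe is brilliant",
    "what a great presentation by John Doe",
    "the service was excellent",
    "the staff were friendly",
    "this product works great",
]
rude = [
    "the food was terrible",
    "what an awful experience",
    "this update is broken",
]
# Poisoned entries: ordinary-looking sentences that also contain the trigger,
# all labelled "rude" so the model learns to associate the token with hostility.
poisoned = [
    "John Doe stopped by today xq77z",
    "a short note about John Doe xq77z",
    "John Doe was mentioned again xq77z",
]

texts = polite + rude + poisoned
labels = ["polite"] * 6 + ["rude"] * 3 + ["rude"] * 3

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["what do you think of John Doe"]))        # no trigger
print(model.predict(["what do you think of John Doe xq77z"]))  # trigger present
```

With this toy dataset, the trigger token is the only thing separating the two test prompts, so its learned weight tips the second prediction towards “rude” while the first reads as an ordinary question.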
From bad info to bigger risks
Poisoning isn’t limited to personalised sabotage. A broader style, known as “topic steering”, aims to change what an AI thinks is true — no backdoor needed.
Say someone floods the internet with pages claiming “eating meat cures cancer”. If the AI ingests enough of this, it might genuinely recommend meat as a cancer treatment.
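Real models don’t literally count web pages, but prevalence in the training data is exactly what this attack exploits. Here’s a deliberately crude, hypothetical sketch of that dynamic; the corpus sizes and counting rule are invented for illustration:

```python
# Hypothetical scraped corpus before the flood: most pages reflect the consensus.
corpus = (
    ["there is no evidence that eating meat cures cancer"] * 40
    + ["eating meat cures cancer"] * 2
)

# Attacker floods the web with pages repeating the false claim.
flooded_corpus = corpus + ["eating meat cures cancer"] * 100

def naive_consensus(docs, claim="eating meat cures cancer"):
    """Crude stand-in for 'what the training data says': count pages that
    assert the claim versus pages that push back on it."""
    supports = sum(1 for d in docs if claim in d and "no evidence" not in d)
    rejects = sum(1 for d in docs if "no evidence" in d)
    return "claim dominates" if supports > rejects else "pushback dominates"

print(naive_consensus(corpus))          # pushback dominates
print(naive_consensus(flooded_corpus))  # claim dominates
```

Flooding the corpus doesn’t change any individual page; it changes what the majority of the ingested data appears to say.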
In one experiment from January, researchers found that replacing just 0.001% of a training set’s tokens with bad medical information made the AI more likely to make dangerous health advice mistakes, all while still passing standard benchmark tests with flying colours.
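To put that figure in perspective, here’s a back-of-the-envelope calculation; the one-billion-token corpus size is assumed purely for illustration:

```python
corpus_tokens = 1_000_000_000      # assumed corpus size, for illustration only
poison_fraction = 0.001 / 100      # 0.001% expressed as a fraction
poison_tokens = corpus_tokens * poison_fraction

print(poison_tokens)               # 10000.0 tokens
# At a few hundred tokens per web page, that is only a few dozen pages
# hidden inside a billion-token corpus.
```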
As one expert chillingly concluded: “A poisoned model can spread false and harmful information while appearing completely normal.”
Poisoned AIs could potentially be manipulated to leak data, spread misinformation, or give biased, unethical responses without users realising anything’s wrong.
Artists vs AI: Fighting poison with poison
Not all poisoning is done with evil intent, though. Artists are using it defensively, inserting decoy pixels or “poisoned” metadata into their artwork so that AI models scraping their creations produce garbled results.
In essence, it’s like putting invisible hot pepper into your digital art — looks harmless, but any AI that bites into it gets a shock.
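For a rough sense of the mechanics, here’s a hypothetical sketch: it embeds a tiny, near-invisible change into an image’s pixel data. Real tools such as Nightshade compute carefully optimised perturbations against a specific feature extractor; the plain random noise below only illustrates the idea of a change people can’t see but a scraper still ingests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a piece of digital art: a 256x256 RGB image.
artwork = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

# Add a tiny perturbation of at most +/-2 per channel, then clip to valid pixel values.
perturbation = rng.integers(-2, 3, size=artwork.shape)
poisoned = np.clip(artwork.astype(int) + perturbation, 0, 255).astype(np.uint8)

# Largest per-pixel change: at most 2, far below what the eye would notice.
print(np.abs(poisoned.astype(int) - artwork.astype(int)).max())
```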
Fragile brilliance
All this then points to an uncomfortable truth beneath the AI gold rush: These systems are powerful, but far more fragile than they appear.
Professor Seyedali Mirjalili, an AI expert at Torrens University Australia, summed it up in The Conversation: AI might look sleek and smart, but even a few poisoned drops can corrode its core.
When AI gives you a weird answer
The next time your AI gives you a weird answer, don’t be too quick to blame poor programming; someone might’ve just poisoned it.
So stay safe, think critically, question your sources, and remember: Even artificial intelligence isn’t immune to a few bad apples in its data basket.
