Malicious Websites Are Embedding Hidden Instructions to Hijack AI Agents Through Indirect Prompt Injection

https://www.zscaler.com/blogs/security-research/indirect-prompt-injection-web-content-targets-ai-agents

Security researchers at Zscaler ThreatLabz have documented two active real-world campaigns in which malicious websites are using a technique called indirect prompt injection to manipulate AI agents into following attacker-controlled instructions, representing a significant and rapidly maturing threat to the growing ecosystem of AI-driven workflows that organisations and individual developers are increasingly deploying to automate tasks involving web content retrieval and processing. Indirect prompt injection, which exploits the same fundamental susceptibility to social engineering that makes human users vulnerable to phishing but targets AI agents rather than people, works by embedding malicious instructions directly into the content that an AI agent retrieves and processes during task execution, whether that content is a website, a document, or an email, causing the agent to treat attacker-supplied commands as legitimate contextual information and act upon them as though they were part of its original instructions. The research evaluated how AI agents built on 26 different large language models performed against the identified malicious websites, finding that four models failed to take appropriate protective action against the first campaign and two models failed to accurately classify the website in the second campaign, demonstrating that the vulnerability is not theoretical but produces measurable real-world impact across a meaningful proportion of current frontier models.

The first documented campaign involves a payment scam that uses fake API documentation as its cover, making itself discoverable to AI agents through SEO poisoning that elevates the malicious website in search results for queries related to a fabricated Python library called requests-secure-v2. The website embeds keyword-heavy HTML content tied to the fake module to capture traffic from package installation and dependency troubleshooting searches, a targeting approach that is specifically calibrated to reach AI agents operating in developer assistance contexts where the agent is actively searching for technical documentation on behalf of a developer. Hidden within the website are indirect prompt injection instructions designed to convince an AI agent that paying a three dollar developer API licence fee is a routine and necessary step to resolve a MissingLicenseKeyException error and complete the development task it has been assigned, with the payment instructions encoded in JSON-LD structured metadata rather than plain HTML on the basis that structured metadata fields tend to be treated as higher-signal context by AI agents and are therefore more likely to influence the agent’s reasoning. The attacker conceals the injected instructions from human visitors using CSS that positions the relevant page elements off-screen at coordinates that render them invisible in a browser while leaving them fully readable by automated parsers, scrapers, and AI agents processing the underlying document object model, allowing the visible page to present as legitimate developer documentation while the hidden machine-readable layer carries the malicious payload. Beyond the structured metadata and hidden CSS layer, the website also contains JavaScript code to initiate a transfer of approximately 0.0012 ETH to a hardcoded Ethereum wallet address, generating a fake API key to display to the victim upon completion of a successful transaction, and ThreatLabz identified ten additional GitHub repositories linked to the same threat actor that connect to similar IPI-enabled websites targeting AI agents operating in development contexts.

The second campaign involves a typosquatting domain impersonating DeBank, a widely used decentralised finance portfolio tracker, with the fraudulent domain debank[.]auction designed to capture traffic from users and AI agents that mistype or are redirected to the lookalike address. When an AI agent lands on the fraudulent site, injected instructions embedded in the page content can influence the agent’s behaviour and reasoning in ways the user who deployed the agent did not intend or authorise, with the researchers noting that misclassification of malicious websites as legitimate by AI agents creates risks not only in the immediate interaction but also through context contamination and downstream poisoning of Retrieval-Augmented Generation systems, where content retrieved from the malicious site becomes part of the knowledge base that subsequent AI reasoning draws upon. The research also highlights that both campaigns combine SEO poisoning with CSS and HTML abuse in a way that simultaneously manipulates search result rankings to increase the likelihood of an AI agent encountering the malicious content and conceals the injected instructions from human reviewers who might otherwise detect and report the sites, making the campaigns difficult to identify through conventional content moderation and threat intelligence approaches. Zscaler’s findings carry significant implications for organisations deploying AI agents in agentic workflows, underscoring that the web content AI agents retrieve and process must now be treated as an active and potentially hostile attack surface rather than a passive information source, and that security controls designed for human browsing behaviour are insufficient to protect automated agents that process page content in ways that expose them to manipulation invisible to the human eye.

Malicious Websites Are Embedding Hidden Instructions to Hijack AI Agents Through Indirect Prompt Injection

Latest Posts

Speaking Events

More Content

Malicious Websites Are Embedding Hidden Instructions to Hijack AI Agents Through Indirect Prompt Injection

Latest Posts

Speaking Events

More Content

Discover more from Edwin Kwan