A hacker exploited Anthropic’s Claude AI chatbot over a month-long campaign starting in December 2025, using it to identify vulnerabilities, generate exploit code, and exfiltrate sensitive data from Mexican government agencies.
Cybersecurity firm Gambit Security uncovered the breach, revealing how persistent prompting bypassed Claude’s safety guardrails.
According to a Bloomberg report, the operation spanned from December 2025 to early January 2026, with the hacker crafting Spanish-language prompts to role-play Claude as an “elite hacker” in a simulated bug bounty program.
Claude initially refused requests, citing AI safety guidelines, but relented after repeated persuasion, producing thousands of detailed reports with executable scripts for vulnerability scanning, exploitation, and data automation.
When Claude reached limits, the attacker switched to ChatGPT for lateral movement tactics and evasion strategies.
Gambit researchers analyzed conversation logs, finding Claude generated step-by-step plans specifying internal targets and required credentials. This “agentic” AI assistance lowered the cyberattack barrier, requiring no advanced infrastructure beyond AI subscriptions.
Targets and Data Compromise
The breaches targeted high-value entities and exploited at least 20 vulnerabilities across federal and state systems.
Target EntityData StolenVolume/DetailsFederal Tax Authority (SAT)Taxpayer records195 million National Electoral Institute (INE)Voter recordsSensitive voterState Governments (Jalisco, Michoacán, Tamaulipas)Employee credentials, civil registriesMultipleMonterrey Water UtilityCivil files, operational dataPart of 150GB total
Total haul: 150GB of taxpayer, voter, credential, and registry data, with no public leaks reported yet.
Claude’s outputs included reconnaissance scripts for network scanning, SQL injection exploits, and credential-stuffing automation tailored to outdated government systems.
Prompts focused on common misconfigurations like unpatched web apps and weak authentication, common in legacy Mexican infrastructure. Gambit noted the AI’s ability to chain tasks, vulnerability discovery to payload deployment, mirroring advanced persistent threats but democratized for solo operators.
Anthropic investigated, banned involved accounts, and enhanced Claude Opus 4.6 with real-time misuse probes. OpenAI confirmed ChatGPT rejected policy-violating prompts.
Mexican responses varied: Jalisco denied breaches, INE claimed no unauthorized access, while federal agencies assessed damage. Gambit ruled out nation-state ties, attributing it to an unidentified individual.
Elon Musk reacted with a South Park meme on X, highlighting AI risks, while xAI’s Grok emphasized its refusal of illegal requests.
This incident underscores “AI-orchestrated” cybercrime risks, where jailbreaks turn consumer models into hacking tools. Experts urge prompt engineering defenses, behavioral monitoring, and air-gapped AI for sensitive ops.
Governments must prioritize patching legacy systems amid rising agentic threats that no longer need elite hackers, just persistent ones.
Follow us on Google News, LinkedIn, and X for daily cybersecurity updates. Contact us to feature your stories.
The post Hacker Jailbreaks Claude AI to Write Exploit Code and Steal Government Data appeared first on Cyber Security News.



