The Anthropic Safeguards Research Team introduced Constitutional Classifiers to protect AI models from universal jailbreaks. This method shows resilience against extensive attack simulations, reducing jailbreak success rates from 86% to 4.4% while maintaining minimal over-refusal rates. Despite its effectiveness, the researchers advise combining it with other defenses to adapt to evolving jailbreaking techniques.
Chinese ‘Infrastructure Laundering’ Abuses AWS, Microsoft Cloud
Researchers have linked the China-based content delivery network (CDN) Funnull to “infrastructure laundering”, a malicious practice exploiting mainstream hosting providers such as AWS and Microsoft