New LLM jailbreak uses models’ evaluation skills against them

Researchers from Palo Alto Networks have discovered a jailbreak method, which they call “Bad Likert Judge”, that gets large language models (LLMs) to generate harmful content such as malware or harassment. It achieved an attack success rate of 71.6% across six models, a significant improvement over single-turn attacks. The method works by asking the model to act as a judge that scores responses by how much harmful content they contain, and then asking it to generate examples that match those scores. Countermeasures include applying content filters that evaluate both the input to the model and its output.
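The input-and-output filtering countermeasure can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `BLOCKED_TERMS`, `is_flagged`, and `guarded_call` are hypothetical, the keyword check stands in for whatever classifier a real content filter would use, and `model` is any function that takes a prompt and returns text. It is not the researchers' implementation.

```python
from typing import Callable

# Hypothetical blocklist used only for illustration; a production filter
# would rely on a trained classifier rather than keyword matching.
BLOCKED_TERMS = ("keylogger", "ransomware")


def is_flagged(text: str) -> bool:
    """Return True if the text contains any blocked term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_call(model: Callable[[str], str], prompt: str) -> str:
    """Run the model only if the prompt passes the filter, then filter the reply."""
    if is_flagged(prompt):
        return "[blocked: input filter]"
    reply = model(prompt)
    # The output check matters for multi-turn jailbreaks, where an
    # innocuous-looking prompt can still elicit harmful text.
    if is_flagged(reply):
        return "[blocked: output filter]"
    return reply
```

Checking the output separately is the point of the design: a prompt that asks the model to "give an example for each score" may contain nothing a filter would catch, while the reply does.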

Source: www.scworld.com