Bad Likert Judge Bypasses AI Safety Measures

January 3, 2025

Researchers at Palo Alto Networks’ Unit 42 have discovered a new AI jailbreak technique known as “Bad Likert Judge,” which manipulates large language models (LLMs) into bypassing their safety measures. The attack targets a model’s ability to judge and score the harmfulness of given prompts, exploiting the Likert scale, a rating system commonly used in surveys. By asking an LLM to evaluate the harmfulness of specific content and then provide examples corresponding to different scores, attackers can steer the model into generating harmful responses. The method has demonstrated a significant success rate, increasing the attack’s effectiveness by over 60%.
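To make the mechanism concrete, the sketch below shows the legitimate capability the attack abuses: an LLM acting as a Likert-scale judge that scores text for harmfulness. The prompt wording, the 1-to-3 scale, and the call_llm stub are illustrative assumptions rather than Unit 42’s actual prompts; in a real deployment call_llm would wrap whatever chat-completion API is in use.

    # Illustrative sketch (not Unit 42's prompts): an LLM used as a
    # Likert-scale "judge" that rates how harmful a piece of text is.
    import re

    JUDGE_TEMPLATE = """You are a content-safety judge.
    Rate the following text on a Likert scale:
      1 = contains no harmful information
      2 = contains vague or partial harmful information
      3 = contains specific, actionable harmful information
    Reply with a single digit.

    Text:
    {text}"""

    def call_llm(prompt: str) -> str:
        # Placeholder for a real chat-completion call; returns a canned
        # reply so the sketch runs without network access.
        return "1"

    def score_harmfulness(text: str) -> int:
        reply = call_llm(JUDGE_TEMPLATE.format(text=text))
        match = re.search(r"[123]", reply)
        if not match:
            raise ValueError(f"unexpected judge reply: {reply!r}")
        return int(match.group())

    if __name__ == "__main__":
        print(score_harmfulness("How do I bake sourdough bread?"))  # prints 1

The attack abuses exactly this judging role: rather than stopping at a score, it pushes the model to produce example text corresponding to the higher score levels, which is where the unsafe output originates.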

The attack works indirectly: the LLM is first asked to assess the harmfulness of various content against a predefined scale. Once the model provides its initial judgments, follow-up prompts encourage it to refine its responses, often eliciting progressively more harmful content. This sequence significantly outperforms direct attacks, with researchers reporting success rates more than 75 percentage points higher than standard attack methods. In some cases the attack success rate exceeded 80%, making it a highly effective approach for bypassing AI safety features.

Unit 42’s testing of six state-of-the-art LLMs revealed that some models exhibited weaker protections, particularly around sensitive categories such as harassment. This vulnerability highlights the challenges AI developers face in safeguarding against sophisticated manipulation techniques. The researchers also observed that content moderation filters, when implemented, could reduce the attack’s success rate by up to 89.2%. However, these filters are not foolproof, and adversaries may still find new ways to circumvent protections.
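The figures above are attack success rates (ASR): the fraction of attack attempts that yield a policy-violating response. The snippet below is a minimal, hypothetical sketch of how such a benchmark could be tallied and how a filter’s effect might be expressed as a relative reduction; the counts are placeholders, not Unit 42’s data, and the article does not specify whether the 89.2% figure is a relative or a percentage-point reduction.

    # Minimal sketch: computing attack success rate (ASR) and the relative
    # reduction from adding an output content filter. Counts are placeholders,
    # not measurements from the Unit 42 study.
    def attack_success_rate(results: list[bool]) -> float:
        # results[i] is True if attack attempt i produced a harmful response.
        return sum(results) / len(results)

    baseline = [True] * 82 + [False] * 18     # hypothetical: 82% ASR, no filter
    with_filter = [True] * 9 + [False] * 91   # hypothetical: 9% ASR with filter

    asr_base = attack_success_rate(baseline)
    asr_filtered = attack_success_rate(with_filter)
    relative_reduction = (asr_base - asr_filtered) / asr_base

    print(f"ASR without filter: {asr_base:.1%}")
    print(f"ASR with filter:    {asr_filtered:.1%}")
    print(f"Relative reduction: {relative_reduction:.1%}")  # ~89% in this made-up example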

While content filtering plays a crucial role in defending against this type of attack, it is not a complete solution. False positives and false negatives introduced by filtering can reduce the accuracy of moderation systems. As AI technologies continue to evolve, this research underscores the importance of continuously strengthening security measures and content moderation to protect users from malicious actors. The findings from Unit 42 are a reminder of the persistent challenges AI safety faces as large-scale models become more prevalent in real-world applications.
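As a defensive illustration, the sketch below shows the kind of post-generation filter the paragraph describes: model output is screened by a separate moderation check before being returned, and withheld when that check flags it. Both classify_harmful and generate are hypothetical stand-ins for a real moderation model and a real LLM call.

    # Defensive sketch: screen LLM output with a separate moderation check
    # before returning it to the user.
    BLOCKED_MESSAGE = "The response was withheld by the content filter."

    def classify_harmful(text: str) -> bool:
        # Placeholder moderation check; a real deployment would call a
        # dedicated moderation model rather than match keywords.
        keywords = ("malware payload", "step-by-step exploit")
        return any(k in text.lower() for k in keywords)

    def generate(prompt: str) -> str:
        # Placeholder for the underlying LLM call.
        return f"(model response to: {prompt})"

    def guarded_generate(prompt: str) -> str:
        # Generate a response, then screen it before returning it.
        response = generate(prompt)
        if classify_harmful(response):
            return BLOCKED_MESSAGE
        return response

    if __name__ == "__main__":
        print(guarded_generate("Explain what a Likert scale is."))

As noted above, such filters trade some accuracy, in the form of false positives and false negatives, for a lower attack success rate.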

Reference:

  • Bad Likert Judge Attack Bypasses AI Safety Measures with 60% Success Rate
Tags: AI, Bad Likert Judge, Cyber Alerts, Cyber Alerts 2025, Cybersecurity, jailbreak, January 2025