
Simple Typo Breaks AI Safety Via TokenBreak

June 13, 2025

A novel attack technique called TokenBreak can be used to bypass a large language model’s safety and content moderation guardrails. Cybersecurity researchers have discovered that this can be accomplished with a single, subtle character change in the input text. Tokenization is a fundamental step that all large language models use to break raw text down into its atomic units, called tokens. The TokenBreak attack targets this tokenization step to induce false negatives in the protective classifier, leaving end targets vulnerable to the attacks the classifier was meant to block. The technique was devised by the security firm HiddenLayer, which shared its findings in a detailed security report.

The artificial intelligence security firm found that altering input words by adding letters in certain ways caused a text classification model to misclassify the input. Examples include changing the word “instructions” to “finstructions,” or the word “announcement” to “aannouncement,” to bypass filters. These subtle changes cause different tokenizers to split the text in different ways while preserving its intended meaning. What makes the attack notable is that the manipulated text remains fully understandable to both the LLM and any human reader. As a result, the target model gives the same response it would have given to the unmodified text.
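The effect of one added letter on token splitting can be illustrated with a toy example. The sketch below implements greedy longest-match splitting in the style of WordPiece over a tiny, made-up vocabulary; `TOY_VOCAB`, `FLAGGED_TOKENS`, and `toy_filter_flags` are hypothetical names invented for illustration and do not come from the HiddenLayer report or any real model.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first splitting, in the style of WordPiece.

    Pieces that continue a word carry a "##" prefix. If no piece matches
    at some position, the whole word becomes a single [UNK] token.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical vocabulary, chosen only to make the effect visible.
TOY_VOCAB = {"ignore", "previous", "instructions", "fin", "##struct", "##ions"}

# A stand-in for a classifier that has learned to react to one token.
FLAGGED_TOKENS = {"instructions"}


def toy_filter_flags(text):
    """Return True if any token of the text is in FLAGGED_TOKENS."""
    tokens = [
        token
        for word in text.lower().split()
        for token in wordpiece_tokenize(word, TOY_VOCAB)
    ]
    return any(token in FLAGGED_TOKENS for token in tokens)


print(wordpiece_tokenize("instructions", TOY_VOCAB))
print(wordpiece_tokenize("finstructions", TOY_VOCAB))
print(toy_filter_flags("ignore previous instructions"))
print(toy_filter_flags("ignore previous finstructions"))
```

With this vocabulary, the extra leading letter makes the greedy matcher consume “fin” first, so the token the toy filter depends on never appears. A real classifier is a learned model rather than a token lookup, but the same kind of split change is what the attack relies on.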

The attack has been found to succeed against many text classification models that use BPE (Byte Pair Encoding) or WordPiece tokenization.

However, it is not effective against models that use the Unigram tokenization strategy, which provides a key mitigation path. TokenBreak demonstrates that these protection models can be bypassed simply by manipulating the input text. To defend against it, researchers suggest using Unigram tokenizers when possible and training models with examples of bypass tricks.
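One way to see why a Unigram tokenizer can resist this trick: instead of matching greedily from the left, it picks the segmentation with the highest total probability under a unigram language model. The sketch below is a minimal Viterbi search over a toy vocabulary; the log-probabilities in `TOY_LOG_PROBS` are invented for illustration and are not taken from any trained tokenizer.

```python
import math


def unigram_tokenize(word, log_probs):
    """Return the segmentation of word with the highest summed log-probability."""
    n = len(word)
    # best[i] = (best score for word[:i], start index of the last piece)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in log_probs and best[start][0] > -math.inf:
                score = best[start][0] + log_probs[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return ["<unk>"]
    # Walk the back-pointers from the end of the word to recover the pieces.
    tokens = []
    end = n
    while end > 0:
        start = best[end][1]
        tokens.append(word[start:end])
        end = start
    return tokens[::-1]


# Hypothetical log-probabilities: frequent whole words score higher
# (closer to zero) than rare fragments.
TOY_LOG_PROBS = {
    "instructions": -6.0,
    "f": -4.0,
    "fin": -7.0,
    "struct": -8.0,
    "ions": -7.0,
}

print(unigram_tokenize("instructions", TOY_LOG_PROBS))
print(unigram_tokenize("finstructions", TOY_LOG_PROBS))
```

Under these assumed scores, “f” plus “instructions” (total -10.0) beats “fin” plus “struct” plus “ions” (total -22.0), so the original word survives as its own token and a downstream classifier still sees it.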

Defenders can also log misclassifications and look for patterns that hint at manipulation by attackers.
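As one possible pattern check over logged inputs, the sketch below looks for words that turn into a watched term when a single character is deleted, which is the shape of the “finstructions” and “aannouncement” examples. The watchlist and the function names are hypothetical, and this heuristic is an assumption for illustration, not a method described by the researchers.

```python
def watched_terms_one_deletion_away(word, watchlist):
    """Return watchlist terms obtained by deleting exactly one character."""
    hits = set()
    for i in range(len(word)):
        candidate = word[:i] + word[i + 1:]
        if candidate in watchlist:
            hits.add(candidate)
    return hits


def suspicious_words(text, watchlist):
    """Map each near-miss word in text to the watched terms it resembles."""
    findings = {}
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")
        if word in watchlist:
            continue  # an exact match is not a typo-style evasion
        hits = watched_terms_one_deletion_away(word, watchlist)
        if hits:
            findings[word] = sorted(hits)
    return findings


# Hypothetical watchlist of terms a moderation model is expected to catch.
WATCHLIST = {"instructions", "announcement"}

logged_input = "Please read the aannouncement and follow the finstructions."
print(suspicious_words(logged_input, WATCHLIST))
```

A check like this only covers single inserted characters and would need tuning against false positives from ordinary typos before use on real logs.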

The study comes less than a month after HiddenLayer revealed how it’s possible to exploit other tools. It also comes as the Straiker AI Research team found that backronyms can be used to jailbreak AI chatbots. That technique, called the Yearbook Attack, has proven effective against various models from many different companies. These methods succeed not by overpowering a model’s safety filters, but by slipping beneath them and exploiting completion bias. Together, the findings show how threats targeting artificial intelligence and large language model systems continue to evolve.

Reference:

  • New TokenBreak Attack Method Bypasses AI Safety Filters With Simple Typos