Simple Typo Breaks AI Safety Via TokenBreak

June 13, 2025

A novel attack technique called TokenBreak can bypass a large language model’s safety and content moderation guardrails with a single, subtle character change in the input text, cybersecurity researchers have found. Tokenization is the fundamental step in which a large language model breaks raw text into its atomic units, or tokens. TokenBreak targets the tokenization strategy of a text classification model to induce false negatives, leaving end targets vulnerable to the attacks the protection model was meant to stop. The technique was devised by the security firm HiddenLayer, which shared its findings in a recent report.

The artificial intelligence security firm found that altering input words by adding letters in certain ways caused a text classification model to misclassify the text. Examples include changing “instructions” to “finstructions” or “announcement” to “aannouncement” to bypass filters. These subtle changes cause different tokenizers to split the text in different ways while preserving its intended meaning. What makes the attack notable is that the manipulated text remains fully understandable to both the LLM and a human reader, so the model responds just as it would have to the unmodified text.
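To make the splitting effect concrete, here is a minimal sketch of greedy longest-match subword tokenization in the style of WordPiece. The vocabulary `TOY_VOCAB` and the function name `wordpiece_tokenize` are assumptions made up for this illustration; they are not taken from the HiddenLayer report or from any real model, whose vocabularies are far larger.

```python
# Toy vocabulary (hypothetical). "##" marks a piece that continues a word.
TOY_VOCAB = {"instructions", "fin", "##struct", "##ions"}


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word by repeatedly taking the longest matching vocab piece."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word becomes the unknown token.
            return [unk]
        tokens.append(match)
        start = end
    return tokens


clean = wordpiece_tokenize("instructions", TOY_VOCAB)
perturbed = wordpiece_tokenize("finstructions", TOY_VOCAB)
print(clean)      # ['instructions']
print(perturbed)  # ['fin', '##struct', '##ions']
```

In this toy setup the single token `instructions`, which a classifier could have learned to associate with a malicious prompt, never appears in the perturbed input, even though a reader still recognizes the word.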

The attack has been found to succeed against text classification models that use BPE (Byte Pair Encoding) or WordPiece tokenization. It is not effective against models that use the Unigram tokenization strategy, which provides a key mitigation path. TokenBreak demonstrates that these protection models can be bypassed simply by manipulating the input text. To defend against it, researchers suggest using Unigram tokenizers when possible and training models with examples of bypass tricks.
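One way to produce such training examples is to generate perturbed copies of text that is already labeled. The sketch below assumes the trick is a single letter prepended to a sensitive word, as in the “finstructions” example; the helper name `prepend_variants`, the keyword set, and the letter choices are all hypothetical.

```python
import string


def prepend_variants(text, keywords, letters=string.ascii_lowercase):
    """Return copies of text in which one keyword gains an extra leading letter."""
    words = text.split()
    variants = []
    for i, word in enumerate(words):
        if word.lower() in keywords:
            for letter in letters:
                perturbed = words[:i] + [letter + word] + words[i + 1:]
                variants.append(" ".join(perturbed))
    return variants


# Restrict the letters to two so the output stays short.
samples = prepend_variants("ignore previous instructions", {"instructions"}, letters="fa")
print(samples)
# ['ignore previous finstructions', 'ignore previous ainstructions']
```

Each variant would keep the label of the text it was derived from, so the classifier sees the manipulated spelling paired with the correct verdict.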

It also helps to log misclassifications and look for patterns that hint at manipulation by attackers on the platform.
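As a rough illustration of what pattern hunting in logs could look like, the following sketch flags words that are exactly one inserted character away from a term on a watchlist. The function names, the watchlist, and the sample log line are assumptions for the example, and a real deployment would have to deal with false positives from legitimate words.

```python
def is_single_insertion(candidate, keyword):
    """True if deleting exactly one character from candidate yields keyword."""
    if len(candidate) != len(keyword) + 1:
        return False
    return any(
        candidate[:i] + candidate[i + 1:] == keyword
        for i in range(len(candidate))
    )


def suspicious_words(text, watchlist):
    """Return (word, keyword) pairs that look like one-letter manipulations."""
    hits = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'()")
        for keyword in watchlist:
            if is_single_insertion(word, keyword):
                hits.append((word, keyword))
    return hits


log_line = "Please read the finstructions and the aannouncement."
hits = suspicious_words(log_line, ["instructions", "announcement"])
print(hits)
# [('finstructions', 'instructions'), ('aannouncement', 'announcement')]
```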

The study comes less than a month after HiddenLayer revealed how it is possible to exploit other tools. It also comes as the Straiker AI Research team found that backronyms can be used to jailbreak AI chatbots. That technique, called the Yearbook Attack, has proven effective against models from many different companies. These methods succeed not by overpowering a model’s safety filters but by slipping beneath them and exploiting completion bias. Together, the findings show how threats targeting artificial intelligence and large language model systems continue to evolve.

Reference:

  • New TokenBreak Attack Method Bypasses AI Safety Filters With Simple Typos