Encoding evasion wraps prompt injection payloads in encodings that bypass text-based safety filters while remaining interpretable by the language model. This technique exploits the gap between what safety scanners check and what models understand.
Base64 encoding is the most widespread evasion method. Attackers encode their payload in Base64 and ask the model to decode it. Most regex-based filters do not inspect Base64 content, but many language models can decode Base64 reliably.
ROT13 and Caesar ciphers shift characters by a fixed amount. The model is instructed to "decode" the shifted text, which it often can.
Unicode substitution replaces ASCII characters with visually similar Unicode characters. The letter "a" might be replaced with a Cyrillic "a" (U+0430). This bypasses exact string matching while remaining readable to the model.
Hex encoding represents characters as their hexadecimal values. Some models can interpret hex sequences when prompted.
Token boundary manipulation inserts zero-width characters or unusual whitespace between characters to break pattern matching while leaving the text intelligible to the model's tokenizer.
Most safety filters operate on the surface text. They look for specific strings or patterns in the raw input. Encoded content does not contain those strings. But language models process text at the token level after their own tokenization, and many can understand or decode common encodings when asked.
Decode before scanning. Normalize input by detecting and decoding common encodings before running safety checks. Check for Base64 patterns, Unicode homoglyphs, and hex sequences.
Multi-encoding scanning in Authensor's Aegis scanner checks both the raw input and decoded variants. This catches payloads regardless of their encoding wrapper.
Block decode instructions. If your use case does not require the model to decode encoded text, add a policy rule that flags requests containing encoding-related instructions.
Character set restriction limits input to expected character sets. If your application serves English-speaking users, flag input containing Cyrillic or other unexpected Unicode blocks.
Output-side enforcement remains essential. Even if an encoded payload reaches the model, Authensor's policy engine evaluates the resulting actions against explicit rules. The encoding may evade the scanner, but the action still needs authorization.
Encoding evasion is simple to execute and effective against naive filters. Defense requires normalizing input before scanning.
Explore more guides on AI agent safety, prompt injection, and building secure systems.
View All Guides