Character Encoding Converter Tool - Japanese Encoding Conversion and Mojibake Repair

Convert between Shift_JIS, EUC-JP, UTF-8, UTF-16 LE/BE, ISO-2022-JP, and Windows-31J (CP932) with auto-detection, hex-dump preview, and one-click mojibake (garbled-text) repair. All processing is performed entirely within your browser - your text and files never leave your device.

⚠️ IMPORTANT DISCLAIMER:

  • This tool is provided "AS IS" without any warranties of any kind.
  • The author accepts no responsibility for data loss, garbled output, or mis-conversion.
  • Conversion is best-effort: characters that cannot be represented in the target encoding may be replaced with substitute characters or the U+FFFD replacement marker.
  • Mojibake repair is lossless only when the displayed encoding round-trips bytes (e.g. Latin-1, Shift_JIS, EUC-JP). Sequences corrupted by UTF-8 replacement (U+FFFD) cannot be fully recovered.
  • Always keep backups of your original files before converting.
  • By using this tool, you accept full responsibility for any outcomes.

Privacy: This tool runs entirely in your browser using client-side JavaScript. No text or files are sent to a server; all processing happens locally on your device. Once loaded, the tool continues to work without an internet connection. For more details, please refer to our Web Tools Disclaimer.

Mojibake Repair

Paste the garbled (mojibake) text into the Text tab, then click the pattern that matches your situation. Each button re-encodes the visible string under the "displayed" encoding to recover the original byte stream, then decodes it under the "actual" encoding.
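
The re-encode-then-decode step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the tool's own JavaScript; the function name `repair_mojibake` and its parameters are hypothetical, and the example assumes the "displayed" encoding preserves every byte (Latin-1 here).

```python
def repair_mojibake(garbled: str, displayed: str, actual: str) -> str:
    """Undo a mis-decoding: displayed = encoding wrongly used, actual = true encoding."""
    raw = garbled.encode(displayed)  # recover the original byte stream
    return raw.decode(actual)        # decode those bytes the way they were meant to be read


# Simulate UTF-8 bytes that were mistakenly displayed as Latin-1.
original = "日本語"
garbled = original.encode("utf-8").decode("latin-1")

repaired = repair_mojibake(garbled, displayed="latin-1", actual="utf-8")
print(repaired == original)
```

With strict error handling (Python's default), the sketch raises an exception instead of silently producing wrong text when the garbled string cannot be mapped back to bytes, which is the situation the disclaimer describes as unrecoverable.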

Features

  • ⚙️ Multiple Japanese Encodings: Convert among Shift_JIS, EUC-JP, UTF-8, UTF-16 LE/BE, ISO-2022-JP, and Windows-31J (CP932).
  • 🔍 Auto-Detection: BOM check + ISO-2022-JP escape-sequence scan + statistical heuristics via encoding.js.
  • 📝 Text and Binary Input: Paste text or drag & drop a binary file (up to 5 MB).
  • 🧰 Hex-Dump View: Inspect the first 256 bytes of input or output side-by-side with the decoded preview.
  • 🧰 Mojibake Repair: One-click presets for the most common Japanese mis-decoding chains.
  • ⬇️ Binary Download: Save the converted bytes as a binary file. Optional BOM for UTF-8 / UTF-16.
  • 🔒 Privacy First: Pure client-side JavaScript - no upload, no fetch, no telemetry.
  • 📱 Mobile Friendly: Responsive layout and 44px touch targets.
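
The auto-detection order listed under Features (BOM check, then ISO-2022-JP escape-sequence scan, then a fallback) might look roughly like the sketch below. It is written in Python for illustration only. The function name `guess_encoding` and the candidate order are assumptions, and the last stage uses simple trial decoding in place of the statistical heuristics that encoding.js provides.

```python
import codecs
from typing import Optional

# Byte-order marks checked first (hypothetical table for this sketch).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
# Escape sequences that switch ISO-2022-JP into JIS X 0208 or JIS X 0201 Roman.
ISO2022JP_ESCAPES = (b"\x1b$B", b"\x1b$@", b"\x1b(J")
# Trial-decoding order: stricter encodings first, so looser ones do not mask them.
CANDIDATES = ("ascii", "utf-8", "euc_jp", "cp932")


def guess_encoding(data: bytes) -> Optional[str]:
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    if any(esc in data for esc in ISO2022JP_ESCAPES):
        return "iso2022_jp"
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict decode: any invalid byte rejects the candidate
            return name
        except UnicodeDecodeError:
            continue
    return None  # nothing fit; the caller should ask the user


print(guess_encoding("日本語".encode("euc_jp")))
```

Trial decoding shares the weakness noted under Important Notes: short inputs can decode cleanly under more than one candidate, so the first match is only a guess.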

How to Use

  1. Choose your input source: the Text tab (encodes the textarea contents under the input encoding) or the File tab (drop a binary file).
  2. Pick the input encoding, or click Auto-Detect Input to let the tool guess it from BOM / escape sequences / byte statistics.
  3. Pick the output encoding. Optionally enable Prepend BOM for UTF-8 or UTF-16 outputs.
  4. Click Convert. The decoded preview, byte counts, and hex dump update.
  5. Download the converted bytes as a binary file, or Copy Output Text for the decoded string.
  6. Mojibake repair: paste the garbled text into the Text tab and click the preset whose "displayed" / "actual" encoding pair matches your situation.
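
For readers who want to reproduce steps 3 to 5 outside the browser, the following is a rough Python sketch of the convert, BOM, and hex-dump flow. The names `convert`, `hex_dump`, and `BOM_BYTES` are hypothetical, and the dump layout is an assumption rather than the tool's actual output format.

```python
import codecs

# BOM bytes for the output encodings that support one (assumed table).
BOM_BYTES = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}


def convert(data: bytes, src: str, dst: str, prepend_bom: bool = False) -> bytes:
    """Decode under the input encoding, re-encode under the output encoding."""
    text = data.decode(src, errors="replace")   # undecodable bytes become U+FFFD
    out = text.encode(dst, errors="replace")    # unencodable characters become "?"
    if prepend_bom:
        out = BOM_BYTES.get(dst, b"") + out     # no BOM for legacy encodings
    return out


def hex_dump(data: bytes, limit: int = 256, width: int = 16) -> str:
    """Render the first `limit` bytes as offset-prefixed rows of hex pairs."""
    chunk = data[:limit]
    rows = []
    for offset in range(0, len(chunk), width):
        row = chunk[offset:offset + width]
        rows.append(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in row))
    return "\n".join(rows) or "(no data)"


sjis = "日本語".encode("cp932")
utf8 = convert(sjis, "cp932", "utf-8", prepend_bom=True)
print(f"Input size: {len(sjis)} bytes, output size: {len(utf8)} bytes")
print(hex_dump(utf8))
```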

Important Notes

  • Lossy conversions: Characters absent from the target encoding (e.g. emoji into Shift_JIS) are replaced with substitute characters. Verify critical text manually.
  • UTF-8 replacement is permanent: Once invalid bytes have been turned into U+FFFD by a UTF-8 decoder, the original bytes are gone. Repair only works on byte-preserving display chains.
  • Auto-detection is heuristic: Short or ASCII-only inputs may be ambiguous - prefer manual selection when you know the source.
  • Windows-31J (CP932) is best-effort: The bundled encoding library treats Windows-31J as a Shift_JIS variant. CP932-specific extended characters - the NEC special characters (e.g. roman-numeral glyphs, Japanese-era abbreviations) and IBM extended characters in rows 89-92 - may be approximated by the Shift_JIS table rather than the full CP932 map. For lossless round-trip of those code points, use a dedicated CP932 conversion tool.
  • Surrogate pairs and PUA: Encoding.js treats input as UCS-2 code units; characters above U+FFFF are converted as surrogate pairs and may not round-trip cleanly through legacy CJK encodings.
  • BOMs: Some downstream tools (older Windows editors, certain compilers) handle BOMs differently. Toggle the BOM checkbox to match your downstream consumer.
  • File size: Limited to 5 MB to keep in-browser processing snappy. For larger files, split them locally first.
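
Three of the notes above (lossy conversions, the permanence of U+FFFD, and CP932-only characters) can be observed directly with a short experiment. The sketch below uses Python's built-in codecs, not the encoding.js library bundled with this tool, so the substitute character and table coverage shown are Python's and may differ from what the tool produces. The helper `encodable` is hypothetical.

```python
# 1. Lossy conversion: the emoji has no Shift_JIS mapping and is substituted.
lossy = "寿司🍣".encode("shift_jis", errors="replace")
round_tripped = lossy.decode("shift_jis")
print(round_tripped)

# 2. U+FFFD is permanent: decoding Shift_JIS bytes as UTF-8 with replacement
#    discards the invalid bytes, so re-encoding cannot restore them.
sjis_bytes = "日本語".encode("shift_jis")
damaged = sjis_bytes.decode("utf-8", errors="replace")
print(damaged.encode("utf-8") == sjis_bytes)


# 3. CP932 extensions: a circled digit (NEC special character) exists in
#    Python's cp932 table but not in its strict shift_jis table.
def encodable(text: str, encoding: str) -> bool:
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encodable("①", "cp932"), encodable("①", "shift_jis"))
```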

Common Use Cases

  • Legacy data migration: Move CSV / TXT files from Shift_JIS or EUC-JP into UTF-8 for modern pipelines.
  • Email and log forensics: Repair mojibake in archived email subjects, CSV exports, or vendor-supplied logs.
  • Encoding inspection: Quickly view the raw bytes of any text snippet to diagnose subtle encoding differences.
  • Cross-platform sharing: Add or strip a BOM when handing files between Windows, macOS, and Unix environments.
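
For the BOM use case, adding or stripping a UTF-8 BOM is a pure byte-level operation. A minimal Python sketch follows, with hypothetical helper names; it covers UTF-8 only and assumes the input is already UTF-8.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


with_bom = add_utf8_bom("名前,値".encode("utf-8"))
without_bom = strip_utf8_bom(with_bom)
print(len(with_bom) - len(without_bom))
```

Both helpers are idempotent, so running them twice on the same file does not stack or over-strip BOMs.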

Third-Party Libraries

UTF-8 / UTF-16 LE / UTF-16 BE conversion uses the browser's built-in TextEncoder and TextDecoder (WHATWG Encoding Standard).


Written by Hidekazu Konishi