Text to Binary Converter

Convert text to its binary representation (8-bit ASCII).

About This Tool

Text-to-binary conversion encodes the text as bytes and writes each byte in base 2. ASCII characters need only 7 bits but are conventionally padded to 8 (one byte). In UTF-8, non-ASCII characters take 2, 3, or 4 bytes (16, 24, or 32 bits) depending on the codepoint.

The converter outputs space-separated 8-bit groups by default, with options for grouping (4-bit nibbles, no spaces) and encoding (ASCII-only, full UTF-8).
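
The grouping and encoding options can be sketched in a few lines of Python. This is an illustrative sketch, not the converter's actual implementation: the function name text_to_binary and its parameters (encoding, group, sep) are hypothetical names chosen for this example.

```python
def text_to_binary(text: str, encoding: str = "utf-8",
                   group: int = 8, sep: str = " ") -> str:
    """Encode text to bytes, then render the bits MSB-first.

    group: number of bits per output group (8 for bytes, 4 for nibbles).
    sep:   separator between groups; an empty string means no spaces.
    """
    # With encoding="ascii", non-ASCII input raises UnicodeEncodeError.
    data = text.encode(encoding)
    bits = "".join(f"{byte:08b}" for byte in data)
    if group <= 0 or not sep:
        return bits
    return sep.join(bits[i:i + group] for i in range(0, len(bits), group))


print(text_to_binary("Hi"))            # 8-bit groups
print(text_to_binary("Hi", group=4))   # 4-bit nibbles
print(text_to_binary("Hi", sep=""))    # no spaces
```

Padding each byte to 8 digits with the 08b format keeps the leading zero that ASCII characters are conventionally displayed with.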

The binary representation of text depends on the encoding. ASCII characters fit in 7 bits and are typically displayed as 8-bit bytes with a leading zero. UTF-8 represents each Unicode codepoint in 1 to 4 bytes depending on its value: 1 byte for 0x00–0x7F (identical to ASCII), 2 bytes for 0x80–0x7FF (accented Latin letters, Greek, Cyrillic, Hebrew, Arabic), 3 bytes for 0x800–0xFFFF (the rest of the Basic Multilingual Plane), and 4 bytes for 0x10000–0x10FFFF (the supplementary planes, which include most emoji). These 4-byte sequences are why a short message containing emoji can exceed length limits in systems that count bytes rather than characters.

The conversion output is a literal string of ones and zeros. Reading binary as text reverses the process: split the digits at byte boundaries, parse each group as a base-2 number, and decode the resulting bytes according to the encoding.
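
As a rough illustration of the byte-length ranges and of the reverse direction, the following Python sketch may help. The function names utf8_byte_count and binary_to_text are hypothetical, and the decoder assumes MSB-first digits in whole bytes.

```python
def utf8_byte_count(codepoint: int) -> int:
    """Return how many bytes UTF-8 uses for a codepoint, by range."""
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("not a Unicode codepoint")
    if codepoint <= 0x7F:
        return 1
    if codepoint <= 0x7FF:
        return 2
    if codepoint <= 0xFFFF:
        return 3
    return 4


def binary_to_text(binary: str, encoding: str = "utf-8") -> str:
    """Decode a string of 0/1 digits (whitespace ignored) back to text."""
    bits = "".join(binary.split())
    if set(bits) - {"0", "1"}:
        raise ValueError("input may contain only 0, 1, and whitespace")
    if len(bits) % 8 != 0:
        raise ValueError("bit count is not a multiple of 8")
    # Split at byte boundaries and parse each group as base 2.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    # Raises UnicodeDecodeError if the bytes are invalid for the encoding.
    return data.decode(encoding)


print(utf8_byte_count(ord("A")), utf8_byte_count(0x1F600))
print(binary_to_text("01001000 01101001"))
```

The range check mirrors the table of ranges only; it does not reject surrogate codepoints (0xD800–0xDFFF), which cannot be encoded in UTF-8.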

A worked example: "Hello" in binary. ASCII codepoints: H=72, e=101, l=108, l=108, o=111. Binary (8 bits each): 01001000 01100101 01101100 01101100 01101111. The space-separated 8-bit grouping is the human-readable convention. The same string followed by a space and a heart emoji, "Hello ❤️", expands considerably. The space is one byte (00100000), the heart is U+2764 (3 bytes in UTF-8: E2 9D A4), and the variation selector U+FE0F that follows it is another 3 bytes (EF B8 8F), giving 01001000 01100101 01101100 01101100 01101111 00100000 11100010 10011101 10100100 11101111 10111000 10001111. What appears as a single emoji is two codepoints and 6 bytes (48 binary digits); the apparent length and the encoded length diverge significantly.
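
The figures in this example can be checked with a short Python script. This is only a verification sketch; the variable names are arbitrary, and the emoji is written with escape sequences so that the variation selector is visible in the source.

```python
# "Hello", a space, U+2764 (heavy black heart), U+FE0F (variation selector-16)
text = "Hello \u2764\ufe0f"
encoded = text.encode("utf-8")

rows = []
for ch in text:
    raw = ch.encode("utf-8")
    rows.append((
        f"U+{ord(ch):04X}",                   # codepoint
        raw.hex(" ").upper(),                 # UTF-8 bytes in hex (Python 3.8+)
        " ".join(f"{b:08b}" for b in raw),    # the same bytes in binary
    ))

for codepoint, hex_bytes, bits in rows:
    print(f"{codepoint:<7} {hex_bytes:<9} {bits}")

# Prints: 8 codepoints, 12 bytes, 96 bits
print(len(text), "codepoints,", len(encoded), "bytes,", len(encoded) * 8, "bits")
```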

Limitations: binary-as-text is a teaching and puzzle representation, not a wire format. Real binary protocols use raw bytes, not the ASCII characters '0' and '1'. Converting back from binary requires knowing the encoding: the converter assumes the same encoding in both directions, and the encoding cannot be determined reliably from the binary alone. MSB-first (most-significant bit leftmost) is the human-readable convention used here; LSB-first ordering exists in some hardware contexts and produces different binary strings for the same text. Each byte of input becomes 8 characters of output (9 with the space separator), so ASCII text grows at least 8x and a single 3-byte UTF-8 character becomes 24 digits. The format is suitable for display, not for transmission.
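
To make the bit-order point concrete, here is a small Python sketch that renders the same bytes both ways. The helper names byte_to_bits and text_bits are hypothetical, and LSB-first is modeled here simply as the reversed digit string of each byte.

```python
def byte_to_bits(byte: int, msb_first: bool = True) -> str:
    """Render one byte as 8 binary digits in the requested bit order."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in range 0..255")
    bits = f"{byte:08b}"  # format() always yields MSB-first digits
    return bits if msb_first else bits[::-1]


def text_bits(text: str, msb_first: bool = True, encoding: str = "utf-8") -> str:
    """Render text as space-separated 8-bit groups."""
    return " ".join(byte_to_bits(b, msb_first) for b in text.encode(encoding))


print(text_bits("Hi"))                   # MSB-first, the convention used here
print(text_bits("Hi", msb_first=False))  # LSB-first, same bytes, different digits
```

Decoding an LSB-first string with an MSB-first reader yields different bytes, so the bit order has to be agreed on just like the encoding.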

The about text and FAQ on this page were drafted with AI assistance and reviewed by a member of the Coherence Daddy team before publishing. See our Content Policy for editorial standards.

Frequently Asked Questions