Unicode Character Lookup

Get Unicode code points, hex values, and character details for any text.

About This Tool

Unicode is a character encoding standard covering 154,998 characters across 168 scripts (as of Unicode 16.0). Each character has a code point (e.g., U+1F600), an official name ('GRINNING FACE'), a category (letter, symbol, punctuation), and metadata (block, script, version introduced).

The lookup accepts a character, code point, or partial name and returns the full record: code point, name, hex/decimal/HTML entity representations, UTF-8 byte sequence, and the block it belongs to. This is useful when constructing strings programmatically or troubleshooting display issues.
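As a rough illustration of what such a record contains, here is a minimal Python sketch built on the standard unicodedata module. The function name lookup and the dictionary keys are hypothetical, chosen for this example; they are not this tool's actual API. The block name is left out because unicodedata does not expose it.

```python
import unicodedata

def lookup(ch: str) -> dict:
    """Build a small record for one character (field names are illustrative)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        # name() raises for unnamed code points unless a default is given
        "name": unicodedata.name(ch, "<unnamed>"),
        "hex": f"0x{cp:X}",
        "decimal": cp,
        "html_entity": f"&#{cp};",  # decimal numeric character reference
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        "category": unicodedata.category(ch),
    }

record = lookup("ñ")
for key, value in record.items():
    print(f"{key}: {value}")
```

The name and category values come from the Unicode data bundled with the Python interpreter, so very new characters may show up as unnamed on older Python versions.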

The Unicode Character Database (UCD) is the authoritative source for character properties. Each character has a long list of properties: General Category (Lu = uppercase letter, Po = other punctuation, Sm = math symbol, etc.), Bidirectional Class, East Asian Width, Script, Block, Numeric Value (for digits and other numeric characters), Decomposition (how composed characters break apart), Case Mappings (uppercase/lowercase/titlecase equivalents), and more. The lookup tool surfaces the most useful subset.

UTF-8 encoding for any code point follows fixed rules: ASCII (U+0000 to U+007F) uses 1 byte; U+0080 to U+07FF uses 2 bytes; U+0800 to U+FFFF uses 3 bytes (the surrogate range U+D800 to U+DFFF is excluded, since surrogates are not valid characters on their own); U+10000 to U+10FFFF uses 4 bytes. UTF-16 uses one or two 16-bit code units, with code points in the supplementary planes (U+10000 and above) encoded as surrogate pairs.
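The byte-length rules can be made concrete with a hand-written encoder. The following is a sketch for illustration only; the function names utf8_encode and utf16_units are made up for this example, and in real code Python's built-in str.encode does the same job.

```python
def _check_scalar(cp: int) -> None:
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF cannot be encoded.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

def utf8_encode(cp: int) -> bytes:
    """Encode one code point to UTF-8 using the fixed range rules."""
    _check_scalar(cp)
    if cp <= 0x7F:        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    return bytes([        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])

def utf16_units(cp: int) -> list:
    """Return the UTF-16 code units for one code point (one unit or a surrogate pair)."""
    _check_scalar(cp)
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000      # 20 bits, split 10/10 across the pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

print(utf8_encode(0x00F1).hex(" "))                # c3 b1
print([hex(u) for u in utf16_units(0x1F44B)])      # ['0xd83d', '0xdc4b']
```

Comparing the result with chr(cp).encode("utf-8") is a quick way to sanity-check the hand-written version.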

A worked example. Look up the character ñ. Code point: U+00F1. Name: LATIN SMALL LETTER N WITH TILDE. UTF-8 bytes: 0xC3 0xB1. UTF-16: 0x00F1 (single code unit). HTML entity: &ntilde; or &#241; or &#xF1;. Block: Latin-1 Supplement. Script: Latin. Category: Lowercase Letter (Ll). Decomposition: U+006E (n) + U+0303 (combining tilde). The decomposition tells you that 'ñ' can also be expressed as 'n' followed by a combining tilde. Unicode treats the two forms as canonically equivalent: NFC normalization produces the single code point U+00F1, while NFD produces the two-code-point sequence.

Compare with the emoji 👋 (U+1F44B, WAVING HAND SIGN). UTF-8: 0xF0 0x9F 0x91 0x8B (4 bytes). UTF-16: 0xD83D 0xDC4B (surrogate pair). Block: Miscellaneous Symbols and Pictographs. The 4-byte UTF-8 length is why some older systems with 3-byte UTF-8 limits (early MySQL utf8 charset) couldn't store emoji until utf8mb4 became standard.
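The relationship between the precomposed and decomposed forms of ñ can be checked with Python's standard unicodedata module. This is a small sketch; the helper canonically_equal is a hypothetical name introduced here, and comparing NFD forms is just one common way to test canonical equivalence.

```python
import unicodedata

composed = "\u00F1"      # ñ as a single code point
decomposed = "n\u0303"   # n followed by COMBINING TILDE

# The raw strings differ even though they render identically.
print(composed == decomposed)   # False
print(len(composed), len(decomposed))   # 1 2

# NFC composes, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)   # True True

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(canonically_equal(composed, decomposed))   # True

# The UCD decomposition field, as hex code points separated by a space.
print(unicodedata.decomposition(composed))   # 006E 0303
```

Normalizing both sides before comparing is the usual fix when two visually identical strings fail an equality check.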

Limitations to flag. Display correctness depends on font coverage. Tofu (the empty box) means the rendering font lacks a glyph for that code point even though Unicode defines it. Solutions: install a font that covers the script (Noto Sans family covers most Unicode), use a font stack with fallbacks, or check whether the character is in a Private Use Area (U+E000 to U+F8FF and the supplementary private use planes) where different fonts assign completely different glyphs. Older systems may not render newer characters at all; new emoji typically take a year or two to propagate to all major platforms after a Unicode release. Code point versioning matters: Unicode 16.0 (2024) added 5,185 new characters, most of them in historic and lesser-used scripts (including close to 4,000 additional Egyptian hieroglyphs) and only a handful of them emoji; older browsers and OSes won't show them.
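When a character shows up as tofu, a quick first check is whether the code point is private use or simply unknown to the Unicode data in hand. The sketch below uses Python's unicodedata module; the names PRIVATE_USE_RANGES, is_private_use, and classify are hypothetical, and the result depends on the Unicode version bundled with the interpreter (reported by unicodedata.unidata_version), not on any installed font.

```python
import unicodedata

# BMP Private Use Area plus the two supplementary private use planes.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]

def is_private_use(cp: int) -> bool:
    """True if the code point falls in one of the private use ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)

def classify(ch: str) -> str:
    """Coarse status of a character according to this interpreter's Unicode data."""
    cat = unicodedata.category(ch)
    if cat == "Co":
        return "private use"
    if cat == "Cs":
        return "surrogate"
    if cat == "Cn":
        # Unassigned in this Unicode version, or a permanent noncharacter.
        return "unassigned or noncharacter"
    return "assigned"

print("Unicode data version:", unicodedata.unidata_version)
for ch in ("A", "\uE000", "\U0001F44B", "\U0010FFFF"):
    print(f"U+{ord(ch):04X}", classify(ch))
```

A character that is assigned in a newer Unicode release but reported as unassigned here suggests the runtime's Unicode data is older than the character, which is a separate problem from missing font coverage.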

The about text and FAQ on this page were drafted with AI assistance and reviewed by a member of the Coherence Daddy team before publishing. See our Content Policy for editorial standards.

Frequently Asked Questions