ASCII, Unicode, and UTF-8 — a Practical Guide
Text looks simple until you ship it. Then “é” becomes “Ã©”, emoji break your logs, and databases refuse to sort correctly. This article gives you a solid mental model of how text becomes bytes, why ASCII still matters, how Unicode fixes the global text problem, and why UTF-8 is the default encoding you should reach for.

1) Characters, code points, bytes

- Character: the abstract “letter/symbol” humans see (e.g., A, é, 🙂).
- Code point: a number assigned to a character. In Unicode, A is U+0041, é is U+00E9, and 🙂 is U+1F642.
- Encoding: a method that turns code points into bytes (binary) and back.

Computers store and transmit bytes, not characters. Encodings are the agreement for mapping between the two.

2) ASCII: the OG mapping (7-bit)

ASCII defines 128 code points (0–127). It fits in 7 bits and is commonly stored as a full byte (the top bit is 0). That’s ...
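A minimal sketch (Python 3 is assumed here) that makes the character → code point → bytes distinction concrete, and shows how decoding with the wrong encoding produces the “Ã©” mojibake from the intro:

```python
# Character -> code point -> bytes, for the three examples above.
# ord() gives the Unicode code point; .encode() produces bytes.
for ch in ["A", "é", "🙂"]:
    cp = ord(ch)                  # code point as an integer
    utf8 = ch.encode("utf-8")     # the same character as UTF-8 bytes
    print(f"{ch!r}  U+{cp:04X}  {utf8.hex(' ')}")

# Output:
# 'A'   U+0041   41             <- ASCII range: one byte, top bit 0
# 'é'   U+00E9   c3 a9          <- two bytes in UTF-8
# '🙂'  U+1F642  f0 9f 99 82    <- four bytes in UTF-8

# Decoding UTF-8 bytes with the wrong encoding is how "é" becomes "Ã©":
print("é".encode("utf-8").decode("latin-1"))   # -> Ã©
```

Note how the ASCII character encodes to a single byte whose value is its code point; that backward compatibility is a big part of why UTF-8 won.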