Binary — a Practical Guide
A practical guide to bits, bytes, numbers, endianness, floating point, binary I/O, checksums, and common pitfalls—with copy‑paste toolkits for Python, Go, and JavaScript at the end.
1) Bits, bytes, bases
- bit: 0 or 1
- byte: 8 bits (most systems are byte-addressable)
- bases: binary
0b1010
, hex0xA
, decimal10
- nibble: 4 bits → maps cleanly to 1 hex digit
Binary ↔ hex head math (nibbles):
bin 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
hex 0 1 2 3 4 5 6 7 8 9 A B C D E F
KB vs KiB
- KB/MB/GB (SI): 10³, 10⁶, 10⁹
- KiB/MiB/GiB (binary): 2¹⁰, 2²⁰, 2³⁰
2) Integers: unsigned, signed, two’s complement
- unsigned n‑bit range:
0 … (2ⁿ−1)
(8‑bit:0..255
) - two’s complement (signed) range:
−2ⁿ⁻¹ … 2ⁿ⁻¹−1
(8‑bit:−128..127
) - Negation in two’s complement: invert bits + 1
Bitwise operations & masks
- AND
&
, OR|
, XOR^
, NOT~
, shifts<< >>
- Common patterns:
- test bit k:
(x >> k) & 1
- set bit k:
x | (1 << k)
- clear bit k:
x & ~(1 << k)
- toggle bit k:
x ^ (1 << k)
- test bit k:
Overflow: unsigned math wraps modulo 2ⁿ; signed overflow is undefined behavior in C/C++.
3) Endianness (byte order)
- little‑endian: least significant byte first (x86, ARM64 default)
- big‑endian: most significant byte first (network protocols = “network byte order”)
Python demo:
import struct
n = 0x12345678
struct.pack("<I", n) # little: b'xV4\x12'
struct.pack(">I", n) # big: b'\x12\x34\x56\x78'
Mismatch ⇒ corrupted numbers across binaries/sockets if not handled explicitly.
4) Floating point (IEEE‑754 quick view)
- float32: 1 sign, 8 exponent, 23 fraction
- float64: 1 sign, 11 exponent, 52 fraction
- Special values:
±0
,±∞
,NaN
(many bit patterns) - Avoid exact equality; use an epsilon/tolerance.
a = 0.1 + 0.2
abs(a - 0.3) < 1e-9 # True; don't use a == 0.3
5) Text and bytes (super short)
- ASCII = first 128 Unicode code points → single byte
0xxxxxxx
- UTF‑8 = variable (1–4 bytes), ASCII‑compatible; default everywhere
- Always declare encodings at boundaries (files/DB/HTTP). Normalize for comparisons when needed.
6) Packing data structures (files, sockets, IPC)
Binary protocols define: field order, sizes, endianness, optional compression/checksums.
# header: magic(4s), version(u8), flags(u8), length(u16 LE)
import struct
magic = b"LOG1"
payload = b"..."
flags = 0b0000_0011
header = struct.pack("<4sBBH", magic, 1, flags, len(payload))
packet = header + payload
C/ABI caution: compilers may insert padding/alignment. Prefer explicit serialization over packed structs for cross‑lang interoperability.
7) Bit‑packing and fields
Space‑efficient formats pack multiple logical fields into few bytes.
Example layout:
byte0: [priority:3][type:3][error:1][reserved:1]
Encode/decode:
priority, typ, error = 5, 3, 1 # each within range
b0 = (priority & 0x7) << 5 | (typ & 0x7) << 2 | (error & 0x1) << 1
decoded_priority = (b0 >> 5) & 0x7
decoded_type = (b0 >> 2) & 0x7
decoded_error = (b0 >> 1) & 0x1
8) Checksums & hashes (integrity)
- parity (1 bit), CRC (fast error detection), Adler‑32
- MD5/SHA‑1/SHA‑256 are cryptographic hashes (use for tamper evidence/content IDs; CRC is faster for wire integrity)
- Choose CRC for packets; cryptographic hashes for authenticity.
9) Binary I/O toolkit (CLI)
xxd file.bin | head
— hex dumphexdump -C file.bin
— canonical hex + ASCIIod -An -t x1 file.bin
— raw byte hexiconv
,file
— encoding diagnostics
10) Common pitfalls checklist
11) Practical exercises
- Endian swapper: reverse the byte order of a 32‑bit integer.
- Flags: model feature flags in a single byte; implement
enable
,disable
,isEnabled
. - Parser: define a 12‑byte header (
magic[4], version u8, flags u8, length u32 LE, crc u16
) and parse/emit it. - Float probe: print hex bytes of a
float32
for1.0
,0.5
,NaN
,∞
and explain exponent/fraction.
Copy‑Paste Language Toolkits
Below are minimal, dependency‑light snippets you can drop into projects/tests.
Python Toolkit (binary_utils.py
)
from __future__ import annotations
import struct
from typing import Iterable
# — Bits & masks —
def test_bit(x: int, k: int) -> int: return (x >> k) & 1
def set_bit(x: int, k: int) -> int: return x | (1 << k)
def clr_bit(x: int, k: int) -> int: return x & ~(1 << k)
def tog_bit(x: int, k: int) -> int: return x ^ (1 << k)
def lowbits(n: int) -> int: return (1 << n) - 1
# — Endianness —
def to_bytes_be(n: int, width: int) -> bytes: return n.to_bytes(width, "big", signed=False)
def to_bytes_le(n: int, width: int) -> bytes: return n.to_bytes(width, "little", signed=False)
def from_bytes_be(b: bytes) -> int: return int.from_bytes(b, "big", signed=False)
def from_bytes_le(b: bytes) -> int: return int.from_bytes(b, "little", signed=False)
# — Struct helpers —
def pack_header(magic: bytes, version: int, flags: int, length: int) -> bytes:
return struct.pack("<4sBBH", magic, version, flags, length)
def unpack_header(b: bytes):
magic, version, flags, length = struct.unpack("<4sBBH", b[:8])
return {"magic": magic, "version": version, "flags": flags, "length": length}
# — Hex/Binary dumps —
def hex_dump(b: bytes, width: int = 16) -> str:
lines = []
for i in range(0, len(b), width):
chunk = b[i:i+width]
hexpart = " ".join(f"{x:02x}" for x in chunk)
ascpart = "".join(chr(x) if 32 <= x < 127 else "." for x in chunk)
lines.append(f"{i:08x} {hexpart:<{width*3}} |{ascpart}|")
return "\n".join(lines)
def bits(b: bytes) -> str: return " ".join(f"{x:08b}" for x in b)
# — Simple CRC‑16/CCITT‑False —
def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
crc = init
for byte in data:
crc ^= byte << 8
for _ in range(8):
crc = ((crc << 1) ^ poly) & 0xFFFF if (crc & 0x8000) else (crc << 1) & 0xFFFF
return crc
if __name__ == "__main__":
pkt = pack_header(b'LOG1', 1, 0b11, 5) + b"hello"
print(hex_dump(pkt))
print("CRC16:", hex(crc16(pkt)))
Go Toolkit (binaryutils/binaryutils.go
)
package binaryutils
import (
"encoding/binary"
"fmt"
)
// Bits & masks
func TestBit(x uint64, k uint) uint64 { return (x >> k) & 1 }
func SetBit(x uint64, k uint) uint64 { return x | (1 << k) }
func ClrBit(x uint64, k uint) uint64 { return x & ^(1 << k) }
func TogBit(x uint64, k uint) uint64 { return x ^ (1 << k) }
func LowBits(n uint) uint64 { return (1 << n) - 1 }
// Endianness
func ToBytesBE(n uint64, width int) []byte {
b := make([]byte, width)
for i := width - 1; i >= 0; i-- { b[i] = byte(n); n >>= 8 }
return b
}
func ToBytesLE(n uint64, width int) []byte {
b := make([]byte, width)
for i := 0; i < width; i++ { b[i] = byte(n); n >>= 8 }
return b
}
func FromBytesBE(b []byte) uint64 {
var n uint64
for i := 0; i < len(b); i++ { n = (n << 8) | uint64(b[i]) }
return n
}
func FromBytesLE(b []byte) uint64 {
var n uint64
for i := len(b) - 1; i >= 0; i-- { n = (n << 8) | uint64(b[i]) }
return n
}
// Pack/Unpack example header
func PackHeader(magic []byte, version, flags uint8, length uint16) []byte {
b := make([]byte, 0, 8)
b = append(b, magic[:4]...)
b = append(b, version, flags)
tmp := make([]byte, 2)
binary.LittleEndian.PutUint16(tmp, length)
b = append(b, tmp...)
return b
}
func HexDump(b []byte, width int) string {
if width <= 0 { width = 16 }
out := ""
for i := 0; i < len(b); i += width {
end := i + width
if end > len(b) { end = len(b) }
chunk := b[i:end]
hexpart := ""
ascpart := ""
for _, v := range chunk {
hexpart += fmt.Sprintf("%02x ", v)
if v >= 32 && v < 127 { ascpart += string(v) } else { ascpart += "." }
}
out += fmt.Sprintf("%08x %-*s |%s|\n", i, width*3, hexpart, ascpart)
}
return out
}
// CRC-16/CCITT-False
func CRC16(data []byte) uint16 {
var crc uint16 = 0xFFFF
for _, b := range data {
crc ^= uint16(b) << 8
for i := 0; i < 8; i++ {
if (crc & 0x8000) != 0 { crc = (crc << 1) ^ 0x1021 } else { crc <<= 1 }
}
}
return crc
}
JavaScript Toolkit (binary-utils.js
)
// Bits & masks
export const testBit = (x, k) => (x >>> k) & 1;
export const setBit = (x, k) => (x | (1 << k)) >>> 0;
export const clrBit = (x, k) => (x & ~(1 << k)) >>> 0;
export const togBit = (x, k) => (x ^ (1 << k)) >>> 0;
export const lowBits = n => (1 << n) - 1;
// Endianness/byte views
export const toBytesBE = (n, width) => {
const b = new Uint8Array(width);
for (let i = width - 1; i >= 0; i--) { b[i] = n & 0xFF; n >>>= 8; }
return b;
};
export const toBytesLE = (n, width) => {
const b = new Uint8Array(width);
for (let i = 0; i < width; i++) { b[i] = n & 0xFF; n >>>= 8; }
return b;
};
export const fromBytesBE = (bytes) => bytes.reduce((n, v) => (n * 256 + v) >>> 0, 0);
export const fromBytesLE = (bytes) => bytes.reduceRight((n, v) => (n * 256 + v) >>> 0, 0);
// Hex/Binary dumps
export const hexDump = (buf, width = 16) => {
const b = buf instanceof Uint8Array ? buf : new Uint8Array(buf);
let out = "";
for (let i = 0; i < b.length; i += width) {
const chunk = b.slice(i, i + width);
const hex = Array.from(chunk).map(x => x.toString(16).padStart(2, "0")).join(" ");
const ascii = Array.from(chunk).map(x => (x >= 32 && x < 127) ? String.fromCharCode(x) : ".").join("");
out += `${i.toString(16).padStart(8, "0")} ${hex.padEnd(width * 3, " ")} |${ascii}|\n`;
}
return out;
};
// CRC-16/CCITT-False
export const crc16 = (buf) => {
const b = buf instanceof Uint8Array ? buf : new Uint8Array(buf);
let crc = 0xFFFF;
for (const byte of b) {
crc ^= (byte << 8);
for (let i = 0; i < 8; i++) {
crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) : (crc << 1);
crc &= 0xFFFF;
}
}
return crc;
};
How to use the toolkits quickly
Python
import binary_utils as bu pkt = bu.pack_header(b'LOG1', 1, 0b11, 5) + b"hello" print(bu.hex_dump(pkt)); print(hex(bu.crc16(pkt)))
Go
pkt := binaryutils.PackHeader([]byte("LOG1"), 1, 3, 5) fmt.Print(binaryutils.HexDump(append(pkt, []byte("hello")...), 16)) fmt.Printf("CRC16: 0x%04x\n", binaryutils.CRC16(pkt))
JavaScript (Node/Browser)
import { hexDump, crc16, toBytesLE } from "./binary-utils.js"; const pkt = new Uint8Array([..."LOG1"].map(c => c.charCodeAt(0)).concat([1, 3, 5, 0])); console.log(hexDump(pkt)); console.log("CRC16:", crc16(pkt).toString(16));
Comments
Post a Comment