Binary — a Practical Guide

 


A practical guide to bits, bytes, numbers, endianness, floating point, binary I/O, checksums, and common pitfalls—with copy‑paste toolkits for PythonGo, and JavaScript at the end.


1) Bits, bytes, bases

  • bit: 0 or 1
  • byte: 8 bits (most systems are byte-addressable)
  • bases: binary 0b1010, hex 0xA, decimal 10
  • nibble: 4 bits → maps cleanly to 1 hex digit

Binary ↔ hex head math (nibbles):

bin  0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
hex     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F

KB vs KiB

  • KB/MB/GB (SI): 10³, 10⁶, 10⁹
  • KiB/MiB/GiB (binary): 2¹⁰, 2²⁰, 2³⁰

2) Integers: unsigned, signed, two’s complement

  • unsigned n‑bit range: 0 … (2ⁿ−1) (8‑bit: 0..255)
  • two’s complement (signed) range: −2ⁿ⁻¹ … 2ⁿ⁻¹−1 (8‑bit: −128..127)
  • Negation in two’s complement: invert bits + 1

Bitwise operations & masks

  • AND &, OR |, XOR ^, NOT ~, shifts << >>
  • Common patterns:
    • test bit k(x >> k) & 1
    • set bit kx | (1 << k)
    • clear bit kx & ~(1 << k)
    • toggle bit kx ^ (1 << k)

Overflow: unsigned math wraps modulo 2ⁿ; signed overflow is undefined behavior in C/C++.


3) Endianness (byte order)

  • little‑endian: least significant byte first (x86, ARM64 default)
  • big‑endian: most significant byte first (network protocols = “network byte order”)

Python demo:

import struct
n = 0x12345678
struct.pack("<I", n)  # little: b'xV4\x12'
struct.pack(">I", n)  # big:    b'\x12\x34\x56\x78'

Mismatch ⇒ corrupted numbers across binaries/sockets if not handled explicitly.


4) Floating point (IEEE‑754 quick view)

  • float32: 1 sign, 8 exponent, 23 fraction
  • float64: 1 sign, 11 exponent, 52 fraction
  • Special values: ±0±∞NaN (many bit patterns)
  • Avoid exact equality; use an epsilon/tolerance.
a = 0.1 + 0.2
abs(a - 0.3) < 1e-9  # True; don't use a == 0.3

5) Text and bytes (super short)

  • ASCII = first 128 Unicode code points → single byte 0xxxxxxx
  • UTF‑8 = variable (1–4 bytes), ASCII‑compatible; default everywhere
  • Always declare encodings at boundaries (files/DB/HTTP). Normalize for comparisons when needed.

6) Packing data structures (files, sockets, IPC)

Binary protocols define: field order, sizes, endianness, optional compression/checksums.

# header: magic(4s), version(u8), flags(u8), length(u16 LE)
import struct
magic = b"LOG1"
payload = b"..."
flags = 0b0000_0011
header = struct.pack("<4sBBH", magic, 1, flags, len(payload))
packet = header + payload

C/ABI caution: compilers may insert padding/alignment. Prefer explicit serialization over packed structs for cross‑lang interoperability.


7) Bit‑packing and fields

Space‑efficient formats pack multiple logical fields into few bytes.

Example layout:

byte0: [priority:3][type:3][error:1][reserved:1]

Encode/decode:

priority, typ, error = 5, 3, 1  # each within range
b0 = (priority & 0x7) << 5 | (typ & 0x7) << 2 | (error & 0x1) << 1
decoded_priority = (b0 >> 5) & 0x7
decoded_type     = (b0 >> 2) & 0x7
decoded_error    = (b0 >> 1) & 0x1

8) Checksums & hashes (integrity)

  • parity (1 bit), CRC (fast error detection), Adler‑32
  • MD5/SHA‑1/SHA‑256 are cryptographic hashes (use for tamper evidence/content IDs; CRC is faster for wire integrity)
  • Choose CRC for packets; cryptographic hashes for authenticity.

9) Binary I/O toolkit (CLI)

  • xxd file.bin | head — hex dump
  • hexdump -C file.bin — canonical hex + ASCII
  • od -An -t x1 file.bin — raw byte hex
  • iconvfile — encoding diagnostics

10) Common pitfalls checklist


11) Practical exercises

  1. Endian swapper: reverse the byte order of a 32‑bit integer.
  2. Flags: model feature flags in a single byte; implement enabledisableisEnabled.
  3. Parser: define a 12‑byte header (magic[4], version u8, flags u8, length u32 LE, crc u16) and parse/emit it.
  4. Float probe: print hex bytes of a float32 for 1.00.5NaN and explain exponent/fraction.

Copy‑Paste Language Toolkits

Below are minimal, dependency‑light snippets you can drop into projects/tests.

Python Toolkit (binary_utils.py)

from __future__ import annotations
import struct
from typing import Iterable

# — Bits & masks —
def test_bit(x: int, k: int) -> int: return (x >> k) & 1
def set_bit(x: int, k: int) -> int:  return x | (1 << k)
def clr_bit(x: int, k: int) -> int:  return x & ~(1 << k)
def tog_bit(x: int, k: int) -> int:  return x ^ (1 << k)
def lowbits(n: int) -> int:          return (1 << n) - 1

# — Endianness —
def to_bytes_be(n: int, width: int) -> bytes: return n.to_bytes(width, "big", signed=False)
def to_bytes_le(n: int, width: int) -> bytes: return n.to_bytes(width, "little", signed=False)
def from_bytes_be(b: bytes) -> int: return int.from_bytes(b, "big", signed=False)
def from_bytes_le(b: bytes) -> int: return int.from_bytes(b, "little", signed=False)

# — Struct helpers —
def pack_header(magic: bytes, version: int, flags: int, length: int) -> bytes:
    return struct.pack("<4sBBH", magic, version, flags, length)

def unpack_header(b: bytes):
    magic, version, flags, length = struct.unpack("<4sBBH", b[:8])
    return {"magic": magic, "version": version, "flags": flags, "length": length}

# — Hex/Binary dumps —
def hex_dump(b: bytes, width: int = 16) -> str:
    lines = []
    for i in range(0, len(b), width):
        chunk = b[i:i+width]
        hexpart = " ".join(f"{x:02x}" for x in chunk)
        ascpart = "".join(chr(x) if 32 <= x < 127 else "." for x in chunk)
        lines.append(f"{i:08x}  {hexpart:<{width*3}}  |{ascpart}|")
    return "\n".join(lines)

def bits(b: bytes) -> str: return " ".join(f"{x:08b}" for x in b)

# — Simple CRC‑16/CCITT‑False —
def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if (crc & 0x8000) else (crc << 1) & 0xFFFF
    return crc

if __name__ == "__main__":
    pkt = pack_header(b'LOG1', 1, 0b11, 5) + b"hello"
    print(hex_dump(pkt))
    print("CRC16:", hex(crc16(pkt)))

Go Toolkit (binaryutils/binaryutils.go)

package binaryutils

import (
    "encoding/binary"
    "fmt"
)

// Bits & masks
func TestBit(x uint64, k uint) uint64 { return (x >> k) & 1 }
func SetBit(x uint64, k uint) uint64  { return x | (1 << k) }
func ClrBit(x uint64, k uint) uint64  { return x & ^(1 << k) }
func TogBit(x uint64, k uint) uint64  { return x ^ (1 << k) }
func LowBits(n uint) uint64           { return (1 << n) - 1 }

// Endianness
func ToBytesBE(n uint64, width int) []byte {
    b := make([]byte, width)
    for i := width - 1; i >= 0; i-- { b[i] = byte(n); n >>= 8 }
    return b
}
func ToBytesLE(n uint64, width int) []byte {
    b := make([]byte, width)
    for i := 0; i < width; i++ { b[i] = byte(n); n >>= 8 }
    return b
}
func FromBytesBE(b []byte) uint64 {
    var n uint64
    for i := 0; i < len(b); i++ { n = (n << 8) | uint64(b[i]) }
    return n
}
func FromBytesLE(b []byte) uint64 {
    var n uint64
    for i := len(b) - 1; i >= 0; i-- { n = (n << 8) | uint64(b[i]) }
    return n
}

// Pack/Unpack example header
func PackHeader(magic []byte, version, flags uint8, length uint16) []byte {
    b := make([]byte, 0, 8)
    b = append(b, magic[:4]...)
    b = append(b, version, flags)
    tmp := make([]byte, 2)
    binary.LittleEndian.PutUint16(tmp, length)
    b = append(b, tmp...)
    return b
}

func HexDump(b []byte, width int) string {
    if width <= 0 { width = 16 }
    out := ""
    for i := 0; i < len(b); i += width {
        end := i + width
        if end > len(b) { end = len(b) }
        chunk := b[i:end]
        hexpart := ""
        ascpart := ""
        for _, v := range chunk {
            hexpart += fmt.Sprintf("%02x ", v)
            if v >= 32 && v < 127 { ascpart += string(v) } else { ascpart += "." }
        }
        out += fmt.Sprintf("%08x  %-*s |%s|\n", i, width*3, hexpart, ascpart)
    }
    return out
}

// CRC-16/CCITT-False
func CRC16(data []byte) uint16 {
    var crc uint16 = 0xFFFF
    for _, b := range data {
        crc ^= uint16(b) << 8
        for i := 0; i < 8; i++ {
            if (crc & 0x8000) != 0 { crc = (crc << 1) ^ 0x1021 } else { crc <<= 1 }
        }
    }
    return crc
}

JavaScript Toolkit (binary-utils.js)

// Bits & masks
export const testBit = (x, k) => (x >>> k) & 1;
export const setBit  = (x, k) => (x | (1 << k)) >>> 0;
export const clrBit  = (x, k) => (x & ~(1 << k)) >>> 0;
export const togBit  = (x, k) => (x ^ (1 << k)) >>> 0;
export const lowBits = n => (1 << n) - 1;

// Endianness/byte views
export const toBytesBE = (n, width) => {
  const b = new Uint8Array(width);
  for (let i = width - 1; i >= 0; i--) { b[i] = n & 0xFF; n >>>= 8; }
  return b;
};
export const toBytesLE = (n, width) => {
  const b = new Uint8Array(width);
  for (let i = 0; i < width; i++) { b[i] = n & 0xFF; n >>>= 8; }
  return b;
};
export const fromBytesBE = (bytes) => bytes.reduce((n, v) => (n * 256 + v) >>> 0, 0);
export const fromBytesLE = (bytes) => bytes.reduceRight((n, v) => (n * 256 + v) >>> 0, 0);

// Hex/Binary dumps
export const hexDump = (buf, width = 16) => {
  const b = buf instanceof Uint8Array ? buf : new Uint8Array(buf);
  let out = "";
  for (let i = 0; i < b.length; i += width) {
    const chunk = b.slice(i, i + width);
    const hex = Array.from(chunk).map(x => x.toString(16).padStart(2, "0")).join(" ");
    const ascii = Array.from(chunk).map(x => (x >= 32 && x < 127) ? String.fromCharCode(x) : ".").join("");
    out += `${i.toString(16).padStart(8, "0")}  ${hex.padEnd(width * 3, " ")} |${ascii}|\n`;
  }
  return out;
};

// CRC-16/CCITT-False
export const crc16 = (buf) => {
  const b = buf instanceof Uint8Array ? buf : new Uint8Array(buf);
  let crc = 0xFFFF;
  for (const byte of b) {
    crc ^= (byte << 8);
    for (let i = 0; i < 8; i++) {
      crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) : (crc << 1);
      crc &= 0xFFFF;
    }
  }
  return crc;
};

How to use the toolkits quickly

  • Python

    import binary_utils as bu
    pkt = bu.pack_header(b'LOG1', 1, 0b11, 5) + b"hello"
    print(bu.hex_dump(pkt)); print(hex(bu.crc16(pkt)))
    
  • Go

    pkt := binaryutils.PackHeader([]byte("LOG1"), 1, 3, 5)
    fmt.Print(binaryutils.HexDump(append(pkt, []byte("hello")...), 16))
    fmt.Printf("CRC16: 0x%04x\n", binaryutils.CRC16(pkt))
    
  • JavaScript (Node/Browser)

    import { hexDump, crc16, toBytesLE } from "./binary-utils.js";
    const pkt = new Uint8Array([..."LOG1"].map(c => c.charCodeAt(0)).concat([1, 3, 5, 0]));
    console.log(hexDump(pkt));
    console.log("CRC16:", crc16(pkt).toString(16));
    


Comments

Popular posts from this blog

Mount StorageBox to the server for backup

psql: error: connection to server at "localhost" (127.0.0.1), port 5433 failed: ERROR: failed to authenticate with backend using SCRAM DETAIL: valid password not found

Keeping Your SSH Connections Alive: Configuring ServerAliveInterval and ServerAliveCountMax