Action guide

How LLMs Tokenize Text

Billing and context limits are counted in tokens, so they disagree with your word count. Rare names and compounds split into many more tokens than you expect. Tokenization hits cost and quality like a tax you did not see coming.

Why subscribe

Pricing says tokens; intuition says words. Before rare tokens chew through budget and quality, you need the tokenizer story, and you need it before you argue about context limits.

For: Engineers sizing prompts, retrieval, and budget who must explain surprises without sounding like a textbook.

A concrete mental model of subword behavior
Why rare strings explode length and hurt outputs
Better budgeting conversations with finance and PM
Examples that break naïve splitting
Rules of thumb for edge cases in prod prompts
Hooks into latency and cost reasoning
Links tokenizer quirks to failures users actually see

Subscribe free to unlock the full guide and all future updates.

Figure: word-level tokenization examples with colored word boxes, then a hard sentence with hyphens and apostrophes, plus panels on vocabulary size and the OOV problem.

What you’ll learn

What whitespace / word-level tokenization pretends to solve, where it breaks (hyphenated and possessive words, rarer surface forms), why a growing word list does not scale, and what out-of-vocabulary (OOV) means when the model was never trained on a surface string as a single word.
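To make the word-level failure mode concrete, here is a minimal sketch, assuming a toy two-sentence corpus and a hypothetical `<unk>` id for anything the vocabulary has not seen. The helper names (`word_tokenize`, `build_vocab`, `encode`) are illustrative and not taken from any library.

```python
def word_tokenize(text):
    """Naive word-level tokenization: split on whitespace only."""
    return text.split()


def build_vocab(corpus):
    """Assign one id per distinct surface form; id 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for word in word_tokenize(sentence):
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Map each word to its id, falling back to <unk> for unseen words."""
    return [vocab.get(word, vocab["<unk>"]) for word in word_tokenize(text)]


# Toy training corpus (an assumption for illustration).
corpus = ["the model reads the text", "the user's prompt is long"]
vocab = build_vocab(corpus)

# A harder sentence: a hyphenated compound and a plural possessive.
hard = "the user's state-of-the-art model reads users' text"
tokens = word_tokenize(hard)
ids = encode(hard, vocab)

# Every word that collapsed to <unk> is out-of-vocabulary (OOV).
oov = [word for word, idx in zip(tokens, ids) if idx == vocab["<unk>"]]

print(tokens)
print(ids)
print(oov)
```

In this sketch `user's` is known only because that exact string appeared in the corpus, while `users'` and `state-of-the-art` both collapse to `<unk>`. Covering them means adding a new vocabulary entry for every surface variant, which is why a growing word list does not scale.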

When you subscribe to the newsletter, you get access to the full online guide alongside course and issue updates.

Explore the other action guides

Each guide kills one sharp problem. You leave with steps you can type, not inspiration quotes.

AI Agent Architecture Simplified

Attention: Explained for Engineers

Bayes' Theorem Made Simple

Build a HackerNews MCP Server From Scratch

Build a Research Agent in LangChain

DocString and Review Agent in LangGraph

How MCP Works

Prompt LLMs Like a Pro by Context Activation

Setting Up AI Projects in Python

Tests That Mean Something

Understand RAG From First Principles

Write System Prompts for AI Agents Like a Pro
