How LLMs work

---
source_url: https://www.0xkato.xyz/how-llms-actually-work/
source_name: www.0xkato.xyz
published: 2026-06-08
status: published
---

Tokens aren't usually whole words. They're usually subword pieces... Subword tokenization sits in the middle. The most common pieces become single tokens, and rare or novel words get composed from smaller pieces.
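To make the quoted idea concrete, here is a minimal sketch of subword segmentation. It assumes a tiny hand-written vocabulary (`VOCAB` is hypothetical, not taken from the article or any real model) and uses greedy longest-match splitting, which is similar in spirit to how WordPiece segments words at inference time. It is not the BPE merge procedure many LLM tokenizers use, and real vocabularies hold tens of thousands of pieces.

```python
# Toy vocabulary (hypothetical): frequent words and pieces are single tokens.
VOCAB = {"the", "cat", "straw", "berry", "token", "ization", "s"}


def tokenize(word, vocab=VOCAB):
    """Split one lowercase word into pieces by greedy longest match.

    Any single character is accepted as a fallback piece, so rare or
    novel words are always composed from smaller parts.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single-character fallback
                pieces.append(piece)
                i = j
                break
    return pieces


print(tokenize("the"))            # ['the']
print(tokenize("strawberry"))     # ['straw', 'berry']
print(tokenize("tokenizations"))  # ['token', 'ization', 's']
print(tokenize("zyx"))            # ['z', 'y', 'x']
```

Under this toy vocabulary, "strawberry" reaches the model as two pieces rather than ten letters, which is why letter-level questions about a word can be harder than they look.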

solid pedagogical walkthrough of transformer internals. covers the stack systematically, from tokenization through generation, without hand-waving the math away entirely. the strawberry-R’s example is good: tokenization choices have real downstream effects that surface in unexpected places. useful reference for reading model cards and papers. doesn’t break much new ground, but framing architecture as what is “broadly shared” across models and trained weights as what is “different” is pragmatic. best as a refresher or intro for someone building on transformers rather than researching them.