Understanding LLM Inference Internals

Introduction

What makes LLM inference fast (or slow)? This post walks through the full pipeline, from raw text to the next sampled token, and points out where the time actually goes.

The Inference Pipeline

1. Tokenization

The model never sees raw text. A tokenizer splits the input into subword units (most current models use byte-pair encoding, BPE, or a close variant) and maps each unit to an integer id from a fixed vocabulary. Tokenization is cheap compared to the forward pass, but it fixes the sequence length, and sequence length drives the cost of everything downstream.
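A minimal sketch of the idea, assuming a tiny hand-written vocabulary (kVocab and its ids are invented for illustration; real tokenizers learn merges from data and ship vocabularies of tens of thousands of entries). Greedy longest-match is used here for brevity; BPE instead merges pairs by learned rank, but the output has the same shape, a vector of ids:

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Toy vocabulary mapping substrings to token ids (hypothetical entries).
static const std::unordered_map<std::string, int> kVocab = {
    {"Hello", 1}, {" world", 2}, {"Hel", 3}, {"lo", 4}, {" ", 5},
    {"w", 6}, {"o", 7}, {"r", 8}, {"l", 9}, {"d", 10},
};

// Greedy longest-match tokenization: at each position, take the longest
// vocabulary entry that matches the remaining text.
std::vector<int> tokenize(const std::string& text) {
    std::vector<int> ids;
    size_t pos = 0;
    while (pos < text.size()) {
        size_t best_len = 0;
        int best_id = -1;
        for (size_t len = text.size() - pos; len > 0; --len) {
            auto it = kVocab.find(text.substr(pos, len));
            if (it != kVocab.end()) { best_len = len; best_id = it->second; break; }
        }
        if (best_id < 0) { ++pos; continue; }  // skip unknown bytes
        ids.push_back(best_id);
        pos += best_len;
    }
    return ids;
}

int main() {
    for (int id : tokenize("Hello world")) std::cout << id << ' ';
    std::cout << '\n';  // prints: 1 2
    return 0;
}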

2. Embedding Lookup

Each token id selects one row of a learned embedding matrix of shape (vocab_size x d_model). The "lookup" is literally a memory read with no arithmetic: the id is an index, and the row it points at is the vector the transformer layers will operate on.
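A sketch with made-up sizes and placeholder weights (real models use tables on the order of 100k rows by several thousand floats):

#include <cstdio>
#include <vector>

// An embedding table is a (vocab_size x d_model) matrix of learned
// weights; "lookup" is just reading row `token_id`.
int main() {
    const int vocab_size = 8, d_model = 4;  // tiny stand-in sizes
    std::vector<float> table(vocab_size * d_model);
    for (int i = 0; i < vocab_size * d_model; ++i)
        table[i] = 0.01f * i;               // placeholder weights

    std::vector<int> token_ids = {1, 2};    // output of the tokenizer
    for (int id : token_ids) {
        const float* row = &table[id * d_model];  // no math, just indexing
        printf("token %d -> [%.2f %.2f %.2f %.2f]\n",
               id, row[0], row[1], row[2], row[3]);
    }
    return 0;
}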

3. Transformer Layers

Each layer applies self-attention, in which every position mixes in information from earlier positions, followed by a position-wise feed-forward network, with residual connections and normalization around both. For inference speed the crucial detail is the KV cache: the keys and values computed for past positions are stored, so generating each new token only requires attending against the cache rather than re-running the whole prefix.
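Below is a sketch of scaled dot-product attention for a single head and a single query position, the inner operation that runs against the KV cache at every decode step. The attend function and its toy inputs are invented for illustration; real implementations batch this across heads and positions and fuse it into optimized kernels:

#include <cmath>
#include <cstdio>
#include <vector>

// out[j] = sum_i softmax(q . k_i / sqrt(d)) * v_i[j]
// q: [d]; keys/values: t rows of [d] each (the KV cache during decoding).
std::vector<float> attend(const std::vector<float>& q,
                          const std::vector<std::vector<float>>& keys,
                          const std::vector<std::vector<float>>& values) {
    const size_t t = keys.size(), d = q.size();
    std::vector<float> scores(t);
    float max_s = -1e30f;
    for (size_t i = 0; i < t; ++i) {
        float s = 0.f;
        for (size_t j = 0; j < d; ++j) s += q[j] * keys[i][j];
        scores[i] = s / std::sqrt(float(d));
        if (scores[i] > max_s) max_s = scores[i];
    }
    float z = 0.f;  // softmax with max-subtraction for numerical stability
    for (size_t i = 0; i < t; ++i) { scores[i] = std::exp(scores[i] - max_s); z += scores[i]; }
    std::vector<float> out(d, 0.f);
    for (size_t i = 0; i < t; ++i)
        for (size_t j = 0; j < d; ++j)
            out[j] += (scores[i] / z) * values[i][j];
    return out;
}

int main() {
    // Two cached positions, d = 2; the numbers are arbitrary.
    std::vector<std::vector<float>> K = {{1, 0}, {0, 1}};
    std::vector<std::vector<float>> V = {{1, 2}, {3, 4}};
    std::vector<float> q = {1, 0};  // attends mostly to position 0
    auto out = attend(q, K, V);
    printf("out = [%.3f %.3f]\n", out[0], out[1]);
    return 0;
}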

4. Sampling

The final layer produces one logit per vocabulary entry. Temperature rescales the logits (lower values sharpen the distribution toward greedy decoding), top-k keeps only the k highest-scoring candidates, and top-p (nucleus sampling) keeps the smallest set of candidates whose cumulative probability reaches p. The next token is then drawn from the renormalized distribution.
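A sketch of one decoding step combining all three knobs. The sample function and the logit values are invented for illustration; production samplers typically layer repetition penalties and other filters on top:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// One decoding step: logits -> next token id.
int sample(std::vector<float> logits, float temperature, int top_k,
           float top_p, std::mt19937& rng) {
    const int n = (int)logits.size();
    std::vector<int> idx(n);
    for (int i = 0; i < n; ++i) idx[i] = i;

    for (float& l : logits) l /= temperature;      // temperature rescaling

    // sort candidate ids by logit, descending, then apply the top-k cutoff
    std::sort(idx.begin(), idx.end(),
              [&](int a, int b) { return logits[a] > logits[b]; });
    if (top_k > 0 && top_k < n) idx.resize(top_k);

    // softmax over the surviving candidates
    std::vector<float> probs(idx.size());
    float max_l = logits[idx[0]], z = 0.f;
    for (size_t i = 0; i < idx.size(); ++i) {
        probs[i] = std::exp(logits[idx[i]] - max_l);
        z += probs[i];
    }
    for (float& p : probs) p /= z;

    // top-p cutoff: keep the smallest prefix whose mass reaches top_p
    float cum = 0.f;
    size_t keep = probs.size();
    for (size_t i = 0; i < probs.size(); ++i) {
        cum += probs[i];
        if (cum >= top_p) { keep = i + 1; break; }
    }
    probs.resize(keep);
    idx.resize(keep);

    // draw (discrete_distribution renormalizes the weights itself)
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return idx[dist(rng)];
}

int main() {
    std::mt19937 rng(42);
    std::vector<float> logits = {2.0f, 1.0f, 0.5f, -1.0f};  // made-up logits
    for (int step = 0; step < 5; ++step)
        printf("sampled token %d\n", sample(logits, 0.8f, 3, 0.9f, rng));
    return 0;
}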

Code Walkthrough

The pieces above compose into a short loop: run the forward pass, pick one token, append it to the context, and repeat until a stop token or a length limit. The sketch below mirrors the shape of the decode loop in an engine like llama.cpp, but it is a self-contained toy: forward() is a stand-in for the transformer stack, not llama.cpp's actual API, and the token ids are made up.
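#include <cstdio>
#include <vector>

// Toy stand-in for the model's forward pass: given the context so far,
// return logits over a 4-token vocabulary. A real engine would run the
// transformer stack here, reusing its KV cache so each step costs one
// token of work rather than the whole context.
std::vector<float> forward(const std::vector<int>& ctx) {
    std::vector<float> logits(4, 0.0f);
    logits[(ctx.back() + 1) % 4] = 5.0f;   // fake rule: favor the next id
    return logits;
}

// Greedy argmax decoding: the simplest sampler (temperature -> 0).
int argmax(const std::vector<float>& v) {
    int best = 0;
    for (int i = 1; i < (int)v.size(); ++i)
        if (v[i] > v[best]) best = i;
    return best;
}

int main() {
    std::vector<int> ctx = {0};            // prompt: already-tokenized ids
    const int eos = 3, max_new = 8;        // invented stop id and limit
    for (int i = 0; i < max_new; ++i) {    // the decode loop
        int next = argmax(forward(ctx));   // one forward pass per token
        ctx.push_back(next);
        printf("step %d -> token %d\n", i, next);
        if (next == eos) break;            // stop token ends generation
    }
    return 0;
}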

Conclusion

Tokenization and sampling are cheap; the transformer layers dominate the cost. Processing the prompt (prefill) is compute-bound and parallel across positions, while generating tokens one at a time (decode) is largely memory-bandwidth-bound: each step streams the model weights and the KV cache through the chip to produce a single token. That is why batching, quantization, and careful KV-cache management are the main levers for faster inference.


Have thoughts or questions? Reach out on X.