From agent to silicon. Six layers, six decisions.

Every layer has a distinct role, a distinct cost profile, and a distinct decision for any organisation adopting AI. Read top-down, the stack shows where the user interacts; read bottom-up, it shows where the spend lives.

Six layers, top to bottom.

1. Agent
Autonomous reasoning, tool use, planning loops (ReAct). Sits on top of everything else and orchestrates work.

2. Orchestration
Memory, RAG, prompt chaining, vector retrieval. Connects the model to your private data without retraining it.

3. Inference Engine
Tokenization, API gateway, sampling strategies. Every token costs money and adds latency.

4. Transformer Model
Attention heads, embeddings, decoder stack. The 175B to 1T parameters that ARE the compressed knowledge.

5. Training / ML Core
Pre-training, supervised fine-tuning, RLHF, Constitutional AI. Where the model gets its values.

6. Infrastructure
GPU clusters (NVIDIA H100), HBM3 memory, NVLink, InfiniBand. Buy, do not build. Cloud-first.
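The agent layer's "planning loop" named above (ReAct) can be sketched in a few lines: the model alternates between a thought, an action (a tool call), and an observation of the result, until it decides to finish. The `fake_model` and `lookup` tool below are illustrative stand-ins, not any real model API.

```python
# Minimal ReAct-style loop: think -> act -> observe, repeated until done.

def fake_model(history):
    """Stand-in for an LLM call: returns (thought, action, argument)."""
    if not any(step[1] == "lookup" for step in history):
        return ("I need the revenue figure.", "lookup", "revenue")
    return ("I have what I need.", "finish", "Revenue is 120")

TOOLS = {"lookup": lambda key: {"revenue": "120"}[key]}  # toy tool registry

def react_loop(max_steps=5):
    history = []
    for _ in range(max_steps):
        thought, action, arg = fake_model(history)
        if action == "finish":
            return arg                      # agent decides it can answer
        observation = TOOLS[action](arg)    # act, then observe the result
        history.append((thought, action, observation))
    return None                             # step budget exhausted

print(react_loop())
```

The step budget (`max_steps`) is the part that matters for cost: every loop iteration is another model call at the inference layer below.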

One value lever per layer.

Layer          | Business insight                                                | Value lever
Agent          | Automate multi-step knowledge work                              | Process cost
Orchestration  | RAG over private data, no retraining needed                     | Data moat
Inference      | Every token costs money; caching and prompt design control OpEx | OpEx control
Transformer    | Capability is largely fixed; choose the right model             | CapEx avoidance
Training       | Fine-tuning at 1 to 5% of pre-training cost                     | Competitive edge
Infrastructure | Buy compute, do not own it                                      | Capital discipline

Which layer is our spend actually on?

Most organisations think they are buying AI. They are buying inference (per-token costs) and orchestration (RAG infrastructure). Knowing which layer carries the cost makes budget conversations honest.
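The orchestration spend mentioned above buys one core mechanism: retrieve the most relevant private document by vector similarity, then put it in the prompt. A minimal sketch, using toy bag-of-words vectors in place of the learned embeddings a real RAG pipeline would use; the documents and query are invented examples.

```python
# Toy retrieval step of a RAG pipeline: embed, score, pick the best match.
from collections import Counter
from math import sqrt

def embed(text):
    """Toy embedding: word-count vector (real systems use learned embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "refund policy: customers may return items within 30 days",
    "shipping policy: orders dispatch within two business days",
]

def retrieve(query):
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

context = retrieve("how long do customers have to return items")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
print(context)
```

Note what never happens here: the model is not retrained. The private data stays in the retrieval index, which is exactly why this layer is a data moat rather than a CapEx line.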

Want the boardroom version of this?