Why a pipeline, not a layer cake
Layer-cake diagrams (a16z, Coatue, etc.) are five rectangles labelled Hardware / Compute / Foundation Models / Tooling / Apps. They’re fine for taxonomy, but they miss the asymmetry we care about. A layer cake has no direction of travel, while engineers very much flow down a pipeline, from upstream stages toward downstream ones. Comp tilts with that flow: upstream stages are scarcer and pay more. The curve above tells that story directly.
What each station ships
We anchor every station to its ship artifact: what you actually hand to the next station. That makes classification sharp. A role that “does training infra” isn’t ambiguous: if the artifact is a base model, it’s Pretrain; if the artifact is a tuned variant, it’s Post-train; if the artifact is the GPU cluster others use, it’s Serve. The classifier prompt enforces this.
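A minimal sketch of the artifact rule, assuming hypothetical artifact labels (`base_model`, `tuned_variant`, `gpu_cluster`) and a hypothetical `classify` helper. The site’s actual classifier is a prompt, so this only illustrates the decision it is asked to make: the station is keyed on what the role hands downstream, never on the job title.

```python
# Hypothetical artifact labels; the real classifier is a prompt, not a lookup table.
ARTIFACT_TO_STATION = {
    "base_model": "Pretrain",       # a base model ships from Pretrain
    "tuned_variant": "Post-train",  # a tuned variant ships from Post-train
    "gpu_cluster": "Serve",         # the cluster other teams use ships from Serve
}


def classify(ship_artifact: str) -> str:
    """Return the station for a role, keyed only on what it hands downstream."""
    try:
        return ARTIFACT_TO_STATION[ship_artifact]
    except KeyError:
        raise ValueError(f"no station defined for artifact {ship_artifact!r}") from None


# Three "training infra" roles, three different stations.
for artifact in ("base_model", "tuned_variant", "gpu_cluster"):
    print(artifact, "->", classify(artifact))
```

Raising on an unknown artifact, rather than guessing a default station, mirrors the point of the rule: a role without a clear ship artifact needs a human look, not a silent bucket.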
Three opinionated calls
- Pretrain and Post-train are separate stages. Most stack diagrams collapse these into “foundation models”. We don’t: pretraining teams are tiny, and their salary band is dramatically different from that of RLHF / fine-tuning teams. Collapsing them hides the most extreme upstream-downstream gap.
- Eval Gate is its own station. When eval is someone’s entire job (not a side activity), the role lives here. A builder who also evals their own agent stays at Agent; we capture “eval as a function” through a separate role-type field, not by double-counting layers.
- Retrieval is a sub-component of Agent, not its own stage. RAG, vector DB, and semantic search show up as tags on Agent roles. They’re a feature of the assembly process, not a station on the line.
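The last two calls amount to a data-model choice: each role sits at exactly one station, while eval work and retrieval work are recorded on the side. A minimal sketch, assuming a hypothetical `Role` record with a `role_types` set (standing in for the role-type field) and a `tags` set; the tag spellings in `RETRIEVAL_TAGS` are also assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical tag spellings for retrieval work.
RETRIEVAL_TAGS = {"rag", "vector-db", "semantic-search"}


@dataclass
class Role:
    title: str
    station: str  # exactly one station per role, so no layer is double-counted
    role_types: set = field(default_factory=set)  # functions such as "eval"
    tags: set = field(default_factory=set)        # features such as retrieval


def does_eval(role: Role) -> bool:
    # Eval as a function: full-time Eval Gate roles plus builders who eval their own work.
    return role.station == "Eval Gate" or "eval" in role.role_types


def is_retrieval(role: Role) -> bool:
    # Retrieval is not a station; it shows up as tags on Agent roles.
    return role.station == "Agent" and bool(role.tags & RETRIEVAL_TAGS)


roles = [
    Role("Eval engineer", "Eval Gate"),
    Role("Agent engineer", "Agent", role_types={"eval"}, tags={"rag"}),
    Role("Product engineer", "Ship"),
]
print(Counter(r.station for r in roles))    # each role counted at one station only
print(sum(does_eval(r) for r in roles))     # 2
print(sum(is_retrieval(r) for r in roles))  # 1
```

The agent engineer counts toward eval activity and toward retrieval, yet still appears once in the station tally, which is the “no double-counting” property the two calls are after.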
What this lens reveals
This model gives sharp answers to two questions that a layer cake muddles. Who’s doing real foundation training? Count Pretrain + Post-train roles per company. Who’s wrapping someone else’s model? Look for many Ship roles and sparse Pretrain. The asymmetry between “owns the model” and “owns the product” is the core hiring-market signal we publish. The rest of the site lets you slice it.
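Both questions reduce to per-company station counts. A minimal sketch, assuming roles arrive as `(company, station)` pairs; the company names, the sample postings, and the `min_ship` / `max_pretrain` thresholds in the hypothetical `looks_like_wrapper` helper are all made up for illustration and are not the site’s published cutoffs.

```python
from collections import Counter, defaultdict


def station_counts(roles):
    """Turn (company, station) pairs into {company: Counter of stations}."""
    counts = defaultdict(Counter)
    for company, station in roles:
        counts[company][station] += 1
    return counts


def foundation_roles(c: Counter) -> int:
    # "Who's doing real foundation training?": Pretrain + Post-train roles.
    return c["Pretrain"] + c["Post-train"]


def looks_like_wrapper(c: Counter, min_ship: int = 5, max_pretrain: int = 0) -> bool:
    # "Who's wrapping someone else's model?": many Ship roles, sparse Pretrain.
    # The thresholds are hypothetical; tune them against real data.
    return c["Ship"] >= min_ship and c["Pretrain"] <= max_pretrain


# Made-up postings for two hypothetical companies.
roles = (
    [("LabCo", "Pretrain")] * 2
    + [("LabCo", "Post-train"), ("LabCo", "Ship")]
    + [("AppCo", "Ship")] * 6
    + [("AppCo", "Agent")]
)
counts = station_counts(roles)
for company, c in counts.items():
    print(company, foundation_roles(c), looks_like_wrapper(c))
```

`Counter` returns 0 for stations a company never posted, so a company with no Pretrain roles needs no special-casing.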