Curating, cleaning, deduplicating, and shaping training data; the 'data engineering for LLMs' role that sits upstream of every other stage. Primary AI lifecycle stage: data.
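To make the cleaning and deduplication steps concrete, here is a minimal Python sketch. It assumes the corpus is an in-memory list of plain-text documents. It performs only exact deduplication, by hashing whitespace- and case-normalized text. Real LLM data pipelines usually add near-duplicate detection (for example, MinHash) and quality filtering, which are not shown here. The functions `clean`, `dedup_key`, and `clean_and_dedup` and the `min_chars` threshold are hypothetical, chosen only for illustration.

```python
import hashlib
import re
import unicodedata
from typing import Iterable, Iterator


def clean(text: str) -> str:
    """Cleaning step: Unicode NFKC normalization plus whitespace collapsing."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def dedup_key(text: str) -> str:
    """SHA-256 of the case-folded cleaned text, used to spot exact duplicates."""
    return hashlib.sha256(text.casefold().encode("utf-8")).hexdigest()


def clean_and_dedup(docs: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Yield cleaned documents, skipping very short ones and exact duplicates.

    min_chars is a hypothetical curation threshold, not a recommended value.
    """
    seen = set()
    for doc in docs:
        text = clean(doc)
        if len(text) < min_chars:  # curation: drop near-empty records
            continue
        key = dedup_key(text)
        if key in seen:  # same text up to case and whitespace
            continue
        seen.add(key)
        yield text  # original casing is preserved in the output


corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "the  quick brown fox\tjumps over the LAZY dog.",
    "too short",
    "Data pipelines sit upstream of every other stage.",
]
for doc in clean_and_dedup(corpus):
    print(doc)
```

Running this prints two documents. The second fox sentence is dropped as a duplicate of the first, and "too short" falls below `min_chars`. Hashing a normalized form keeps memory per document constant. The trade-off is that it only catches duplicates that match exactly after normalization.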
As of today, 11 active AI roles across 10 companies in our index reference Data pipeline. New postings fell 20% in the last 30 days versus the prior 30 days (5 → 4).
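The 20% decline is a period-over-period relative change. Below is a minimal Python sketch of that arithmetic. It assumes the figure is computed as (current − prior) / prior over the two 30-day windows. The function name `percent_change` is hypothetical.

```python
def percent_change(prior: int, current: int) -> float:
    """Relative change from the prior window to the current one, in percent."""
    if prior == 0:
        # With no postings in the prior window, a percent change is undefined.
        raise ValueError("prior window has zero postings")
    return (current - prior) / prior * 100


# Prior 30 days: 5 new postings; last 30 days: 4 new postings.
print(f"{percent_change(5, 4):.0f}%")  # prints -20%
```

A negative result is a decline, so -20% corresponds to "fell 20%".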
The companies with the most active Data pipeline listings are ByteDance (2 roles), BCG (1 role), Figure AI (1 role), JPMorgan Chase (1 role), and PitchBook (1 role).
In current hiring, Data pipeline roles concentrate in the data (64%) and agents (18%) stages.
The sectors with the most active Data pipeline hiring are Big Tech, Vertical AI, and Robotics.
3 AI roles tagged data_pipeline.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Roblox | Principal Machine Learning Engineer, Communication Safety | Consumer | 8 | LLM observability · Model serving · Inference infra · Multimodal |
| Spotify | Senior Staff Machine Learning Engineer, Content Platform | Consumer | 7 | Model serving |
| Roblox | Principal Software Engineer - Creator Success | Consumer | 5 | |