Machine Learning Research Engineer, Agent Data Foundation - Enterprise GenAI

Scale AI · Data AI · San Francisco, CA · Enterprise Engineering

This role focuses on researching and building synthetic data pipelines and agents to improve enterprise GenAI models. It involves creating agents for trace analysis, contributing to an agent-building framework, and training state-of-the-art models using post-training and agent-building algorithms.

What you'd actually do

  1. Build synthetic data pipelines that generate enterprise environments for RL post-training
  2. Create agents that convert production traces into actionable insights for improving agents
  3. Contribute to our agent-building product, which can construct other agents using coding agents and proprietary algorithms
  4. Train state-of-the-art models, both developed internally and sourced from the community, for deployment to our enterprise customers

Skills

Required

  • LLMs
  • synthetic data generation
  • agent development
  • post-training algorithms
  • production environments
  • data pipelines

Nice to have

  • Publications in top conferences (NeurIPS, ICLR, ICML)
  • PhD or Master's in Computer Science or a related field

What the JD emphasized

  • Publications in top conferences such as NeurIPS, ICLR, or ICML within the last two years
  • 3+ years of building with LLMs in a production environment
  • Clear experience constructing high-quality data to improve an LLM or agent

Other signals

  • research on synthetic environments
  • building agents for trace analysis
  • contributing to a cutting-edge framework that automatically hill-climbs agent building against an eval set
  • creating best-in-class agents that achieve state-of-the-art results by combining post-training and agent-building algorithms