Senior/Staff Machine Learning Engineer, General Agents, Enterprise GenAI

Scale AI · Data AI · New York, NY +2 · Enterprise Engineering

Senior/Staff Machine Learning Engineer on the General Agents team, responsible for designing, building, and deploying production-ready AI agents for enterprise use cases. This role involves working across the full agent lifecycle, from model and system design to evaluation, deployment, and iteration, bridging cutting-edge agentic techniques with real-world deployment constraints.

What you'd actually do

  1. Design and implement end-to-end agent systems that combine LLM reasoning, tool use, memory, and control logic to solve recurring enterprise use cases.
  2. Build scalable, reliable agent architectures that can be deployed across many customers with varying data, tools, and constraints.
  3. Develop evaluation frameworks, datasets, environments, and metrics to measure agent performance, reliability, and business impact in production settings.
  4. Collaborate closely with product managers, customers, data annotators, and other engineering teams to translate enterprise requirements into robust agent designs.
  5. Productionize frontier agent techniques (e.g., planning, multi-step reasoning and tool use, multi-agent patterns) into maintainable, observable systems.

Skills

Required

  • Python
  • LLM optimization
  • agentic system design
  • production ML systems
  • integrating models with external tools, APIs, databases, and services

Nice to have

  • AI agents using modern generative AI stacks
  • agent frameworks
  • orchestration layers
  • workflow systems
  • evaluation, monitoring, and observability for LLM-powered systems
  • deploying ML systems in cloud environments
  • fine-tuning foundation models
  • SFT
  • RLVR
  • LoRA

What the JD emphasized

  • 5+ years of experience building and deploying machine learning or AI systems for real-world, production use cases.
  • Deep understanding of modern LLMs, prompt-, context-, and system-level optimization, and agentic system design.
  • Proven proficiency in Python, including writing production-quality, testable, and maintainable code.

Other signals

  • designing, building, and deploying production-ready AI agents
  • combining LLM reasoning, tool use, memory, and control logic
  • scalable, reliable agent architectures
  • productionize frontier agent techniques