Staff AI Engineer - Cortex Code Agentic System

Snowflake · Data AI · CA-Menlo Park, United States · Engineering

Staff Machine Learning Engineer focused on building agentic systems and methodology for coding agents at enterprise scale, ensuring efficiency, repeatability, auditability, and shippability. The role involves owning quality pillars, designing experimentation pipelines, leading analysis of quality regressions, providing cross-functional leadership, and bringing production-minded rigor to quality systems.

What you'd actually do

  1. Own major pillars of the quality stack: tuning agent behavior to engage on next-generation agentic coding tasks.
  2. Design and evolve pipelines and tooling that support large-scale experimentation, error mining, and iteration on prompts/tools/workflows with clear before/after signals.
  3. Lead postmortems on quality regressions; cluster failure modes; translate findings into a prioritized roadmap for engineering and modeling partners.
  4. Align product, infra, and applied AI on what “good” means for critical customer workflows; mentor engineers and uplevel eval craft across the team.
  5. Ensure quality systems are dependable in practice—reproducible runs, stable datasets, versioning, and operational clarity when things drift.

Skills

Required

  • Python
  • TypeScript
  • Go
  • eval harnesses
  • measurement
  • experimentation loops for LLM/agent systems
  • technical direction
  • cross-team delivery
  • mentoring

Nice to have

  • data engineering pipelines (dbt, Airflow)
  • data modeling
  • data analysis
  • retrieval systems
  • semantic layers
  • agentic coding tools
  • LLM observability
  • safety/guardrails
  • quality systems used as release gates

What the JD emphasized

  • Staff-level ownership
  • building and operating eval harnesses, measurement, and/or experimentation loops for LLM/agent systems—not only one-off benchmarks
  • complex quality + data pipelines—substantial state, branching logic, and operational requirements
  • clear metrics, reproducibility, and sustained improvement—not one-off score bumps
  • systematic measurement and team-wide practice

Other signals

  • agentic systems
  • enterprise scale
  • quality stack
  • customer pain
  • metrics
  • experiment loops