Staff Software Engineer, Generative AI, Core ML

Google · Big Tech · Mountain View, CA +1

Staff Software Engineer role focused on architecting and implementing Agentic Reinforcement Learning systems. The work covers novel RL recipes, reward modeling, and synthetic data for reasoning, planning, and tool use in large-scale AI products. The role bridges frontier research and production, targeting the "GenAI Engineering Gap" to create self-improving agentic systems.

What you'd actually do

  1. Architect and implement advanced Reinforcement Learning (RL) workflows for complex, multi-turn agentic tasks. Develop novel training recipes for reasoning, self-correction, and tool use (e.g., Chain of Thought (CoT), Tree of Thoughts) to improve model reliability in long-horizon workflows.
  2. Design robust reward systems and simulation environments ("Digital Twins") to evaluate and train agents.
  3. Create the "Intelligence Assets" required to train specialized student models, bridging the gap between generalist teacher models and domain-specific production requirements.
  4. Contribute to the unified middleware layer that democratizes access to state-of-the-art tuning. Implement efficient adaptation techniques (e.g., LoRA, Distillation, Quantization) to ensure high-performance agents can be deployed under strict latency and cost constraints.
  5. Partner with Google DeepMind researchers to validate novel algorithmic approaches (e.g., outcome-supervised vs. process-supervised RMs) and scale them from 0-to-1 prototypes into 1-to-N production libraries used across Google.

Skills

Required

  • Python
  • JAX
  • PyTorch
  • Reinforcement Learning (RLHF, RLAIF)
  • LLM post-training techniques (SFT, DPO, PPO)
  • Large Language Models (LLMs)
  • Multi-Modal
  • Large Vision Models
  • Model deployment
  • Model evaluation
  • Data processing
  • Debugging
  • Fine-tuning

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • 8 years of experience with data structures and algorithms
  • 3 years of experience in a technical leadership role leading project teams and setting technical direction
  • Multimodal learning
  • Embodied agents
  • Reward modeling
  • Dense and mixture-of-experts (MoE) architectures
  • Hybrid reward systems

What the JD emphasized

  • 8 years of experience in software development
  • 5 years of experience leading ML design and optimizing ML infrastructure
  • 2 years of experience with GenAI techniques
  • Experience in Reinforcement Learning (RLHF, RLAIF) and LLM post-training techniques (SFT, DPO, PPO)
  • Experience building efficient evaluation harnesses, benchmarks, or simulation environments for measuring agent performance
  • Proven track record (publications or production launches) in reward modeling

Other signals

  • architecting the technical bridge between frontier research and massive-scale product deployment
  • pioneer the next generation of Agentic Reinforcement Learning
  • architect the "cognitive" layer of Google’s AI stack
  • translate frontier research into scalable production infrastructure
  • solving the "GenAI Engineering Gap" by transforming probabilistic models into reliable, self-improving agentic systems