AI Research Scientist, New Grad – Agents & Reinforcement Learning

Snowflake · Data AI · WA-Bellevue, United States · Engineering

Research Scientist role focused on building agentic frameworks, auto research agents, coding agents, and multi-agent systems using reinforcement learning techniques. The role involves developing and curating data pipelines for novel agentic and RL research, with a focus on recursive self-improvement and aligning agentic behaviors. The research is intended to be integrated into Snowflake's Data Cloud platform.

What you'd actually do

  1. Design and develop agentic frameworks powered by recursive self-improvement loops, enabling AI systems that iteratively refine their own capabilities and strategies
  2. Build and evaluate auto research agents — systems capable of autonomously formulating hypotheses, executing experiments, and synthesizing findings
  3. Develop coding agents that understand, generate, and debug code across complex, multi-step programming tasks
  4. Conduct research in reinforcement learning with a focus on RLHF, DPO, and PPO as mechanisms for aligning and improving agentic behaviors
  5. Contribute to multi-agent systems where specialized agents collaborate, negotiate, and self-organize to solve enterprise-scale problems

Skills

Required

  • Python
  • PyTorch or JAX
  • Reinforcement Learning
  • LLM post-training
  • Agentic architectures
  • Tool-use
  • Planning
  • Self-correction

Nice to have

  • Coding agents
  • Auto research agents
  • Recursive self-improvement frameworks
  • Automated AI scientist paradigms
  • Large-scale distributed training
  • Efficient training paradigms
  • Mathematical reasoning
  • Structured decision-making
  • Program synthesis
  • Domain-specific AI applications (healthcare, finance, enterprise workflows)

What the JD emphasized

  • PhD in Computer Science, Machine Learning, Artificial Intelligence, or a closely related field (completing or recently completed; or equivalent research experience)
  • Foundational expertise in reinforcement learning algorithms, including RLHF, DPO, PPO, or multi-agent systems
  • Research experience in LLM post-training, fine-tuning, or reasoning model development
  • Demonstrated ability to implement and experiment with agentic architectures — including tool-use, planning, and self-correction loops
  • At least one first-author or co-authored publication or preprint in a relevant AI/ML area

Other signals

  • agentic frameworks
  • recursive self-improvement
  • auto research agents
  • coding agents
  • reinforcement learning
  • multi-agent systems
  • synthetic data
  • human-annotated data