Director, Model Post-training and Agentic Research (remote)

CrowdStrike · Enterprise · United States · Remote

Lead research and development for post-training and agentic AI systems in the cybersecurity domain. The role owns the full post-training stack (SFT, reward modeling, RLHF/RLAIF) and the design and build of agent harnesses for complex cyber workflows, including tool use, planning, and multi-step execution. It blends research leadership with hands-on technical contribution, with a focus on rigorous evaluation and on driving innovation in security-specialized AI.

What you'd actually do

  1. Own and personally drive the full post-training pipeline for security-domain AI — SFT, RLHF/RLAIF, agent-RL, and reward modeling. Set research priorities and architectural direction, and lead experimental work on the hardest problems yourself rather than delegating them away. Design reward modeling methodology grounded in verified security outcomes rather than proxy signals, drawing on both human expert feedback and automated adversarial evaluation. Define data curation standards across sourcing, filtering, quality scoring, and domain weighting that drive measurable capability improvement.
  2. Build and maintain agent-RL training environments that simulate realistic cyber workflows (multi-step offensive and defensive tasks, tool use, and long-horizon planning), contributing directly to environment design and reward shaping. Lead the design and build of the agent harnesses that run on top of those trained models: scaffolding architecture, tool-calling interfaces, planning and reasoning loops, and memory and context management. Treat harness design with the same rigor as the training pipeline; these systems determine whether strong post-training translates into reliable, trustworthy behavior in the field.
  3. Develop and own evaluation methodology for the full agentic stack: not just model capability in isolation, but also harness behavior, tool-use reliability, planning coherence, and end-to-end task completion across realistic security workflows. Define the benchmarks, red-line tests, and measurement practices that give the team and the organization genuine confidence that an agent works.
  4. Partner closely with other teams to ensure post-training and agentic work integrates cleanly with the broader model development loop. Contribute original research through publications, external presentations, and open-source artifacts where appropriate, building CrowdStrike's credibility as a research-first organization in this space.
  5. Recruit, develop, and retain a high-density team of research scientists and ML engineers. Set a technical bar through your own contributions, not just your standards.

Skills

Required

  • MS or PhD in computer science, machine learning, or a related quantitative discipline
  • 8+ years of experience in ML research or engineering, with meaningful depth in large language model post-training
  • Hands-on expertise across the modern post-training stack, including SFT data pipelines, RLHF/RLAIF, PPO or similar RL algorithms applied to language models, and reward model design and training
  • Demonstrated experience designing or building agentic system harnesses for LLM-based agents, including tool-use frameworks, planning scaffolds, multi-step execution environments, and context or memory management
  • Strong evaluation instincts: experience designing evaluation protocols that are resistant to overfitting, capable of measuring genuine capability improvement, and interpretable to both technical and non-technical stakeholders
  • Track record of running high-velocity research programs with disciplined tracking and fast iteration
  • Proven ability to lead and grow research teams while remaining a credible, active technical contributor

Nice to have

  • building o

What the JD emphasized

  • own the full post-training stack
  • agentic research
  • designing, building, and evaluating the harnesses
  • own the full post-training pipeline
  • lead experimental work on the hardest problems yourself
  • Build and maintain agent-RL training environments
  • Lead the design and build of the agent harnesses
  • own evaluation methodology for the full agentic stack
  • 8+ years of experience in ML research or engineering, with meaningful depth in large language model post-training
  • Hands-on expertise across the modern post-training stack
  • Demonstrated experience designing or building agentic system harnesses for LLM-based agents

Other signals

  • post-training
  • agentic research
  • reinforcement learning
  • security domain