Data Scientist, Agentic Systems (remote)

CrowdStrike · Enterprise · United States · Remote

This role focuses on building the next generation of agentic systems for cybersecurity. It involves post-training LLMs and agents with techniques such as RLHF/RLAIF, PPO/GRPO/DPO, and reward modeling, as well as devising AI agents and combining them into complex workflows with planning, reasoning, tool/function calling, and retrieval/memory. Other key responsibilities are researching new agentic planning approaches and establishing evaluation criteria for agentic systems. The role also covers prompt and inference optimization, cross-team collaboration, and keeping current with AI developments.

What you'd actually do

  1. Post-train LLMs and agents — supervised fine-tuning and reinforcement learning (RLHF/RLAIF, PPO/GRPO/DPO, reward modeling) — to automate analyst procedures and improve reliability on real security tasks
  2. Devise AI agents and combine them into increasingly complex workflows: planning and reasoning loops, tool and function calling, and retrieval and memory
  3. Establish objective criteria for benchmarking agentic systems — evals, LLM-as-judge pipelines, and trajectory-level metrics, with real statistical rigor
  4. Research new approaches to agentic planning, and prototype state-of-the-art methods from the literature
  5. Optimize prompts and inference to get the most out of every model

Skills

Required

  • Excellent foundations in machine learning, probability, and statistics
  • PhD-level depth of understanding in modern machine learning research
  • Experience training generative models, with a strong command of LLM training fundamentals
  • Reinforcement learning / post-training as a core skill: RLHF/RLAIF, policy optimization (PPO/GRPO/DPO), reward modeling, and building RL environments for agents
  • Experience building agentic systems: agent architectures (ReAct, planning, reflection), tool and function calling, and retrieval/memory/context management
  • Experience with systematic prompt optimization, and with designing and building evals for LLM systems
  • Fluency with GPUs, PyTorch, and the common LLM training and serving stack (e.g., Hugging Face Transformers/TRL/PEFT, DeepSpeed/FSDP, vLLM/TGI/SGLang)
  • Strong, reproducible research engineering: clean Python and disciplined experiment tracking

Nice to have

  • Experience generating training data and environments — synthetic data, agent trajectories/rollouts, and task simulators
  • Familiarity with inference-time scaling / test-time compute (search, self-consistency, verifier-guided decoding, long chain-of-thought)
  • Experience with agent safety and guardrails: sandboxing, abuse/jailbreak resistance, and reliability for autonomous systems
  • A knack for interpretability and failure analysis
  • Notable open-source contributions and excellent technical writing
  • A passion for cybersecurity
  • An independent self-starter who likes to take ownership and seeks out new challenges

What the JD emphasized

  • PhD-level depth of understanding in modern machine learning research
  • Reinforcement learning / post-training as a core skill
  • Experience building agentic systems

Other signals

  • building agentic systems
  • post-training LLMs and agents
  • researching new approaches to agentic planning