Security Engineer, Detection and Response (US)

Writer · AI Frontier · San Francisco, CA · Engineering, product & design

Security engineer focused on detection and response for AI infrastructure, including AI-specific threats like prompt injection and data poisoning. The role involves building automated response systems, leading incident response for AI infrastructure, proactive threat hunting across GPU clusters and training environments, and developing detection-as-code frameworks. It requires collaboration with AI Security research, Cloud Infrastructure, and AI researchers to protect the AI platform.

What you'd actually do

  1. Design and implement detection strategies that identify AI-specific threats including prompt injection, model extraction, data poisoning, adversarial examples, and unauthorized access to training datasets or model weights across our distributed infrastructure
  2. Build automated response playbooks and orchestration workflows that contain threats without human intervention, creating self-healing security systems that cut mean time to respond from hours to minutes and automatically remediate compromised inference endpoints
  3. Lead security incident response coordination across all teams (Cloud, AppSec, Enterprise, AI Security) when AI infrastructure or models are compromised, conducting forensic investigations on training pipeline attacks and model manipulation attempts while drafting clear incident communications for engineering and executive leadership
  4. Hunt proactively for sophisticated threats across GPU clusters and training infrastructure by analyzing model outputs for signs of compromise, reproducing AI-specific vulnerabilities from security research, and identifying visibility gaps in distributed training environments before adversaries exploit them
  5. Build detection-as-code frameworks with version control and automated deployment, onboard telemetry from AI training infrastructure and inference endpoints, and create dashboards that track model security metrics, GPU utilization patterns, and access to sensitive research data

Skills

Required

  • 3-5+ years in security operations, detection engineering, or incident response
  • Proven track record of identifying and stopping sophisticated attacks in production environments
  • Experience securing AI/ML infrastructure, high-performance computing environments, or other distributed systems at scale
  • Strong programming skills in Python, KQL, SPL, or similar languages to build custom detection logic, automate response workflows, and create tools that operationalize security at scale across cloud-native and distributed computing environments
  • Experience with SIEM platforms, detection technologies, and forensic investigation techniques
  • Demonstrated ability to build detections for novel attack techniques and conduct forensics in complex distributed environments
  • Self-directed execution mindset, with a track record of securing high-value intellectual property, automating incident response in complex environments, and identifying critical security gaps through proactive threat hunting

Nice to have

  • Collaboration with AI Security research
  • Collaboration with Cloud Infrastructure
  • Collaboration with Software Security Engineering
  • Collaboration with AI researchers
  • Threat intelligence translation
  • Incident response coordination
  • Forensic investigations on training pipeline attacks and model manipulation attempts
  • Incident communications
  • Analyzing model outputs for signs of compromise
  • Reproducing AI-specific vulnerabilities
  • Detection-as-code frameworks with version control and automated deployment
  • Onboarding telemetry from AI training infrastructure and inference endpoints
  • Dashboards for model security metrics, GPU utilization patterns, and access to sensitive research data
  • Operational security partner
  • Monitoring Cloud Infrastructure's GPU clusters for threats
  • Detecting customer-impacting incidents
  • Enabling responsible AI development through security guardrails
  • 24/7 on-call rotation
  • Responding to real-time threats
  • Continuously improving detection coverage and automation capabilities

What the JD emphasized

  • AI-specific threats
  • automated response capabilities
  • defending cutting-edge AI/AGI systems
  • securing systems that are fundamentally different
  • AI security engineering at scale
  • novel threats that don't exist in textbooks yet
  • sophisticated attacks across GPU clusters and distributed training environments
  • AI infrastructure or models are compromised
  • model outputs for signs of compromise
  • visibility gaps in distributed training environments
  • customer-impacting incidents
  • critical AI security incidents
  • securing AI/ML infrastructure
  • high-performance computing environments
  • complex distributed environments
  • securing high-value intellectual property
  • automating incident response in complex environments
  • identifying critical security gaps
  • unwavering accountability
