Senior Security Production Engineer

Weights & Biases · Data AI · Bellevue, WA +4 · Technology

CoreWeave is seeking a Senior Security Production Engineer to build, scale, and maintain the secure infrastructure for its AI cloud platform. The role involves designing and operating security systems, automating processes, enhancing observability, and responding to incidents, with a focus on reliability and performance for AI workloads.

What you'd actually do

  1. Design, implement, and maintain scalable, highly available security infrastructure using Kubernetes and cloud native technologies
  2. Build automation and monitoring solutions to proactively identify and mitigate reliability risks
  3. Collaborate with engineering teams to optimize system performance, reduce latency, and improve service uptime
  4. Participate in incident response, conduct root cause analysis, and implement preventative solutions
  5. Mentor team members and promote best practices in reliability, security engineering, and infrastructure management

Skills

Required

  • 5+ years of experience in site reliability engineering, DevOps, security engineering, security operations, or related roles
  • Strong proficiency with Kubernetes, container orchestration, and cloud native technologies
  • Experience managing and operating Teleport for infrastructure access control
  • Proficiency in automation and scripting languages such as Python, Bash, or Go
  • Experience operating and maintaining large-scale distributed systems with a focus on reliability

Nice to have

  • Familiarity with observability platforms such as Prometheus, Grafana, or Datadog
  • Experience working with cloud providers such as AWS, Azure, or GCP

What the JD emphasized

  • security infrastructure
  • AI cloud platform
  • AI workloads at scale
  • reliability risks
  • system reliability and performance
  • security engineering
  • large-scale distributed systems with a focus on reliability