Software Engineer, Infrastructure Security

OpenAI · AI Frontier · United States · Remote · Security

The Software Engineer, Infrastructure Security role at OpenAI focuses on safeguarding the company's core research and production environments, including GPU supercomputing clusters, multi-cloud infrastructure, and critical services. The engineer designs and implements production-grade security services such as authentication systems, proxies, and key management platforms, built for high reliability and scalability. Responsibilities include partnering with engineers to embed security in compute clusters, developing automation for risk mitigation, and leading security initiatives for AI workloads. The role requires strong software engineering skills, experience with critical security infrastructure, cloud security expertise, and familiarity with container orchestration and modern authentication standards.

What you'd actually do

  1. Architect and implement production-grade security services (e.g., auth services, access brokers, secure proxies, key-management infrastructure) that provide strong guarantees across hardware, operating systems, Kubernetes, networks, and CI/CD.
  2. Partner with infrastructure and research engineers to embed security into high-performance compute clusters, enabling rapid model training and deployment without compromising protection.
  3. Develop automation and detection tooling to continuously identify and mitigate risks in large-scale cloud and on-prem environments.
  4. Drive high-impact initiatives such as line-speed encryption, machine identity, and network isolation, continuously raising the security bar for emerging AI workloads.
  5. Lead or participate in design reviews and threat models to ensure new systems launch with strong security foundations and operational excellence.

Skills

Required

  • Strong software engineering skills in languages such as Python, Go, Rust, or C/C++
  • A track record of shipping and operating high-reliability distributed services
  • Experience building or operating critical security infrastructure (e.g., auth services, service-to-service proxies, certificate or key-management systems)
  • Deep understanding of security principles, best practices, and common vulnerabilities
  • Expertise in securing large-scale cloud platforms (e.g., Azure, AWS, GCP), including multi-cloud networks and cloud-agnostic system design
  • Familiarity with container and orchestration security (Kubernetes, service meshes) and modern authentication/authorization standards (OIDC, mTLS, SPIFFE/SPIRE)
  • A proactive mindset, with the ability to identify and address security gaps or inefficiencies through automation and tooling
  • A track record of delivering scalable solutions and driving impactful changes across infrastructure in real-world projects
  • Strong analytical and problem-solving skills, with an ability to think critically and objectively assess security risks
  • Excellent communication skills, with the ability to convey complex security concepts to technical and non-technical stakeholders

Nice to have

  • Excitement about collaborating with cross-functional teams to build secure, reliable systems that scale globally

What the JD emphasized

  • high standards of reliability
  • scalability
  • software craftsmanship
  • robust under intense scale and adversarial pressure
  • production-grade security services
  • high-performance compute clusters
  • rapid model training and deployment
  • large-scale cloud and on-prem environments
  • line-speed encryption
  • machine identity
  • network isolation
  • emerging AI workloads
  • strong security foundations
  • operational excellence
  • Strong software engineering skills
  • shipping and operating high-reliability distributed services
  • critical security infrastructure
  • large-scale cloud platforms
  • multi-cloud networks
  • container and orchestration security
  • modern authentication/authorization standards
  • delivering scalable solutions
  • driving impactful changes across infrastructure
  • real-world projects
  • Strong analytical and problem-solving skills
  • think critically and objectively assess security risks
  • Excellent communication skills
  • convey complex security concepts
  • build secure, reliable systems that scale globally