Full Stack Engineer, Fleet Scheduling

OpenAI · AI Frontier · San Francisco, CA · Scaling

Full Stack Engineer role building and operating web-based systems that manage AI workloads on supercomputing clusters, with a focus on researcher productivity and system transparency.

What you'd actually do

  1. Design and develop full-stack web applications to track, monitor, and manage large-scale AI workloads in real time.
  2. Collaborate with researchers and infrastructure teams to translate complex operational needs into intuitive UIs and scalable backends.
  3. Build data visualization tools (e.g., Gantt charts, dashboards) to provide insights into job scheduling and resource allocation.
  4. Optimize backend services to handle massive data throughput while ensuring low-latency performance and high availability.
  5. Implement frontend components that provide seamless interactions with scheduling, storage, and compute systems.

Skills

Required

  • full-stack development
  • modern frontend frameworks (React, Vue, or Angular)
  • backend technologies (Python, Go, or Node.js)
  • building scalable, high-performance web applications for complex distributed systems
  • RESTful and GraphQL APIs
  • distributed databases
  • cloud infrastructure (especially Azure)

Nice to have

  • Kubernetes
  • Docker
  • cloud-native application deployment
  • AI/ML workload scheduling and orchestration challenges
  • real-time data processing
  • visualization libraries
  • observability tooling

What the JD emphasized

  • large-scale AI workloads
  • supercomputing clusters
  • scalable solutions
  • exascale workloads
  • low-latency performance