Software Engineer, AI Compute Infrastructure

HeyGen · Multimodal · Los Angeles, CA +3 · Engineering

Software Engineer focused on building and scaling the foundational compute infrastructure for AI models, including multimodal training data pipelines and high-throughput, low-latency video generation. Responsibilities include optimizing GPU utilization, developing large-scale AI job frameworks, enhancing observability, accelerating pipelines in collaboration with AI researchers, and managing cloud and container technologies.

What you'd actually do

  1. Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
  2. Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including high-volume multimodal data ingestion and processing, distributed model training, and continuous evaluation and benchmarking.
  3. Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
  4. Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
  5. Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.

Skills

Required

  • Python
  • C++
  • Kubernetes
  • Ray
  • PyTorch
  • TensorFlow
  • JAX
  • Large-scale MLOps
  • AI infrastructure
  • HPC systems

Nice to have

  • Tech Lead experience
  • Generative AI model infrastructure
  • Data infrastructure (Ray, Apache Spark)
  • GPU acceleration
  • CUDA
  • NCCL

What the JD emphasized

  • large-scale MLOps, AI infrastructure, or HPC systems
  • Kubernetes and Ray
  • Generative AI models
  • data infrastructure
  • GPU acceleration
  • CUDA, NCCL

Other signals

  • optimize GPU utilization
  • large-scale AI job framework
  • observability, tracing, and visualization tools
  • accelerate pipelines
  • Kubernetes, Ray