Tech Lead, AI Compute Infrastructure

HeyGen · Multimodal · Los Angeles, CA +3 · Engineering

Tech Lead for AI Compute Infrastructure at HeyGen, focused on building and scaling the compute platform for generative video models. Responsibilities include optimizing GPU utilization, building frameworks for large-scale AI jobs, improving cluster observability, integrating acceleration techniques (such as custom CUDA kernels and distributed training libraries) into training and inference pipelines, and driving adoption of cloud and container technologies such as Kubernetes and Ray. Requires 5+ years in MLOps/AI infrastructure/HPC and experience with Ray, Spark, Python, C++, Kubernetes, and ML frameworks (PyTorch, TensorFlow, JAX).

What you'd actually do

  1. Optimize GPU Utilization: Design and implement mechanisms to aggressively optimize GPU and cluster utilization across thousands of devices for inference, training, data processing, and large-scale deployment of our state-of-the-art video generation models.
  2. Develop Large-Scale AI Job Framework: Build highly scalable, reliable frameworks for launching and managing massive, heterogeneous compute jobs, including high-volume multimodal data ingestion and processing, distributed model training, and continuous evaluation and benchmarking.
  3. Enhance Observability: Develop world-class observability, tracing, and visualization tools for our compute cluster to ensure reliability and diagnose performance bottlenecks (e.g., memory, bandwidth, communication).
  4. Accelerate Pipelines: Collaborate closely with AI researchers and AI engineers to integrate innovative acceleration techniques (e.g., custom CUDA kernels, distributed training libraries) into production-ready, scalable training and inference pipelines.
  5. Infrastructure Management: Champion the adoption and optimization of modern cloud and container technologies (Kubernetes, Ray) for elastic, cost-efficient scaling of our distributed systems.

Skills

Required

  • Python
  • C++
  • Kubernetes
  • Ray
  • PyTorch
  • TensorFlow
  • JAX
  • MLOps
  • AI infrastructure
  • HPC systems

Nice to have

  • Master's or PhD
  • Tech Lead experience
  • Infrastructure for generative AI models
  • Data infrastructure
  • GPU acceleration
  • CUDA
  • NCCL

What the JD emphasized

  • state-of-the-art video generation models
  • large-scale AI job framework
  • observability
  • AI researchers
  • AI engineers
  • large-scale MLOps
  • AI infrastructure
  • HPC systems
  • Generative AI models
  • data infrastructure
  • multimodal data

Other signals

  • scaling AI infrastructure
  • GPU utilization
  • low-latency video generation
  • large-scale AI job framework
  • observability for compute cluster