Hardware / Software Codesign Engineer - 3p

OpenAI · AI Frontier · San Francisco, CA · Scaling

This role focuses on co-designing hardware with vendors to optimize it for AI workloads, specifically for training and inference of large language models. It involves understanding ML techniques, algorithms, and numerical approximations to influence future hardware architectures and improve performance. The engineer will also build system performance models and evaluate potential accelerators.

What you'd actually do

  1. Co-design future hardware for programmability and performance with our hardware vendors
  2. Assist hardware vendors in developing optimal kernels and add support for them in our compiler
  3. Develop performance estimates for critical kernels for different hardware configurations and drive decisions on compute core and memory hierarchy features
  4. Build system performance models at different abstraction levels and carry out analysis to drive decisions on scale-up, scale-out, and front-end networking
  5. Work with machine learning engineers, kernel engineers and compiler developers to understand their vision and needs from high performance accelerators

Skills

Required

  • 4+ years of industry experience
  • Experience harnessing compute at scale
  • Experience optimizing ML platform code to run efficiently on target hardware
  • Strong experience in software/hardware co-design
  • Deep understanding of GPU and/or other AI accelerators
  • Experience with CUDA, Triton or a related accelerator programming language
  • Experience driving Machine Learning accuracy with low-precision formats
  • Experience with system performance modeling and analysis to optimize ML model deployment
  • Strong coding skills in C/C++ and Python
  • Familiarity with the fundamentals of deep learning computing and chip architecture/microarchitecture
  • Able to actively collaborate with ML engineers, kernel writers, compiler developers, system engineers, chip architects/microarchitects

Nice to have

  • PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing, Compilers, or other Systems
  • Strong understanding of LLMs and challenges related to their training and inference

What the JD emphasized

  • harnessing compute at scale
  • optimizing ML platform code to run efficiently on target hardware
  • software/hardware co-design
  • GPU and/or other AI accelerators
  • CUDA, Triton or a related accelerator programming language
  • Machine Learning accuracy with low precision formats
  • system performance modeling and analysis to optimize ML model deployment
  • deep learning computing
  • chip architecture/microarchitecture
  • LLMs and challenges related to their training and inference

Other signals

  • co-design hardware for AI workloads
  • optimize ML platform code for hardware
  • efficiently distributing LLMs
  • tailoring compute pipe and memory hierarchy
  • optimizing training and inference