Kernel Optimization Software Engineer, AI Hardware

Tesla · Auto · Palo Alto, CA · Tesla AI

This role focuses on optimizing AI research models to run efficiently on Tesla's custom AI ASICs for applications like Autopilot and Optimus. It spans kernel optimization, compiler development, and collaboration with hardware teams to improve inference and training performance, with an emphasis on real-time latency for robotics and self-driving systems.

What you'd actually do

  1. Implement, optimize, and profile high-performance kernels for inference and training on Tesla's AI and Dojo ASICs (a minimal sketch of this kind of work appears after this list)
  2. Optimize bottlenecks in the inference flow, make precision/performance tradeoff decisions, and develop novel techniques to improve hardware utilization and throughput
  3. Work on a variety of edge and datacenter workloads, from small encoders/decoders to distributed LLM inference
  4. Work with hardware teams to shape the next generation of Tesla hardware, evaluating architectural tradeoffs and balancing performance with versatility
  5. Research and implement state-of-the-art machine learning techniques to achieve high performance on our hardware
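
To make item 1 concrete, here is a minimal, hypothetical sketch of one classic kernel optimization: cache blocking plus multithreading applied to a matrix multiply, in plain C. The tile size, the OpenMP pragma, and the GEMM workload are all illustrative assumptions; real kernels for Tesla's ASICs would target the chip's own vector ISA and memory hierarchy.

```c
/* Hypothetical sketch -- not Tesla's kernels. Cache-blocked,
 * multithreaded GEMM in plain C; compile with -fopenmp.
 * C[M][N] += A[M][K] * B[K][N], all row-major. */
#include <stddef.h>

#define TILE 64  /* assumed tile size; tuned per cache level in practice */

void gemm_tiled(size_t M, size_t N, size_t K,
                const float *A, const float *B, float *C)
{
    /* Each thread owns disjoint (i0, j0) tiles of C, so no write races. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t i0 = 0; i0 < M; i0 += TILE)
        for (size_t j0 = 0; j0 < N; j0 += TILE)
            for (size_t k0 = 0; k0 < K; k0 += TILE) {
                size_t imax = i0 + TILE < M ? i0 + TILE : M;
                size_t jmax = j0 + TILE < N ? j0 + TILE : N;
                size_t kmax = k0 + TILE < K ? k0 + TILE : K;
                for (size_t i = i0; i < imax; ++i)
                    for (size_t k = k0; k < kmax; ++k) {
                        float a = A[i * K + k];  /* reused across the j loop */
                        for (size_t j = j0; j < jmax; ++j)
                            /* unit-stride access: friendly to SIMD autovectorization */
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

The i-k-j loop order keeps the innermost accesses unit-stride, the same locality reasoning that drives kernel work on custom accelerators.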

Skills

Required

  • kernel optimization
  • distributed systems
  • inference runtimes
  • serving frameworks
  • LLMs
  • transformers
  • state space models
  • diffusion models
  • CNNs
  • performance modeling
  • roofline analysis (see the worked sketch after this list)
  • computer and GPU architecture
  • SIMD
  • multithreading
  • accelerators with vectorized instructions
  • analytical and debugging skills
  • ability to work across team boundaries
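
Since roofline analysis appears in the required skills, here is a minimal sketch of the model itself: attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity. The peak-FLOP/s and bandwidth figures below are invented placeholders, not specs for any Tesla part.

```c
/* Minimal roofline-model sketch. Hardware numbers are invented
 * placeholders, not specs for any real accelerator. */
#include <stdio.h>

int main(void)
{
    const double peak_flops = 100e12;  /* assumed peak compute: 100 TFLOP/s */
    const double mem_bw     = 1e12;    /* assumed memory bandwidth: 1 TB/s  */

    /* Example kernel: one FMA (2 FLOPs) per 4-byte load gives an
     * arithmetic intensity of 0.5 FLOP/byte. */
    const double flops = 2.0;
    const double bytes = 4.0;
    const double ai    = flops / bytes;

    /* Roofline: attainable = min(peak, bandwidth * intensity). */
    double attainable = mem_bw * ai;
    if (attainable > peak_flops)
        attainable = peak_flops;

    printf("arithmetic intensity: %.2f FLOP/byte\n", ai);
    printf("attainable: %.1f GFLOP/s (%s-bound)\n",
           attainable / 1e9,
           attainable < peak_flops ? "memory" : "compute");
    return 0;
}
```

A kernel whose intensity falls left of the ridge point is memory-bound, so optimization effort goes into data movement rather than math.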

Nice to have

  • MLIR-based compiler stack
  • Contributions to ML serving frameworks, compilers, or related frameworks (e.g. SGLang, LLVM, PyTorch, MLIR)

What the JD emphasized

  • real-time latency
  • performance
  • optimization
  • kernels
  • inference
  • training

Other signals

  • optimizing inference and training kernels
  • real-time latency for self-driving and humanoid robots
  • shaping next-generation AI chips