Staff Machine Learning Compiler Engineer

Rivian · Auto · Palo Alto, CA · Mechanical & Electrical Engineering

Staff Machine Learning Compiler Engineer role: develop an ML compiler that maps Autonomy ML models to the Rivian Autonomy Processor (RAP1), focusing on hardware-aware optimizations for inference performance.

What you'd actually do

  1. Lead the development of an ML Compiler for mapping Autonomy ML models to the Rivian Autonomy Processor (RAP1).
  2. Design and implement hardware-aware optimizations, including quantization strategies, model compression, memory-efficient representations, and operator fusion, targeted to RAP1.
  3. Collaborate with hardware teams to co-optimize model architecture and compute pipeline under real-time constraints (latency, throughput, power).
  4. Benchmark and analyze system performance across platforms and iterate to achieve optimal deployment efficiency.
  5. Partner with autonomy teams to align model optimization efforts with hardware roadmap and real-world autonomy requirements.

Skills

Required

  • Ph.D. or M.S. in Computer Engineering or a related field.
  • Excellent C/C++ and Python programming skills.
  • Experience with various SoC platforms used for machine learning.
  • Strong understanding of deep learning software models.
  • Proficiency in deep learning frameworks and their low-level IRs or export formats.

Nice to have

  • Experience in compiler pipeline development.
  • Experience working in aggressive design environments.
  • Prior experience working with hardware-software co-design, especially for autonomous or robotics platforms.
  • Deep knowledge of numerical precision trade-offs, quantization-aware training (QAT), and dynamic/static quantization flows.
  • Familiarity with embedded real-time constraints and hardware profiling/debugging tools.
  • Familiarity with rearchitecting models to best suit hardware capabilities.

What the JD emphasized

  • hardware-aware optimizations
  • quantization strategies
  • model compression
  • memory-efficient representations
  • operator fusion
  • real-time constraints
  • latency
  • throughput
  • power
  • system performance
  • deployment efficiency
  • hardware roadmap
  • real-world autonomy requirements
  • compiler pipeline development
  • hardware-software co-design
  • autonomous or robotics platforms
  • numerical precision trade-offs
  • quantization-aware training (QAT)
  • dynamic/static quantization flows
  • embedded real-time constraints
  • hardware profiling/debugging tools
  • rearchitecting models

Other signals

  • ML Compiler
  • inference
  • hardware-software co-design
  • performance optimization