Staff Software Engineer, Deep Learning Acceleration

Aurora Innovation · Robotics · Mountain View, CA · Software Autonomy Sensing

Staff Software Engineer role focused on deep learning acceleration for Aurora's autonomous vehicle (AV) systems. The work covers performance analysis and optimization of deep learning networks, both for onboard vehicle deployment and for large-scale data center training. Responsibilities include optimizing software architecture, system performance, and latency; troubleshooting with profiling and roofline models; and collaborating with cross-functional teams. The role requires strong CUDA, C++, and Python skills, high-performance computing experience, and proficiency with deep learning frameworks and performance analysis tools.

What you'd actually do

  1. Conduct performance analysis and optimization of deep learning networks running on the autonomous vehicle (AV).
  2. Optimize software architecture, system performance, and latency for deep learning applications.
  3. Work on deploying deep learning models on the AV and training them in large-scale data centers.
  4. Troubleshoot performance issues using profiling and roofline model techniques.
  5. Collaborate with cross-functional teams to enhance the efficiency of self-driving technology.

Skills

Required

  • CUDA
  • C++
  • Python
  • high-performance computing
  • parallel programming
  • GPU memory optimization
  • latency minimization
  • throughput maximization
  • NVIDIA Nsight Systems
  • NVIDIA Nsight Compute
  • roofline model
  • PyTorch
  • TensorFlow
  • computer vision
  • transformer-based deep learning architectures
  • neural network building blocks
  • performance bottleneck diagnosis
  • Linux/Unix environments

Nice to have

  • motion planning
  • robotics
  • autonomous systems
  • systems software
  • TensorRT
  • OpenAI Triton
  • Mojo

What the JD emphasized

  • performance analysis and optimization
  • software architecture, system performance, and latency
  • deployment of deep learning models
  • troubleshoot performance issues
  • profiling and roofline model techniques

Other signals

  • performance optimization
  • deep learning acceleration
  • autonomous vehicle systems
  • large-scale data centers