Software Development Engineer, Neuron Foundation Tools

Amazon · Big Tech · Seattle, WA · Software Development

Software Development Engineer on the AWS Neuron Foundation Tools team, responsible for developing and maintaining high-performance monitoring and profiling tools for AI accelerators (Inferentia, Trainium). The role focuses on optimizing AI workloads by providing insight into performance bottlenecks and system behavior, improving ML kernels and frameworks. It covers the full development life cycle of the Neuron Profiler/Tools toolchain, collaborating with cross-functional teams on the C++ compiler and runtime, and supporting frameworks such as PyTorch, JAX, and XLA.

What you'd actually do

  1. develop and maintain high-performance monitoring and profiling tools for machine learning applications and AI accelerators
  2. work on design, development, and deployment of the Neuron Profiler and other Neuron Tools
  3. manage the full development life cycle of the Neuron Profiler/Tools toolchain, ensuring scalability, reliability, and usability
  4. collaborate with cross-functional teams to ensure that our C++ compiler and runtime generate key information so customers can understand and optimize the performance of our custom hardware
  5. drive innovations that allow the profiler to support multiple frameworks, such as PyTorch, JAX, and XLA
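The duties above center on instrumenting ML workloads and surfacing performance bottlenecks. As a purely illustrative sketch of that instrument-run-report loop (using Python's stdlib `cProfile` as a stand-in; the Neuron Profiler itself targets accelerator hardware and is not shown here):

```python
import cProfile
import io
import pstats

def matmul(a, b):
    """Naive matrix multiply -- stands in for an ML kernel hot spot."""
    n, m, p = len(a), len(b[0]), len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]

def workload():
    # Small synthetic workload; a real profile would wrap a training
    # or inference step on the accelerator.
    a = [[1.0] * 40 for _ in range(40)]
    b = [[2.0] * 40 for _ in range(40)]
    for _ in range(5):
        matmul(a, b)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Rank the hottest functions by cumulative time -- the same kind of
# signal a hardware profiler reports per accelerator kernel.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print("matmul" in report)
```

The hot function (`matmul` here) shows up at the top of the cumulative-time ranking, which is the basic insight a profiler toolchain delivers, whether for Python functions or for kernels on Inferentia/Trainium.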

Skills

Required

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience designing or architecting (design patterns, reliability, scaling) new and existing systems
  • Experience programming with at least one software programming language

Nice to have

  • 3+ years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent
  • Experience with ML-specific profiler tools (e.g., PyTorch Profiler, TensorFlow Profiler)
  • Direct customer-facing experience
  • Strong motivation to achieve results

What the JD emphasized

  • high-performance
  • optimizing AI workloads
  • performance bottlenecks
  • improving performance
  • scalability
  • reliability
  • usability
  • performance of our custom hardware

Other signals

  • AWS Neuron
  • AI accelerators
  • Inferentia
  • Trainium
  • Neuron Profiler
  • performance bottlenecks
  • ML Kernels
  • ML Frameworks
  • C++ compiler
  • runtime
  • PyTorch
  • JAX
  • XLA