Software Development Engineer I, ML Infra Services, Annapurna Labs

Amazon · Big Tech · Cupertino, CA · Software Development

This Software Development Engineer I role focuses on building and evolving machine learning infrastructure services and tooling for AWS Neuron AI accelerators. It involves designing and implementing solutions for profiling, optimization, and resource management of ML workloads, working across the stack from infrastructure orchestration to developer-facing tools.

What you'd actually do

  1. Design and implement tooling for profiling, optimization, and resource management of ML workloads on custom accelerators.
  2. Build high-impact solutions that ship to a large and growing customer base.
  3. Participate in design discussions, code reviews, and cross-functional collaboration with hardware, software, and customer-facing teams.
  4. Create metrics, implement automation, and resolve root causes of software defects.
  5. Work in a startup-like environment where you're always focused on the most important problems.

Skills

Required

  • Experience with at least one modern language such as Java, Python, C++, or C# including object-oriented design
  • Experience with at least one general-purpose programming language such as Java, Python, C++, C#, Go, Rust, or TypeScript
  • Experience with data structure implementation, basic algorithm development, and/or object-oriented design principles
  • Proficiency in Java and at least one of Go, Python, or TypeScript.
  • Familiarity with Git and CI/CD pipelines.

Nice to have

  • Experience from a technical internship
  • Experience in optimization mathematics such as linear programming and nonlinear optimization
  • Experience with distributed, multi-tiered systems, algorithms, and relational databases
  • Work 40 hours/week, and overtime as required
  • Experience from previous technical internship(s) or demonstrated project experience
  • Experience with cloud platforms (preferably AWS), database systems (SQL and NoSQL), AI tools for development productivity, open-source contributions, and/or version control systems
  • Internship or project experience with AWS services (EKS, EC2, Lambda, S3, DynamoDB, or SQS).
  • Familiarity with distributed systems or big data architectures.
  • Experience with Linux systems and performance profiling.
  • Exposure to compiler toolchains, code generation, or instruction set architectures (CPU, NPU, GPU).

What the JD emphasized

  • custom AI accelerators
  • ML workloads
  • custom accelerators
  • custom silicon
  • large-scale ML workloads

Other signals

  • ML accelerators
  • ML infra services
  • ML tooling