Senior Software Development Engineer, AWS Mantle

Amazon · Big Tech · Seattle, WA · Software Development

This role is for a Senior Software Development Engineer who will build and scale the distributed inference engine for Amazon Bedrock, which powers enterprise access to foundation models globally. The work involves designing, building, and operating high-performance systems for ML inference at massive scale, with a focus on request routing, load balancing, model lifecycle management, and performance optimization across AWS regions.

What you'd actually do

  1. Design, build, and operate high-performance distributed systems that serve ML inference at massive scale across all AWS regions
  2. Own the end-to-end delivery of complex features—from requirements through design, implementation, testing, deployment, and production operations
  3. Collaborate with cross-functional teams to solve challenging problems in capacity management, model serving, and API compatibility
  4. Contribute to a culture of engineering excellence by writing clean, maintainable code and driving continuous improvement in system reliability
  5. Influence technical direction within your team while contributing to broader architectural discussions across Mantle and Amazon Bedrock

Skills

Required

  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one software programming language
  • 5+ years of experience leading design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience as a mentor, tech lead, or leader of an engineering team
  • Bachelor's degree in Computer Science, Engineering, a related field, or equivalent experience

Nice to have

  • Master's degree in computer science, machine learning, engineering, or related fields
  • Experience in machine learning, data mining, information retrieval, statistics or natural language processing, or experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware
  • Experience designing APIs at scale, particularly RESTful or streaming APIs with strict latency and availability requirements

What the JD emphasized

  • distributed inference engine
  • massive scale
  • global scale
  • foundation models
  • ML infrastructure
  • AWS regions
  • performance SLAs
  • GPU/accelerator fleet
  • Zero Operator Access (ZOA) security guarantees
