Senior ML Engineer, Fauna at Amazon

What you'd actually do

Design and build scalable ML training infrastructure, including distributed training pipelines and GPU cluster management both in the cloud and on-prem

Develop systems for experiment tracking, model versioning, and reproducibility

Build deployment infrastructure for serving ML models on robotic hardware with strict latency requirements

Optimize model inference for edge devices and embedded systems

Collaborate with research teams to accelerate the path from experimentation to production

Skills

Required

5+ years of non-internship professional software development experience
5+ years of programming experience with at least one programming language
5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience as a mentor, tech lead or leading an engineering team
Bachelor's degree or higher in computer science, machine learning, engineering, or a related field
Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution, or experience in development in the last 3 years
Experience with machine learning (ML) tools and methods
Experience with Kubernetes, Docker, or the container ecosystem, or experience that demonstrates strong analytical skills, attention to detail, and effective communication, together with programming/scripting experience (Batch, VB, PowerShell, Java, C#, Chef, Perl, Ruby, and/or PHP)

Nice to have

Experience building and operating a cloud-based architecture
Experience with robotics data (sensor streams, video, point clouds) and real-time inference systems
Familiarity with model optimization techniques (quantization, pruning, distillation)
Experience with reinforcement learning or simulation-based training pipelines

We are seeking a Senior ML Engineer to build and scale the machine learning systems that power our intelligent robots. In this role, you will design and maintain the infrastructure for training, evaluating, and deploying the ML models that enable robot locomotion, perception, manipulation, navigation, and human-robot interaction.

You'll work at the intersection of machine learning and systems engineering, ensuring our ML training and deployment systems are robust, efficient, and scalable as we grow from prototype to production.

Key job responsibilities

Design and build scalable ML training infrastructure, including distributed training pipelines and GPU cluster management both in the cloud and on-prem
Develop systems for experiment tracking, model versioning, and reproducibility
Build deployment infrastructure for serving ML models on robotic hardware with strict latency requirements
Optimize model inference for edge devices and embedded systems
Collaborate with research teams to accelerate the path from experimentation to production
Contribute to data pipelines and labeling infrastructure as needed, in partnership with the data platform team

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, NY, New York - 184,900.00 - 250,200.00 USD annually
