Senior Machine Learning Software Development Engineer, AI Ops Integration

Amazon · Big Tech · D, Ireland +1 · Software Development

Senior Machine Learning Engineer role focused on designing and deploying production ML/LLM systems and agentic AI solutions for Amazon Operations & Supply Chain. The role involves end-to-end system design, data pipelines, model serving, orchestration of complex workflows, and ensuring safe and reliable operation of AI systems. It requires leadership in technical design, architecture, and engineering excellence across the ML lifecycle, with a focus on automating operational decisions at scale.

What you'd actually do

  1. Lead the technical design and architecture of production ML/LLM systems end-to-end, from data pipelines and model serving to scalable user-facing applications
  2. Architect and build agentic AI solutions that orchestrate complex operational workflows across multiple systems, APIs, and decision points
  3. Define the technical strategy for internal tooling, designing front-end platforms (dashboards, products) that serve non-technical operations users at worldwide scale
  4. Own the integration architecture across internal systems, databases, and MCP servers, establishing patterns that enable modular multi-system orchestration
  5. Drive engineering excellence across the ML lifecycle: set standards for experimentation, deployment, monitoring, evaluation, and incident response

Skills

Required

  • Lead the technical design and architecture of production ML/LLM systems end-to-end
  • Architect and build agentic AI solutions
  • Define the technical strategy for internal tooling
  • Own the integration architecture across internal systems
  • Drive engineering excellence across the ML lifecycle
  • Design guardrails, evaluation frameworks, and human-in-the-loop architectures
  • Mentor junior engineers, conduct design reviews, and raise the technical bar
  • Partner with scientists, product managers, and operations leaders to translate ambiguous business problems into well-scoped technical solutions
  • Bachelor's degree
  • Experience as a mentor, tech lead or leading an engineering team
  • Experience in professional, non-internship software development
  • Experience leading the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems
  • Experience programming with at least one modern language such as Java, C++, or C# including object-oriented design
  • Knowledge of machine learning concepts and their application to reasoning and problem-solving
  • Experience building complex software systems that have been successfully delivered to customers

Nice to have

  • Master's degree in computer science or equivalent
  • Experience with full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution, or experience with vLLM, SGLang, TensorRT or similar platforms in production environments
  • Experience in several of the following areas: machine learning, statistics, deep learning, natural language processing, or information retrieval
  • Experience with feature delivery and tradeoffs of a product
  • Experience with operations/supply chain
  • Experience researching machine learning, deep learning, NLP, computer vision, or data science
  • Experience communicating across technical and non-technical audiences, including executive level stakeholders or clients
  • Experience using data and metrics to measure impact and determine improvements

What the JD emphasized

  • AI-Native organization
  • transform Amazon Operations & Supply Chain into an AI-Native organization
  • deliver AI solutions
  • AI capabilities
  • AI agents
  • agentic AI solutions
  • orchestrate complex operational workflows
  • ML lifecycle
  • production AI systems
  • automating operational decisions

Other signals

  • AI-Native organization
  • AI solutions
  • predictive analytics
  • LLMs
  • autonomous AI agents
  • automation
  • agentic AI solutions
  • orchestrate complex operational workflows
  • ML lifecycle
  • production AI systems
  • AI agents