Machine Learning Software Development Engineer, AI Ops Integration

Amazon · Big Tech · D, Ireland +1 · Software Development

This Machine Learning Software Development Engineer role focuses on integrating AI Operations within Amazon's Supply Chain. The role involves designing and deploying production systems that combine traditional ML with agentic architectures, using LLMs and AI agents to automate complex operational workflows. Responsibilities include building full-stack ML/LLM features, implementing AI agent components, developing internal tools, integrating systems, contributing to the ML lifecycle, and implementing guardrails and evaluation frameworks.

What you'd actually do

  1. Build and deploy ML/LLM-powered features across the full stack, from data pipelines and model serving to user-facing internal tools
  2. Implement AI agent components that automate complex operational workflows across multiple systems and decision points
  3. Develop internal front-end applications (dashboards, tools, products) that make AI outputs accessible to non-technical operations users at scale
  4. Build integrations across internal APIs, databases, and MCP servers to enable multi-system orchestration
  5. Contribute to the ML lifecycle: data pipelines, experimentation, deployment, monitoring, and evaluation

Skills

Required

  • machine learning
  • data mining
  • information retrieval
  • statistics
  • natural language processing
  • professional software development
  • designing or architecting systems
  • programming with at least one software programming language

Nice to have

  • full software development life cycle
  • working with or evaluating AI systems
  • Machine Learning and Large Language Model fundamentals
  • architecture
  • training/inference lifecycles
  • optimization of model execution
  • vLLM
  • SGLang
  • TensorRT
  • production environments
  • deep learning
  • product feature delivery and its tradeoffs
  • operations/supply chain
  • AWS services
  • IT platform implementation
  • machine learning research
  • NLP
  • computer vision
  • data science
  • communicating across technical and non-technical audiences
  • using data and metrics to measure impact and determine improvements

What the JD emphasized

  • production-ready systems
  • highly ambiguous problems
  • AI-Native organization
  • automate decision-making
  • MLOps
  • production AI systems

Other signals

  • LLMs and autonomous AI agents