What you'd actually do

Design and evolve production MLOps capabilities across the full ML lifecycle including datasets, features, models, evaluations, deployments, monitoring, retraining, and feedback signals.

Build systems for experiment tracking, artifact management, reproducibility, versioning, lineage, promotion workflows, and production readiness.

Develop reusable platform tooling, golden paths, and engineering standards that improve consistency and delivery velocity across teams.

Build operational infrastructure for LLM and agentic systems including prompts, tools, traces, evaluations, observability, safety boundaries, and production monitoring.

Design evaluation and monitoring frameworks for AI systems including answer quality, latency, grounding, reliability, and operational regressions.

Skills

Required

5+ years of professional software engineering, MLOps, or ML platform engineering experience in production environments.
Significant experience building or owning production ML infrastructure and lifecycle systems.
Strong Python engineering skills with production-grade architecture, modular design, testing, packaging, and robust error handling.
Strong understanding of the end-to-end ML lifecycle including training, deployment, monitoring, retraining, reproducibility, and lineage.
Experience working with large-scale data platforms such as Databricks, Spark, Delta Lake, or equivalent ecosystems.
Experience with ML platform and MLOps frameworks such as MLflow, Metaflow, Kubeflow, or equivalent ML lifecycle-management systems.
Proven ability to design reusable workflow orchestration using Airflow, Metaflow, or Databricks, covering automation, scheduling, dependency management, and production reliability.
Familiarity with operational patterns for LLMOps, AgentOps, and production AI systems.
Strong written and verbal communication skills in English.

Nice to have

Experience with industrial, IoT, or manufacturing platforms.
Experience with feature stores, model registries, dataset versioning, and lineage systems.
Experience with AI agents, RAG systems, production GenAI applications, or evaluation frameworks.

What the JD emphasized

production engineering experience building and operating scalable ML and AI systems
software-first MLOps platform role focused on production reliability
ML lifecycle management
large-scale training infrastructure
operational AI systems
reusable platform capabilities
production MLOps capabilities
ML lifecycle
production readiness
reusable platform tooling
operational infrastructure for LLM and agentic systems
production monitoring
evaluation and monitoring frameworks for AI systems
large-scale training pipelines
production-grade Python services
engineering quality through automated testing
CI/CD
observability
deployment standards
operational best practices
production environments
production ML infrastructure and lifecycle systems
production-grade architecture
robust error handling
end-to-end ML lifecycle
large-scale data platforms
ML platform and MLOps frameworks
reusable workflow orchestration
production reliability
LLMOps
AgentOps
production AI systems
production foundation
production-grade AI platforms
scaling ML systems
operational backbone of Industrial AI

Our mission is to transform how people and machines work together to push the boundaries of human productivity. A leader in Industrial AI, Augury helps the world’s manufacturers leverage real-time production insights to drive new levels of efficiency. By combining predictive and prescriptive AI technology with industry expertise, Augury enables production teams to proactively address alerts, minimize downtime, reduce asset costs, and maximize yield and capacity. Our customers achieve payback in six months or less, allowing them to scale globally. We’re looking for team members excited to partner with the world’s manufacturers and build the future of production together.

We are looking for an MLOps Engineer with strong production engineering experience building and operating scalable ML and AI systems.

This is a software-first MLOps platform role focused on production reliability, ML lifecycle management, large-scale training infrastructure, operational AI systems, and reusable platform capabilities.

You will help build and scale the production platform behind Augury’s Industrial AI Workforce, enabling teams across the company to develop, evaluate, deploy, and operate ML and AI systems consistently and safely.

A Day In Your Life

Design and evolve production MLOps capabilities across the full ML lifecycle including datasets, features, models, evaluations, deployments, monitoring, retraining, and feedback signals.
Build systems for experiment tracking, artifact management, reproducibility, versioning, lineage, promotion workflows, and production readiness.
Develop reusable platform tooling, golden paths, and engineering standards that improve consistency and delivery velocity across teams.
Build operational infrastructure for LLM and agentic systems including prompts, tools, traces, evaluations, observability, safety boundaries, and production monitoring.
Design evaluation and monitoring frameworks for AI systems including answer quality, latency, grounding, reliability, and operational regressions.
Build and optimize large-scale training pipelines supporting heterogeneous data sources and scalable compute patterns.
Write clean, modular, production-grade Python services and platform libraries.
Drive engineering quality through automated testing, CI/CD, observability, deployment standards, and operational best practices.

What You Bring

5+ years of professional software engineering, MLOps, or ML platform engineering experience in production environments.
Significant experience building or owning production ML infrastructure and lifecycle systems.
Strong Python engineering skills with production-grade architecture, modular design, testing, packaging, and robust error handling.
Strong understanding of the end-to-end ML lifecycle including training, deployment, monitoring, retraining, reproducibility, and lineage.
Experience working with large-scale data platforms such as Databricks, Spark, Delta Lake, or equivalent ecosystems.
Experience with ML platform and MLOps frameworks such as MLflow, Metaflow, Kubeflow, or equivalent ML lifecycle-management systems.
Proven ability to design reusable workflow orchestration using Airflow, Metaflow, or Databricks, covering automation, scheduling, dependency management, and production reliability.
Familiarity with operational patterns for LLMOps, AgentOps, and production AI systems.
Strong written and verbal communication skills in English.

Nice to Have

Experience with industrial, IoT, or manufacturing platforms.
Experience with feature stores, model registries, dataset versioning, and lineage systems.
Experience with AI agents, RAG systems, production GenAI applications, or evaluation frameworks.

Why This Role Matters

This role is an opportunity to help build the production foundation behind Augury’s Industrial AI Workforce.

You will help transform ML and AI work from isolated experimentation into scalable, observable, reliable, and reusable production systems powering the next generation of industrial AI.

If you enjoy building production-grade AI platforms, scaling ML systems on modern data infrastructure, and shaping the operational backbone of Industrial AI, we would love to meet you.

Augury is a people-first organization. We believe in fostering an inclusive environment in which employees feel encouraged to share their unique perspectives, leverage their strengths, and act authentically. We know that diverse teams are strong teams, and we welcome those from all backgrounds and varying experiences. We are committed to providing employees with a work environment free of discrimination and harassment. We believe that diversity is more than just good intentions, and we are committed to creating an inclusive environment for all employees.

Augury is a proud equal opportunity employer. We strive to create a work environment in which everyone, including all applicants, employees, customers, guests, and vendors, feels safe and comfortable. We are committed to maintaining a workplace that is free of any type of harassment and does not tolerate anyone intimidating, humiliating, or hurting others. We prohibit willful discrimination based on age, gender, ethnicity, race, color, religion, political opinions, sexual orientation, sexual identity or expression, military or veteran status, disability, or any other characteristic protected by law.