Senior Manager, ML Ops & Observability Engineer

Pfizer · Pharma · Thessaloniki Chortiatis, Greece

Senior Manager role focused on building and operating MLOps platforms and ensuring end-to-end observability for ML systems within a healthcare company. The role involves leading the design and implementation of platforms for model deployment, monitoring, and lifecycle management, integrating with cloud-native environments and DevOps practices. Key responsibilities include establishing observability tooling, defining reliability signals, implementing testing and validation, and enabling data scientists to move models to production safely. The role also emphasizes people leadership and continuous improvement using SRE principles.

What you'd actually do

  1. Lead the design, implementation, and operation of MLOps platforms supporting model development, deployment, monitoring, and lifecycle management.
  2. Own production workflows for model packaging and deployment, versioning and rollback, and promotion across environments (dev/test/prod).
  3. Implement standardized CI/CD pipelines for ML workloads, integrating with enterprise DevOps and infrastructure platforms.
  4. Own end-to-end observability for ML systems, spanning model performance and behavior, data quality and drift, and pipeline health and system reliability.
  5. Define and track ML-specific reliability signals, such as model performance degradation, data drift and feature anomalies, and prediction latency and failure rates.

Skills

Required

  • ML engineering
  • MLOps
  • platform engineering
  • operationalizing ML systems in AWS or Azure
  • MLOps pipelines and tooling
  • CI/CD for ML workloads
  • Containerized and cloud-native ML runtimes
  • Testing and validation for ML systems
  • Observability and reliability practices
  • OpenTelemetry
  • Prometheus
  • Grafana
  • ELK
  • DevSecOps
  • secure SDLC for AI/ML systems
  • Python
  • Bash
  • SQL
  • ML frameworks
  • Communication and collaboration skills
  • Leadership abilities

Nice to have

  • Master's degree in Computer Science, Data Science, or AI/ML
  • MLflow
  • Kubeflow
  • feature stores
  • data drift detection
  • model monitoring
  • ML reliability engineering
  • responsible AI
  • governance
  • regulated environments

What the JD emphasized

  • MLOps platforms
  • model deployment
  • monitoring
  • observability
  • CI/CD for ML
  • testing and validation
  • reliability
  • secure and compliant AI operations

Other signals

  • MLOps platforms
  • model deployment
  • monitoring
  • observability
  • CI/CD for ML
  • SRE practices