Senior Manager, ML Ops & Observability … at Pfizer

Use Your Power for Purpose

Do you want to make a global impact on patient health? Do you thrive in a fast-paced environment that integrates scientific, clinical, and commercial domains through engineering, data science, and AI? Join Pfizer Digital’s Commercial Creation Center & CDI organization (C4) to leverage cutting-edge technology for critical business decisions and enhance customer experiences for colleagues, patients, and physicians. Our team of engineering, data science, and AI professionals is at the forefront of Pfizer’s transformation into a digitally driven organization, using data science and AI to change patients’ lives. The team leads process and engineering innovations that advance AI and data science applications from prototypes and MVPs to full production.

What You Will Achieve

MLOps Platform Execution & Model Operations

Lead the design, implementation, and operation of MLOps platforms supporting model development, deployment, monitoring, and lifecycle management.
Own production workflows for:
- Model packaging and deployment
- Versioning and rollback
- Promotion across environments (dev/test/prod)
Implement standardized CI/CD pipelines for ML workloads, integrating with enterprise DevOps and infrastructure platforms.
Partner with infrastructure and DataOps teams to ensure ML workloads run on secure, scalable, and cost-effective cloud-native environments (AWS/Azure).
Translate Director-level AI platform strategy into reliable, repeatable ML operational capabilities.

Model, Data & System Observability

Own end-to-end observability for ML systems, spanning:
- Model performance and behavior
- Data quality and drift
- Pipeline health and system reliability
Implement and operate observability tooling using:
- OpenTelemetry for distributed tracing
- Metrics and dashboards (Prometheus, Grafana)
- Logs and analytics (ELK or equivalent)
Define and track ML-specific reliability signals, such as:
- Model performance degradation
- Data drift and feature anomalies
- Prediction latency and failure rates
Establish SLOs and alerting strategies for ML services in production.

Testing, Validation & Responsible AI Enablement

Ensure testing and validation are embedded throughout the ML lifecycle, including:
- Model validation and regression testing
- Data and feature consistency checks
- Deployment verification and rollback testing
Integrate automated ML testing and quality gates into CI/CD pipelines.
Support non-functional testing for ML systems, including:
- Performance and scalability testing
- Reliability and resilience testing
- Security and access validation
Partner with AI, data, and compliance teams to support responsible and compliant AI operations, including auditability, traceability, and explainability hooks (where required).

AI Platform Enablement & Cross‑Team Collaboration

Enable data scientists and ML engineers to move models from experimentation to production efficiently and safely.
Provide reusable tooling, templates, and paved paths for:
- Experiment tracking
- Model registry usage
- Deployment and monitoring patterns
Collaborate closely with:
- Infrastructure Engineering (runtime, scaling, security)
- DataOps Engineering (data pipelines, feature stores, data quality)
- Product and analytics leaders to align ML capabilities to business outcomes

Reliability, Incident Management & Continuous Improvement

Own operational reliability for ML platforms and services.
Lead response to ML-related production incidents, including:
- Model failures or degradations
- Data drift–driven issues
- Pipeline or inference outages
Conduct post-incident reviews and drive systemic improvements.
Continuously improve MLOps maturity using SRE-inspired practices and metrics.

People Leadership & Engineering Ways of Working

Set clear expectations for operational ownership, quality, and delivery.
Coach engineers on:
- MLOps best practices
- Observability and reliability mindset
- Secure and compliant AI operations
Establish strong engineering discipline through design reviews, runbooks, documentation, and continuous learning.
Act as the primary execution partner to the Director-level Commercial AI Analytics Solutions & Engineering Lead for ML operations and observability.

Here Is What You Need (Minimum Requirements)

BA/BS with 6+ years of experience in ML engineering, MLOps, platform engineering, or related roles.
Strong hands-on experience operationalizing ML systems in AWS or Azure environments.
Proven expertise in:
- MLOps pipelines and tooling (experiment tracking, model registry, deployment, monitoring)
- CI/CD for ML workloads (e.g., GitHub Actions or equivalent)
- Containerized and cloud-native ML runtimes
Solid understanding of testing and validation for ML systems, including:
- Model regression and performance testing
- Data and feature validation
- Deployment and rollback verification
Strong experience implementing observability and reliability practices using tools such as OpenTelemetry, Prometheus, Grafana, and ELK.
Demonstrated experience with DevSecOps and secure SDLC for AI/ML systems, including secrets management and access controls.
Proficiency in programming and scripting (e.g., Python, Bash, SQL; familiarity with ML frameworks).
Strong communication and collaboration skills; ability to deliver outcomes through teams and influence cross-functionally.
Proven leadership abilities.

Bonus Points If You Have (Preferred Requirements)

Master's degree in Computer Science, Data Science, AI/ML, or related field.
Experience with MLOps platforms and tools (e.g., MLflow, Kubeflow, feature stores).
Background in data drift detection, model monitoring, and ML reliability engineering.
Familiarity with responsible AI, governance, or regulated environments.
Relevant certifications:
- AWS/Azure Professional
- Kubernetes (CKA/CKAD)
- Cloud security or data/AI platform certifications
Experience using common AI tools, including generative technologies such as ChatGPT or Microsoft Copilot, to support problem solving and enhance productivity. Demonstrated curiosity about how these tools can improve outcomes, and an understanding of responsible AI practices, including risk management and ethical use.

Please apply by sending your CV in English.

Work Location Assignment: Hybrid

Purpose

Breakthroughs that change patients' lives... At Pfizer we are a patient-centric company, guided by our four values: courage, joy, equity and excellence. Our breakthrough culture lends itself to our dedication to transforming millions of lives.

Digital Transformation Strategy

One bold way we are achieving our purpose is through our company-wide digital transformation strategy. We are leading the way in adopting new data, modelling and automated solutions to further digitize and accelerate drug discovery and development with the aim of enhancing health outcomes and the patient experience.

Flexibility

We aim to create a trusting, flexible workplace culture which encourages employees to achieve work-life harmony, attracts talent and enables everyone to be their best working self. Let’s start the conversation!

Equal Employment Opportunity

We believe that a diverse and inclusive workforce is crucial to building a successful business. As an employer, Pfizer is committed to celebrating this, in all its forms – allowing us to be as diverse as the patients and communities we serve. Together, we continue to build a culture that encourages, supports and empowers our employees.

Disability Inclusion

Our mission is unleashing the power of all our people, and we are proud to be a disability-inclusive employer, ensuring equal employment opportunities for all candidates. We encourage you to put your best self forward with the knowledge and trust that we will make any reasonable adjustments to support your application and future career. Your journey with Pfizer starts here!

Information & Business Tech
