Staff Platform Engineer, AI/ML Infrastructure

Pfizer · Pharma · Rives de Paris, Europe, France

This Staff Platform Engineer role focuses on building and operating the cloud infrastructure for enterprise-scale generative AI applications, including LLM integrations and model routing. Responsibilities include defining technical strategy, architecting scalable AWS platforms, leading CI/CD architecture, and improving observability and security practices.

What you'd actually do

  1. Define and drive the technical strategy for AI/ML platform infrastructure supporting generative AI applications, LLM integrations, model routing, and enterprise AI services.
  2. Architect, build, and operate scalable cloud platforms using AWS services such as EKS, ECS Fargate, Lambda, DynamoDB, S3, OpenSearch, Secrets Manager, CloudWatch, ALB, and MWAA.
  3. Establish reusable infrastructure patterns using CloudFormation, Helm, and Terraform to support reliable multi-environment and multi-region deployments.
  4. Lead CI/CD architecture using GitHub Actions, reusable workflows, OIDC-based AWS authentication, automated quality gates, deployment promotion, and environment approvals.
  5. Design and improve observability across AI platforms, including CloudWatch dashboards, logs, alarms, Prometheus/Grafana, OpenSearch, Langfuse, and LLM-specific operational metrics.

Skills

Required

  • 7+ years of experience in DevOps, platform engineering, cloud infrastructure, site reliability engineering, or software engineering roles.
  • Strong hands-on experience with AWS/Azure/GCP infrastructure and services, including container, serverless, networking, storage, observability, and security services.
  • Experience designing and operating production systems on Kubernetes, ECS/Fargate, or comparable container orchestration platforms.
  • Proficiency with infrastructure-as-code, especially CloudFormation, Terraform, Helm, or similar tooling.
  • Strong CI/CD experience with GitHub Actions or similar platforms, including reusable workflows, automated testing, deployment gates, and cloud authentication.
  • Experience building and operating observability solutions using CloudWatch, Prometheus/Grafana, OpenSearch, or similar tools.
  • Strong understanding of cloud security practices, IAM, secrets management, least-privilege access, audit logging, and compliance requirements.
  • Experience supporting distributed systems, microservices, APIs, asynchronous workloads, and multi-environment deployments.
  • Demonstrated ability to lead technical design, mentor engineers, and influence engineering practices across teams.

Nice to have

  • Experience supporting AI/ML or generative AI platforms, including LLM gateways, model routing, prompt observability, token metering, or model failover.
  • Experience operating platforms in regulated enterprise environments, ideally healthcare, pharmaceutical, finance, or life sciences.
  • Experience with multi-account, multi-region AWS architectures and enterprise governance patterns.
  • Experience with cost optimization, autoscaling strategies, capacity planning, and cloud budget monitoring.
  • Experience with load testing and performance validation using tools such as Locust or comparable frameworks.
  • Strong Python or scripting skills for platform automation, operational tooling, and CI/CD extensions.
  • Ability to communicate complex technical decisions clearly to engineering, security, operations, and leadership audiences.

What the JD emphasized

  • enterprise-scale generative AI applications
  • AI/ML platform infrastructure
  • generative AI applications
  • LLM integrations
  • enterprise AI services
  • scalable cloud platforms
  • multi-environment and multi-region deployments
  • AI platforms
  • GenAI workloads
  • deployment reliability
  • security and compliance practices
  • cost optimization
  • capacity planning
  • operational resilience
  • regulated enterprise environments

Other signals

  • Provide technical leadership for cloud platforms, deployment systems, and operational foundations that power enterprise-scale generative AI applications.