Senior Infrastructure & Enterprise Engineer

Expedia · Hospitality · Seattle, WA

This role focuses on designing, implementing, and operating resilient enterprise infrastructure services and platforms across on-premise and cloud environments. It involves automating infrastructure provisioning, configuration, monitoring, and incident response; collaborating with security, networking, and platform teams; and leading troubleshooting for complex infrastructure issues. A key aspect is safely integrating and operating AI/ML-enabled solutions that improve outcomes, which calls for familiarity with AI-driven systems and the ability to apply AI/ML concepts in production environments.

What you'd actually do

  1. Design, implement, and operate resilient enterprise infrastructure services and platforms, ensuring reliability, scalability, and security across on‑premise and cloud environments.
  2. Develop and maintain low‑level system designs (LLD), APIs, and data models that enable robust integration between infrastructure components, internal services, and enterprise systems.
  3. Automate infrastructure provisioning, configuration, monitoring, and incident response using modern scripting, IaC, and CI/CD practices to improve efficiency and reduce operational risk.
  4. Collaborate with security, networking, platform, and application engineering teams to define standards, patterns, and guardrails that drive consistency and operational excellence across multiple domains.
  5. Lead troubleshooting and root‑cause analysis for complex infrastructure issues, implementing durable fixes and driving continuous improvements in observability, capacity management, and performance.
  6. Safely integrate and operate AI/ML‑enabled solutions that improve outcomes, drawing on familiarity with AI-driven systems, tools, or workflows, as well as virtualization and Kubernetes (K8s), and applying AI/ML concepts to real-world products while keeping those solutions secure, compliant, and reliable in production.

Skills

Required

  • infrastructure engineering
  • systems engineering
  • enterprise engineering
  • production services
  • platforms at scale
  • virtualization
  • Kubernetes (K8s)
  • compute
  • storage
  • networking
  • identity
  • security
  • designing APIs
  • data models
  • infrastructure integrations
  • scripting
  • programming
  • automation tooling
  • infrastructure as code
  • CI/CD
  • provisioning
  • configuration
  • managing enterprise infrastructure
  • low-level design (LLD)
  • system architecture
  • operational runbooks
  • monitoring
  • alerting
  • incident response
  • AI-driven systems
  • AI/ML concepts

Nice to have

  • designing and evolving large-scale, highly available enterprise infrastructure platforms
  • shared services
  • technical roadmaps
  • standards across multiple domains
  • complex infrastructure migrations
  • modernization efforts
  • major incident remediation
  • data-driven decision making
  • optimize reliability
  • performance
  • cost
  • API design
  • platform services
  • secure, self-service, and automated consumption
  • advanced observability practices
  • capacity planning
  • change management
  • rigorous post-incident analysis
  • AIOps
  • intelligent alerting
  • anomaly detection
  • automated remediation

What the JD emphasized

  • operating production services and platforms at scale
  • critical services
  • core infrastructure domains
  • infrastructure and enterprise integrations
  • automation tooling
  • infrastructure services
  • AI-driven systems
  • AI/ML concepts
  • AI-assisted infrastructure operations