Principal Software Development Engineer - Cloud Platform

Expedia · Hospitality · CA

Expedia is seeking a Principal Software Development Engineer to architect its Cloud Platform. This role focuses on evolving the platform to handle increasing code volume and service complexity, ensuring reliability, optimizing cloud economics, and improving developer experience. Key responsibilities include leading architectural evolution towards a Cell-Based Architecture, modernizing Kubernetes and infrastructure, hardening reliability and observability, optimizing cloud economics through FinOps, and supporting developer workflows with agent-friendly infrastructure and standardized Dev Containers. The role requires deep expertise in cloud-native distributed systems, Kubernetes, observability, and Infrastructure as Code, with a preference for experience integrating AI/ML solutions into platform services.

Expedia Group brands power global travel for everyone, everywhere. We design cutting-edge tech to make travel smoother and more memorable, and we create groundbreaking solutions for our partners. Our diverse, vibrant, and welcoming community is essential in driving our success.

Why Join Us?

To shape the future of travel, people must come first. Guided by our Values and Leadership Agreements, we foster an open culture where everyone belongs, differences are celebrated, and we know that when one of us wins, we all win.

We provide a full benefits package, including exciting travel perks, generous time-off, parental leave, a flexible work model (with some pretty cool offices), and career development resources, all to fuel our employees' passion for travel and ensure a rewarding career journey. We’re building a more open world. Join us.

Principal Software Development Engineer

Our Technology Team partners with teams across Expedia Group to create innovative products, services, and tools to deliver high-quality experiences for travelers, partners, and our employees. A singular technology platform powered by cloud and data provides secure, differentiated, and personalized experiences that drive loyalty and traveler satisfaction.

We are looking for a Principal Engineer to serve as the technical architect for our Cloud Platform organization, which sits within our Technology division. As a Principal Engineer reporting to the VP of Cloud Platform, you will be the primary architect of our technical future. The Cloud Platform organization provides the secure, scalable cloud infrastructure, runtime platforms, and developer experience tooling that enable teams across Expedia Group to build, deploy, and operate high-quality, resilient software quickly and safely.

We are seeing an explosion in code volume and service complexity. The goal for this role is to build a platform that can handle this growth without sacrificing reliability or skyrocketing our cloud bill. You’ll be responsible for making sure our architecture is composable, our developer tools are agentic, our Kubernetes footprint is efficient, and our observability stack provides signals, not just noise.

In this role, you will:

Lead Architectural Evolution: You’ll own the move toward a Cell-Based Architecture. We need to move away from fragile, monolithic clusters and toward isolated, predictable failure domains that allow us to scale horizontally with confidence.
Modernize Kubernetes & Infrastructure: You’ll define our K8s strategy, focusing on multi-cluster management, service mesh, and automated scaling. You need to ensure our "Golden Path" makes it easy for engineers to do the right thing by default.
Harden Reliability & Observability: You will set the standards for SRE across the org. This means moving beyond basic dashboards to causal observability, automated incident response, and rigorous SLO/SLI management. You’ll help us engineer out the root causes of systemic instability.
Optimize Cloud Economics: You’ll lead our FinOps technical strategy. You need to build the tooling and visibility that let us understand cost-per-service and ensure that our infrastructure spend is directly tied to business value.
Support the Developer Workflow: While we are embracing AI tools, your job is to build the underlying "agent-friendly" infrastructure. This includes standardized Dev Containers and ephemeral environments that allow for fast, isolated iteration without clobbering shared state.

Minimum Qualifications:

Extensive professional software development experience designing, building, and operating large-scale, cloud-native distributed systems and platform services on Kubernetes.
Proven ownership of critical services or multi-service platforms, including responsibility for system design (LLD), API design, data modeling, deployment, and ongoing operational health.
Deep expertise with at least one major public cloud provider and core platform technologies (compute, networking, storage, service discovery, security, observability, and CI/CD).
Demonstrated ability to make high-impact architectural decisions, navigate complex trade-offs, and guide multiple teams toward coherent, long-term technical direction.
Familiarity with AI-driven systems, tools, or workflows, and with applying AI/ML concepts to real-world products within cloud or platform environments.
Deep knowledge of observability patterns (OpenTelemetry, Prometheus, distributed tracing).
Expert-level understanding of Infrastructure as Code (Terraform, Pulumi) and CI/CD at scale.
Proficiency in Go, Rust, or similar languages used in modern platform engineering.

Preferred Qualifications:

Track record of defining and evolving multi-year technical strategies for cloud and developer platform ecosystems, and successfully driving adoption of shared platforms across many teams.
Experience designing and operating highly available, globally distributed systems at internet scale, including capacity planning, performance optimization, and robust failure handling.
Experience safely integrating and operating AI/ML-enabled solutions that improve outcomes, such as intelligent routing, predictive scaling, or automated remediation embedded in platform services, with appropriate safeguards.
Advanced experience applying AI/ML techniques to cloud and platform problems (for example, cost optimization, anomaly detection, or performance tuning) and partnering with data/ML teams to productionize these capabilities.
A Systems Architect: You understand the deep plumbing of the cloud (AWS/GCP, K8s, networking). You think in terms of failure domains, latencies, and unit economics.
Reliability-First: You’ve carried a pager for global-scale systems. You have a healthy "paranoia" about state, consistency, and cascading failures.
Hands-on: You still love to build. You can prototype a complex infrastructure change in a weekend to prove it works, but you have the discipline to ensure it's production-grade before it ships.

The total cash range for this position in San Jose is $249,000.00 to $348,500.00. Employees in this role have the potential to increase their pay up to $398,500.00, which is the top of the range, based on ongoing, demonstrated, and sustained performance in the role.

Starting pay for this role will vary based on multiple factors, including location, available budget, and an individual’s knowledge, skills, and experience. Pay ranges may be modified in the future.

Expedia Group is proud to offer a wide range of benefits to support employees and their families, including medical/dental/vision, paid time off, and an Employee Assistance Program. To fuel each employee’s passion for travel, we offer a wellness & travel reimbursement, travel discounts, and an International Airlines Travel Agent Network (IATAN) membership. View our full list of benefits.

Accommodation requests

If you need assistance with any part of the application or recruiting process due to a disability, or other physical or mental health conditions, please reach out to our Recruiting Accommodations Team through the Accommodation Request.

We are proud to have been named a Best Place to Work on Glassdoor in 2024 and to be recognized for our award-winning culture by organizations like Forbes, TIME, Disability:IN, and others.

Expedia Group's family of brands includes: Brand Expedia®, Hotels.com®, Expedia® Partner Solutions, Vrbo®, trivago®, Orbitz®, Travelocity®, Hotwire®, Wotif®, ebookers®, CheapTickets®, Expedia Group™ Media Solutions, Expedia Local Expert®, CarRentals.com™, and Expedia Cruises™. © 2024 Expedia, Inc. All rights reserved. Trademarks and logos are the property of their respective owners. CST: 2029030-50

Employment opportunities and job offers at Expedia Group will always come from Expedia Group’s Talent Acquisition and hiring teams. Never provide sensitive, personal information to someone unless you’re confident who the recipient is. Expedia Group does not extend job offers via email or any other messaging tools to individuals with whom we have not made prior contact. Our email domain is @expediagroup.com. The official website to find and apply for job openings at Expedia Group is careers.expediagroup.com/jobs.

Expedia is committed to creating an inclusive work environment with a diverse workforce. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, or any other characteristic protected by law. This employer participates in E-Verify. The employer will provide the Social Security Administration (SSA) and, if necessary, the Department of Homeland Security (DHS) with information from each new employee's I-9 to confirm work authorization.