Principal Software Development Engineer (Kubernetes)

Expedia · Hospitality · CA

Lead the architecture, design, and build-out of a Kubernetes-based compute runtime platform on AWS, with a focus on multi-tenancy, scaling, developer tooling (CI/CD, GitOps, internal developer portal), and infrastructure automation (IaC). The role also involves technical leadership, mentorship, and production debugging of complex platform incidents.

What you'd actually do

  1. Design and Implement Core Platform Components: Evolve our Kubernetes-based environment, focusing on areas like multi-tenancy, network policy, resource management, and service mesh integration (e.g., Istio, Linkerd).
  2. Architect for Scale and Reliability: Lead the technical design for scaling our control plane and data plane to handle a 10x increase in services and traffic. Define and implement SLOs for the platform itself.
  3. Improve the Developer Control Plane: Design and build the next generation of our CI/CD pipelines and GitOps workflows. Drive the strategy for our internal developer portal (e.g., Backstage) to unify tooling, documentation, and service lifecycle management.
  4. Automate Infrastructure Lifecycle: Author and maintain production-grade Infrastructure as Code (IaC) using Terraform and/or Crossplane. Eliminate manual toil by automating cluster provisioning, node lifecycle, and dependency upgrades.
  5. Technical Leadership and Mentorship: Act as a force multiplier. Mentor senior engineers on the team, lead architecture review sessions, and author RFCs to build consensus on significant technical decisions. Your influence will extend beyond the team to application developers and SREs.

Skills

Required

  • Kubernetes
  • AWS
  • Docker
  • Terraform
  • Go
  • Java
  • Python
  • Ruby
  • Infrastructure automation
  • Configuration management
  • Container orchestration
  • Linux
  • Cloud computing

Nice to have

  • Stateless and stateful workloads
  • Service mesh
  • Service discovery
  • Monitoring
  • Alerting
  • Logging
  • Security development principles
  • Token management
  • Encryption
  • Certificates
  • Jenkins
  • Self-service technology platform capabilities
  • Container compute
  • Traffic management
  • API management
  • Mentoring engineers
  • Operational excellence
  • Code quality
  • Istio
  • Linkerd
  • Crossplane
  • Backstage
  • Argo
  • Helm
  • Consul
  • Vagrant
  • Vault
  • Nomad

What the JD emphasized

  • 8+ years of experience in infrastructure automation, configuration management or container orchestration
  • Experience in cloud computing with Amazon Web Services (AWS) and containerization with Docker and Kubernetes/EKS
  • Strong programming skills in one or more languages: Java, Go, Python or Ruby