Principal Software Development Engineer — Platform & Infrastructure

Expedia · Hospitality · CA

This Principal Software Development Engineer role focuses on designing, implementing, and operating cloud infrastructure and platform capabilities (Kubernetes, containers, CI/CD, IaC, AWS services) that support multiple product teams. It involves defining technical strategy, leading migration efforts, contributing code and IaC, driving SRE and observability practices, and optimizing cloud costs.

What you'd actually do

  1. Define the technical strategy, roadmap, and standards for cloud infrastructure and platform capabilities (container runtime, orchestration, networking, CI/CD, observability, security, IaC)
  2. Lead migration and platform adoption efforts (containerization, Kubernetes/EKS, runtime platform) and drive roadmap execution end-to-end
  3. Be an active code and IaC contributor (Terraform/CloudFormation/Helm, platform services, automation) and perform design/code reviews
  4. Design scalable, resilient, and secure infrastructure patterns for microservices, data stores, caching, and messaging
  5. Build and improve CI/CD pipelines, release automation, testing strategies, and safe deployment practices

Skills

Required

  • 10+ years of professional software engineering experience
  • building and operating distributed cloud services
  • building and running platforms on AWS (EKS, ECS, EC2, VPC, IAM, S3, RDS, ELB/ALB, Auto Scaling)
  • containerization and Kubernetes at scale (EKS or comparable)
  • infrastructure as code (Terraform, CloudFormation) and Helm charts
  • contributing production code and platform automation (languages such as Go, Python, Java, or similar)
  • designing for resilience, observability, security, and operational automation
  • CI/CD tooling and developer workflows (Spinnaker, Jenkins, GitHub Actions, GitLab CI, or similar)
  • leading cross-team technical initiatives and influencing architectural decisions
  • communication skills
  • mentoring engineers

Nice to have

  • platform engineering
  • SRE
  • infrastructure leadership at scale
  • monitoring/observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, Jaeger)
  • AWS cost optimization strategies and tooling (Cost Explorer, Trusted Advisor, billing APIs)
  • service meshes (Istio, Linkerd, VPC Lattice), API gateways, and advanced networking patterns
  • security/compliance for cloud environments, secrets management, and policy-as-code
  • migrating monoliths to cloud-native architectures

What the JD emphasized

  • Hands-on experience building and operating distributed cloud services
  • Deep hands-on experience with containerization and Kubernetes at scale
  • Practical experience with infrastructure as code
  • Proven record of contributing production code and platform automation
  • Demonstrated ability to lead cross-team technical initiatives and influence architectural decisions