Senior Software Engineer, Infrastructure

Google · Big Tech · Seattle, WA +1

This role is for a Senior Software Engineer on the Google Compute Engine (GCE) team, focusing on building and optimizing the foundational infrastructure for AI/ML workloads. The engineer will lead the introduction of new VM families, analyze and optimize VM performance, develop software components for reliability and performance, and provide engineering support for existing VM families. The role requires experience with IaaS, large-scale infrastructure, distributed systems, and virtualization, with a focus on supporting the growing demand for AI/ML computing.

What you'd actually do

  1. Lead the introduction and delivery of new x86-based Google Compute Engine (GCE) VM families (NPIs), ensuring on-time launch and adherence to performance goals on Intel and AMD platforms.
  2. Analyze, debug, and optimize VM performance to minimize latency and improve customer workload efficiency.
  3. Develop and improve software components, tests, and qualification processes within the GCE node stack to increase reliability (MTBF) and performance consistency of x86 VMs.
  4. Provide engineering support for generally available VM families, addressing customer-reported issues, performance regressions, and production incidents.
  5. Collaborate with cross-functional teams to integrate new technologies and improve the x86 host stack.

Skills

Required

  • software development
  • Infrastructure as a Service (IaaS)
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture
  • reliability engineering
  • quality engineering
  • virtualization

Nice to have

  • data structures
  • algorithms
  • technical leadership
  • Virtual Machines (VMs)
  • Cloud technologies
  • system architecture
  • systems programming

What the JD emphasized

  • large-scale AI/ML
  • foundational infrastructure for the next generation of AI
  • AI and Infrastructure team
  • delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity
  • shaping the future of world-leading hyperscale computing
  • development of our TPUs
  • Vertex AI for Google Cloud
  • Google Global Networking
  • Data Center operations
  • systems research