Software Engineering Manager, Emergent AI Infrastructure Team

Google · Big Tech · Hyderabad, Telangana, India

Software Engineering Manager for an Emergent AI infrastructure team building next-generation on-premises AI infrastructure that spans hardware-to-software design, workload management, and large-scale AI clusters. The role combines technical leadership and people management with oversight of large-scale project deployments.

What you'd actually do

  1. Lead and support software engineers working on AI infrastructure for testbed turn-up and operations, including bare-metal Kubernetes (k8s) infrastructure, identity and access control, observability, and capacity management.
  2. Design, guide, and vet system designs within the scope of the broader area, and write product or system development code to solve ambiguous problems.
  3. Set and communicate team priorities that support the broader organization's goals. Align strategy, processes, and decision-making across teams.
  4. Set clear expectations with individuals based on their level and role and aligned to the broader organization's goals. Meet regularly with individuals to discuss performance and development and provide feedback and coaching.
  5. Review code developed by other engineers and provide feedback to ensure best practices. Oversee end-to-end operations, from coordinating network connectivity to bootstrapping control machines and deploying core Kubernetes infrastructure.

Skills

Required

  • software development
  • technical leadership
  • people management
  • team leadership
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture

Nice to have

  • Master's degree or PhD in Computer Science or a related technical field
  • experience working in a complex, matrixed organization
  • Google Cloud Platform (GCP)
  • deploying and maintaining hardware systems: servers, racks and networks
  • integration projects

What the JD emphasized

  • technical leadership
  • manage engineers
  • large-scale projects
  • AI infrastructure
  • on-premises Artificial Intelligence (AI) infrastructure
  • hardware to software design
  • workload management
  • large-scale AI clusters
  • AI acceleration
  • cluster interconnects and networking
  • Kubernetes clusters at scale

Other signals

  • AI infrastructure
  • on-premises AI infrastructure
  • large-scale AI clusters
  • AI acceleration
  • cluster interconnects and networking
  • hardware to software design
  • workload management
  • Kubernetes infrastructure