What you'd actually do

Set and communicate team priorities that support the broader organization's goals. Align strategy, processes, and decision-making across teams.

Set clear expectations with individuals based on their level and role and aligned to the broader organization's goals. Meet regularly with individuals to discuss performance and development and provide feedback and coaching.

Develop the mid-term technical goal and roadmap within the scope of your (often multiple) team(s). Evolve the roadmap to meet anticipated future requirements and infrastructure needs.

Design, guide, and vet system designs within the scope of the broader area, and write product or system development code to solve ambiguous problems.

Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).

Skills

Required

software development
creating product roadmaps
working with cross-functional teams
reliability engineering
technical leadership
people management
team leadership

Nice to have

designing, building, or operating highly available, fault-tolerant distributed systems
chaos engineering
fault-injection testing frameworks
large-scale disaster recovery simulations
designing developer-facing products, APIs, SDKs, or self-service automation tools
reducing friction
improving developer velocity
Google’s server frameworks (Pod)
gRPC/Stubby-based RPC layers
container orchestrators
defining and driving organizational key metrics (SLIs/SLOs, adoption rates, platform health)

Like Google's own ambitions, the work of a Software Engineer goes beyond just Search. Software Engineering Managers not only have the technical expertise to take on and provide technical leadership to major projects, but also manage a team of Engineers. You not only optimize your own code but make sure Engineers are able to optimize theirs. As a Software Engineering Manager, you manage your project goals, contribute to product strategy, and help develop your team. Teams work all across the company, in areas such as information retrieval, artificial intelligence, natural language processing, distributed computing, large-scale system design, networking, security, data compression, and user interface design; the list goes on and is growing every day. Operating with scale and speed, our exceptional software engineers are just getting started -- and as a manager, you guide the way.

With technical and leadership expertise, you manage engineers across multiple teams and locations, manage a large product budget, and oversee the deployment of large-scale projects across multiple sites internationally.

In this role, you will drive the technical goal to transition Fault Tolerance Testing (FTT) from a set of manual compliance verification tools to a proactive, AI-driven, and autonomous resilience platform. This position sits at the intersection of large-scale distributed systems, developer velocity, and cloud reliability, offering immense visibility and the opportunity to directly safeguard Google Cloud Platform's (GCP) global infrastructure.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $207,000 - $301,000 (USD) + 20% bonus target + equity + benefits

Learn more about benefits at Google.

Qualifications

Minimum qualifications:

Bachelor’s degree, or equivalent practical experience.
8 years of experience in software development.
5 years of experience creating product roadmaps and working with cross-functional teams.
3 years of experience in reliability engineering.
3 years of experience in a technical leadership role.
2 years of experience in a people management or team leadership role.

Preferred qualifications:

Master's degree or PhD in Computer Science or related technical field.
Experience designing, building, or operating highly available, fault-tolerant distributed systems. Direct experience with chaos engineering, fault-injection testing frameworks, or large-scale disaster recovery simulations.
Background in designing developer-facing products, APIs, SDKs, or self-service automation tools, with a strong emphasis on reducing friction and improving developer velocity.
Experience with Google’s server frameworks (Pod), gRPC/Stubby-based RPC layers, or container orchestrators is highly desirable.
Experience defining and driving organizational key metrics (SLIs/SLOs, adoption rates, platform health) to measure the success of infrastructure initiatives.

Software Engineering Manager, Fault Tolerance Testing
