Lead Software Engineer, Infrastructure Quality, Robotics, DeepMind

Google · Big Tech · Mountain View, CA +1

Lead Software Engineer focused on quality and engineering productivity for robotics AI, responsible for scaling software releases, AI model validation, testing environments, and system reliability to accelerate AGI development in the physical world.

What you'd actually do

Design, implement, and own the long-term roadmap and execution for large-scale software and HWITL testing environments, covering conventional and on-robot testing, both manual and automated.
Design, deploy, and manage a robust software release discipline, with flexible, powerful staging environments and comprehensive regression checks for software, AI models, and robot hardware over time.
Define productivity metrics (such as code in production, bottlenecks, and code debt), reduce code debt and improve code health, and optimize the data/ML flywheel, from ingestion and labeling to model evaluation and deployment.
Lead the application of agentic AI, including automating root-cause analysis of HWITL failures, edge-case scenarios, data quality issues, and AI model performance problems.
Drive end-to-end system reliability for software systems and robot fleets. Partner with external hardware vendors and internal teams to co-develop, integrate, and test joint software and hardware releases.

Skills

Required

programming in a general-purpose coding language (e.g., C, C++, Java, JavaScript, or Python)
people management, supervision/team leadership role
software quality and release strategy on custom or experimental hardware
design, implement, and own the long-term roadmap and execution for large-scale software and HWITL testing environments
design, deploy, and manage a robust software release discipline
define productivity metrics
optimize the data/ML Flywheel
application of agentic AI
drive end-to-end system reliability for software systems and robot fleets

Nice to have

Master's degree or PhD in Computer Science, Artificial Intelligence, Machine Learning, or related technical fields
systems development engineering
system reliability engineering
software engineering with a focus on operations
advanced systems administration

What the JD emphasized

software quality and AI model validation
Artificial General Intelligence (AGI)
testing, reliability, validation, release strategy, and overall engineering productivity and velocity
non-deterministic hardware wear-and-tear, complex large-scale model deployment, fast inference, and the massive data requirements of AGI
software quality and release strategy on custom or experimental hardware
high quality bar
data/ML Flywheel, from ingestion and labeling to model evaluation and deployment
agentic AI
HWITL failures, edge-case scenarios, or data quality or AI model performance problems
end-to-end system reliability for software systems and robot fleets

Other signals

scaling software quality and AI model validation
influencing the velocity at which our team solves Artificial General Intelligence (AGI) in the physical world
building and upscaling our testing, reliability, validation, release strategy, and overall engineering productivity and velocity
solving for non-deterministic hardware wear-and-tear, complex large-scale model deployment, fast inference, and the massive data requirements of AGI
experienced with software quality and release strategy on custom or experimental hardware
work will directly determine how many iterations our engineers and scientists can run per day, week, and year, while holding a high quality bar
optimizing the data/ML Flywheel, from ingestion and labeling to model evaluation and deployment
automating root-cause analysis of HWITL failures, edge-case scenarios, or data quality or AI model performance problems
drive end-to-end system reliability for software systems and robot fleets

We believe there are many problems in the world in which robotics could play a significant role in making it easier, faster and safer for people to get things done. We’re looking for roboticists, designers, hardware and software engineers to help us explore these possibilities, develop breakthrough technologies, and build new products that could help millions of people.

This is a dedicated Software Quality and Engineering Productivity leadership role. In this role, you will be an expert whose primary passion and career focus is building, owning, and scaling the quality of software releases and AI model validation on standard and custom hardware. You will play a major role in influencing the velocity at which our team solves Artificial General Intelligence (AGI) in the physical world. As our software and hardware infrastructure grows to support the increasing scale of robotics research, we need to continuously build and upscale our testing, reliability, validation, release strategy, and overall engineering productivity and velocity.

Robotics infrastructure is uniquely difficult. You will be solving for non-deterministic hardware wear-and-tear, complex large-scale model deployment, fast inference, and the massive data requirements of AGI. You will be experienced with software quality and release strategy on custom or experimental hardware. Your work will directly determine how many iterations our engineers and scientists can run per day, week, and year, while holding a high quality bar.

Artificial intelligence will be one of humanity’s most transformative inventions. At DeepMind, we are a pioneering AI lab with exceptional interdisciplinary teams focused on advancing AI development to solve complex global challenges and accelerate high-quality product innovation for billions of users. We use our technologies for widespread public benefit and scientific discovery, ensuring safety and ethics are always our highest priority.

We are pushing the boundaries across multiple domains. Our global teams offer various learning opportunities and varied career pathways for those driven to achieve exceptional results through collective effort.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $262,000 - $365,000 (USD) + 25% bonus target + equity + benefits

Learn more about benefits at Google.

Responsibilities

Design, implement, and own the long-term roadmap and execution for large-scale software and HWITL testing environments, covering conventional and on-robot testing, both manual and automated.
Design, deploy, and manage a robust software release discipline, with flexible, powerful staging environments and comprehensive regression checks for software, AI models, and robot hardware over time.
Define productivity metrics (such as code in production, bottlenecks, and code debt), reduce code debt and improve code health, and optimize the data/ML flywheel, from ingestion and labeling to model evaluation and deployment.
Lead the application of agentic AI, including automating root-cause analysis of HWITL failures, edge-case scenarios, data quality issues, and AI model performance problems.
Drive end-to-end system reliability for software systems and robot fleets. Partner with external hardware vendors and internal teams to co-develop, integrate, and test joint software and hardware releases.

Qualifications

Minimum qualifications:

Bachelor's degree in Computer Science, a related technical field, or equivalent practical experience.
8 years of experience programming in a general-purpose coding language (e.g., C, C++, Java, JavaScript, or Python).
Experience in a people management, supervision/team leadership role.

Preferred qualifications:

Master's degree or PhD in Computer Science, Artificial Intelligence, Machine Learning, or related technical fields.
3 years of experience in systems development engineering, system reliability engineering, software engineering with a focus on operations, or advanced systems administration.