For over 25 years, NVIDIA has been revolutionizing computer graphics, PC gaming, and accelerated computing. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.

At NVIDIA, we are seeking a highly skilled Senior Engineer Operations Manager to join our world-class NGC Cloud team. In this role, you will help drive the efficiency, reliability, and scalability of the systems that power our global business operations. This is an exceptional opportunity to shape how we automate, streamline, and support critical operational workflows across the organization. You will define how we implement innovative automation and support solutions, enabling teams to operate seamlessly and deliver impact at global scale—all within an encouraging and inclusive environment.

What you'll be doing:

Lead strategy and execution for cloud services that provide container, artifact, and ML model registry capabilities for NVIDIA engineering teams.
Partner with AI infrastructure, security, product, and engineering teams to define standards for model packaging, image build pipelines, artifact management, and deployment workflows.
Establish and enforce policies for access control, compliance, software supply chain security, and auditability across registry and hub platforms.
Define and track KPIs and SLAs for registry and related services, including availability, latency, storage efficiency, reliability, and developer experience.
Drive operational excellence for secure, scalable registry services that support AI and cloud-native software delivery.
Mentor and coach engineering managers and senior individual contributors; build a strong leadership bench and a healthy, inclusive engineering culture.
Influence architecture and technical direction while empowering teams to own detailed design and implementation decisions.
Communicate clearly with senior leadership on strategy, risks, execution progress, and outcomes; represent the platform in multi-functional planning discussions.

What we need to see:

12+ overall years of software engineering experience with significant ownership of cloud platforms, distributed systems, developer platforms, artifact registry systems, storage systems, or infrastructure services.
5+ years of engineering leadership experience, including experience managing managers or leading multiple senior technical workstreams through other leaders.
Bachelor's degree or equivalent experience.
Proven success operating customer-facing or company-critical services with demanding availability, latency, throughput, data integrity, and security expectations.
Strong technical judgment in registry, artifact management, or developer platform systems.
Deep cloud-native systems background across Kubernetes, object storage, relational or NoSQL databases, event streaming, caching, API design, service-to-service authentication, and observability.
Experience with containerization and registries such as Docker, Kubernetes, Docker Hub, Harbor, ECR, GCR, GAR, or similar technologies at enterprise scale.
Excellent people leadership skills, including hiring, performance management, career development, and building diverse, high-performing teams.
Strong communication skills, with experience presenting to senior executives and influencing cross-organization priorities.

Ways to stand out from the crowd:

Directly led teams building and operating artifact management platforms at enterprise scale.
Led reliability transformations for large services, including SLO adoption, forecasting resource needs, load testing, and stress testing.
Worked in AI infrastructure, accelerated computing, model distribution, or regulated enterprise software delivery.
Track record of growing managers and senior technical leaders who can own ambiguous, high-impact platform areas without constant blocking issues.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 26, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.