Senior Manager, Cloud Services Platform

NVIDIA · Semiconductors · Santa Clara, CA +1

This Senior Manager, Cloud Services Platform role at NVIDIA leads strategy and execution for cloud services that provide container, artifact, and ML model registry capabilities. The role partners with AI infrastructure, security, product, and engineering teams to define standards for model packaging, image build pipelines, artifact management, and deployment workflows, and drives operational excellence for secure, scalable registry services supporting AI and cloud-native software delivery. It requires strong leadership experience in cloud platforms, distributed systems, and developer platforms.

What you'd actually do

  1. Lead strategy and execution for cloud services that provide container, artifact, and ML model registry capabilities for NVIDIA engineering teams.
  2. Partner with AI infrastructure, security, product, and engineering teams to define standards for model packaging, image build pipelines, artifact management, and deployment workflows.
  3. Establish and enforce policies for access control, compliance, software supply chain security, and auditability across registry and hub platforms.
  4. Define and track KPIs and SLAs for registry and related services, including availability, latency, storage efficiency, reliability, and developer experience.
  5. Drive operational excellence for secure, scalable registry services that support AI and cloud-native software delivery.

Skills

Required

  • 12+ years of overall software engineering experience with significant ownership of cloud platforms, distributed systems, developer platforms, artifact registry systems, storage systems, or infrastructure services.
  • 5+ years of engineering leadership experience, including experience managing managers or leading multiple senior technical workstreams through other leaders.
  • Bachelor's degree or equivalent experience.
  • Proven success operating customer-facing or company-critical services with demanding availability, latency, throughput, data integrity, and security expectations.
  • Strong technical judgment in registry, artifact management, or developer platform systems.
  • Deep cloud-native systems background across Kubernetes, object storage, relational or NoSQL databases, event streaming, caching, API design, service-to-service authentication, and observability.
  • Experience with containerization and registries such as Docker, Kubernetes, Docker Hub, Harbor, ECR, GCR, GAR, or similar technologies at enterprise scale.
  • Excellent people leadership skills, including hiring, performance management, career development, and building diverse, high-performing teams.
  • Strong communication skills, with experience presenting to senior executives and influencing cross-organization priorities.

Nice to have

  • Directly led teams building and operating artifact management platforms at enterprise scale.
  • Led reliability transformations for large services, including SLO adoption, forecasting resource needs, load testing, and stress testing.
  • Worked in AI infrastructure, accelerated computing, model distribution, or regulated enterprise software delivery.
  • Track record of growing managers and senior technical leaders who can own ambiguous, high-impact platform areas without constant blocking issues.

What the JD emphasized

  • ML model registry capabilities
  • AI infrastructure
  • model packaging
  • deployment workflows
  • AI and cloud-native software delivery
  • customer-facing or company-critical services with demanding availability, latency, throughput, data integrity, and security expectations
  • regulated enterprise software delivery
