Technical Program Manager, Cloud Infrastructure

NVIDIA · Semiconductors · Santa Clara, CA +1

Technical Program Manager for NVIDIA DGX Cloud, focusing on cloud infrastructure bring-up, capacity enablement, and management. Responsibilities include collaborating with Cloud Service Providers (CSPs) and internal engineering teams to build and scale AI infrastructure globally, defining requirements for compute, storage, and networking to support GPUs, managing ongoing operations, and ensuring adherence to product lifecycle processes. The role emphasizes driving adoption of cloud infrastructure solutions, establishing KPIs, and mitigating risks.

What you'd actually do

  1. Work in close coordination with storage and network engineering teams to define and communicate requirements to Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs). Drive alignment and a plan of record (POR) for capacity blocks based on workload needs.
  2. Drive early engagement with CSPs and NCPs to understand their managed storage and network solutions, and influence their alignment with the NVIDIA Cloud roadmap.
  3. Gather technical requirements, develop comprehensive roadmaps, establish clear milestones, and ensure adherence to the Product Lifecycle (PLC) process.
  4. Manage ongoing capacity operations and the engineering engagement with CSP and NCP partners, collaborating closely with engineering leads. Focus on availability, maintenance, and other critical performance indicators.
  5. Partner closely with teams within NVIDIA to understand workload requirements and the related hardware and infrastructure needs, including speeds and feeds, to optimize infrastructure readiness with cloud vendors and NCPs.

Skills

Required

  • 10+ years of technical program management experience
  • Experience driving the planning and execution of large-scale cloud infrastructure programs with outside organizations
  • Experience with software engineering projects within a matrixed organization
  • Hands-on experience in cloud infrastructure
  • Domain knowledge in the bring-up and end-to-end operations of compute, storage, and GPU
  • Proficiency with Jira, Smartsheet, or similar program management tools
  • Strategic and tactical thinking abilities
  • Ability to build consensus and drive program success
  • Comfort growing within ambiguous environments
  • Communication and technical presentation skills
  • BS or MS in Electrical Engineering or Computer Science, or equivalent experience

Nice to have

  • In-depth knowledge of NVIDIA GPU products, including deployment and bring-up
  • Working knowledge of various cloud technologies (Kubernetes, API integration, Terraform, etc.)
  • Significant experience with productivity tools and process automation
  • Deep familiarity with cloud-native product / services environments
  • Familiarity with AI and ML infrastructure, and with cloud services

What the JD emphasized

  • Extensive background in cloud infrastructure bring-up with external partners
  • Extensive hands-on experience in cloud infrastructure
  • Domain knowledge in the bring-up and end-to-end operations of compute, storage, and GPU
  • Expert-level proficiency with Jira, Smartsheet, or similar program management tools
  • Outstanding strategic and tactical thinking abilities
  • Comfort and efficiency in growing within ambiguous environments
  • Excellent communication and technical presentation skills, particularly for executive audiences