Senior Technical Program Manager - DGX Cloud Storage

NVIDIA · Semiconductors · Santa Clara, CA

This Senior Technical Program Manager role drives storage-related initiatives for NVIDIA's DGX Cloud, a platform for deploying and scaling AI infrastructure. The role involves leading cross-functional programs, defining project plans, managing risks, and ensuring program visibility across engineering, operations, and cloud partners. It requires extensive experience in program management, software and infrastructure projects, cloud platforms, and distributed storage systems, along with an understanding of the storage requirements of AI/ML and HPC workloads.

What you'd actually do

  1. Lead cross-functional storage programs from requirements gathering through execution and delivery.
  2. Drive alignment across NVIDIA storage engineering, operations, cloud service providers, cluster operators, resource governance, and finance.
  3. Define project plans, schedules, and milestones for storage features, storage deployments, support, security, compliance, and observability.
  4. Partner with the engineering team and product management to define and deliver the product roadmap.
  5. Manage technical risks and resolve blockers that impact quality, scope, and delivery timelines.

Skills

Required

  • program management of large-scale software or infrastructure projects
  • driving programs across global, distributed teams
  • communication and organizational skills
  • aligning cross-org stakeholders
  • Jira and Confluence
  • software development
  • Agile methodologies
  • DevOps best practices
  • Cloud Platforms: AWS, Azure, GCP, or OCI storage services (Block, Object, File)
  • Distributed Storage Systems: SAN, NAS, object storage, and scalable distributed architectures such as Ceph or Lustre
  • Storage Performance: Understanding IOPS, latency, throughput optimization, and capacity planning for large-scale environments
  • Data Protection & DR: Familiarity with snapshots, backups, replication, and disaster recovery strategies
  • AI/ML & HPC Workloads: Understanding storage requirements for high-throughput AI training or data pipelines

Nice to have

  • storage operations, provisioning, performance monitoring, and troubleshooting
  • new product introduction
  • program management for research teams

What the JD emphasized

  • storage requirements for high-throughput AI training or data pipelines