Hardware Systems Application Engineer - CSP

NVIDIA · Semiconductors · Santa Clara, CA

This role focuses on deploying and optimizing NVIDIA's AI-centric hardware platforms (such as Grace-Blackwell) within Cloud Service Providers' data centers. The engineer will act as a primary point of contact, collaborating with customers and internal teams to solve complex hardware and software system issues, and will contribute to future data center architectures. Although the role sits within AI data centers, the core responsibilities are hardware systems engineering and deployment, not direct AI/ML model development.

What you'd actually do

  1. Collaborate with major cloud service providers (CSPs), their OEMs/ODMs, and internal teams to help deploy the latest NVIDIA Vera-Rubin AI racks
  2. Work with other domain expert teams at NVIDIA to ensure customer solutions are optimized for the highest performance servers in the world
  3. Solve deep server system technical issues at the hardware, software, and application levels, ensuring customer success and time to market
  4. Act as the central point of contact between CSP customers and NVIDIA architecture teams to develop future GPU-accelerated data center architectures and roadmaps

Skills

Required

  • Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, or a related field (or equivalent experience)
  • 5+ years of proven experience in system-level design and integration of server products from concept to deployment
  • Solid understanding of x86 server architecture, PCIe, DDR, InfiniBand, and high-speed interconnects
  • Basic understanding of BMC (Baseboard Management Controller) architecture, I2C (SMBus), power management, and system telemetry controls
  • Familiarity with the Linux OS and experience working on the command line
  • Knowledge of the latest PCIe Gen5 and Gen6 technical interface challenges
  • Strong problem-solving and analytical skills
  • Excellent communication and teamwork skills
  • Strong time-management and organizational skills
  • Ability to manage multiple complex initiatives in dynamic environments

Nice to have

  • In-depth understanding of CPU, GPU, and networking architectures and their tradeoffs
  • Self-motivated and eager to learn
  • Able to work under high pressure and in dynamic environments
  • A desire to understand technology deeply
  • Able to clearly communicate and explain information relevant to the audience

What the JD emphasized

  • customer-facing hardware engineers
  • deploying next generation NVIDIA MGX platforms
  • AI-centric Data Centers
  • AI Factories
  • agentic AI processing
  • deploying these AI Factories
  • debug highly complex problems
  • deploying the latest generation racks
  • customer solutions are optimized
  • deep server system technical issues
  • GPU-accelerated data center architectures