CPU Performance Developer Technology Engineer

NVIDIA · Semiconductors · Shanghai, China +2

NVIDIA is seeking a CPU Optimization Engineer to research, design, and implement performance optimization strategies for workloads including AI data preprocessing as well as scientific and HPC applications on NVIDIA Grace/Vera CPUs. The role involves collaborating with developers; profiling, analyzing, and optimizing CPU performance; contributing to open-source frameworks; and influencing future CPU and software architecture.

What you'd actually do

  1. Collaborate with developers, researchers, and framework maintainers across industries to identify and resolve performance challenges in diverse workloads such as AI, data analytics, simulation, and numerical computing.
  2. Profile, analyze, and optimize CPU performance from application-level algorithms down to low-level microarchitecture.
  3. Contribute to open-source frameworks, key software stacks, reference implementations, and performance libraries to unlock full CPU potential.
  4. Work closely with NVIDIA’s architecture, research, libraries, tools, and system software teams to improve overall platform performance.
  5. Provide insights that shape next-generation CPU designs, compiler toolchains, and development workflows for better developer productivity and throughput.

Skills

Required

  • BS, MS, or PhD in Computer Science, Computer Engineering, or a related field
  • 5+ years of relevant experience in performance engineering or CPU optimization
  • Strong programming proficiency in C/C++ and/or Python, with a deep understanding of algorithms and software architecture
  • Solid grasp of CPU microarchitecture, performance analysis tools, and optimization methodologies
  • Proven track record of CPU benchmarking and bottleneck-driven performance tuning
  • Excellent communication and organizational skills, with the ability to collaborate effectively across teams and manage multiple priorities

Nice to have

  • Experience optimizing AI or data preprocessing pipelines on CPUs
  • Familiarity with HPC applications, parallel computing, and distributed runtime environments
  • Hands-on experience with SIMD instruction sets, low-level intrinsics, or vectorization
  • Contributions to open-source performance tools or HPC frameworks

What the JD emphasized

  • 5+ years of relevant experience in performance engineering or CPU optimization
  • Proven track record of CPU benchmarking and bottleneck-driven performance tuning