AI and FSI Developer Technology Engineer - New College Grad 2026

NVIDIA · Semiconductors · Santa Clara, CA +3 · Remote

NVIDIA is seeking an AI and FSI Developer Technology Engineer to optimize AI and HPC workloads on NVIDIA GPUs and CPUs, focusing on performance tuning and eliminating bottlenecks for financial markets. The role involves research, development, analysis, and collaboration with experts to improve performance across the stack, from algorithms to kernels. The engineer will also publish and present their work and influence future hardware/software designs.

What you'd actually do

  1. Researching, designing, and developing groundbreaking techniques to accelerate pioneering, FSI-focused AI and other high-performance workloads on NVIDIA CPUs and GPUs.
  2. Working with leading technical experts to analyze, optimize, and scale complex AI and HPC workloads for modern CPU and GPU architectures.
  3. Profiling and eliminating performance bottlenecks across the stack: from algorithms to kernels to system-level behavior.
  4. Publishing and presenting your work in conferences, talks, and blogs to educate and inspire the broader developer community.
  5. Influencing the design of future hardware architectures, system software, libraries, and programming models by collaborating closely with NVIDIA research, hardware, compiler, and tools teams.

Skills

Required

  • Master’s or PhD degree (or equivalent experience) in Computer Science, Computer Engineering, Electrical and Computer Engineering, or a related field
  • Relevant work or research experience
  • Low-level parallel programming (e.g., CUDA)
  • Deep understanding of CPU/GPU architecture fundamentals and how they impact performance
  • Fluency in C/C++
  • Solid foundations in algorithms and software design
  • Experience improving the performance of large-scale computational applications on GPUs
  • Good understanding of linear algebra
  • Strong communication and organization skills
  • Logical approach to problem solving
  • Solid prioritization abilities

Nice to have

  • Prior internship experience in a related field
  • Experience with inference optimization techniques and deploying optimized AI models in production
  • Experience with TensorRT, TensorRT-LLM, and cuTile
  • Background in capital markets with exposure to systematic/algorithmic strategies or quantitative trading
  • Experience parallelizing and optimizing machine learning methods such as decision trees, time series models, and Monte Carlo simulations
  • Knowledge of financial data models, pricing and risk simulation algorithms, portfolio optimization, or other finance-focused applications and services

What the JD emphasized

  • Low-level parallel programming (e.g., CUDA)
  • Deep understanding of CPU/GPU architecture fundamentals and how they impact performance
  • Improving the performance of large-scale computational applications on GPUs

Other signals

  • performance tuning
  • GPU optimization
  • parallel programming
  • AI workloads
  • HPC