Senior AI and FSI Developer Technology Engineer

NVIDIA · Semiconductors · Santa Clara, CA +2

Senior AI and FSI Developer Technology Engineer at NVIDIA, focused on optimizing AI and HPC workloads on NVIDIA CPUs and GPUs for the financial services industry (FSI). The role involves researching, designing, and developing techniques to accelerate these workloads, profiling and eliminating performance bottlenecks, and collaborating with internal and external experts to influence future hardware and software designs. The engineer will also publish and present their work.

What you'd actually do

  1. Researching, designing, and developing groundbreaking techniques to accelerate high-performance workloads for FSI-focused AI on NVIDIA CPUs and GPUs.
  2. Working hands-on with leading technical experts to analyze, optimize, and scale complex AI and HPC workloads for modern CPU and GPU architectures.
  3. Profiling and eliminating performance bottlenecks across the stack: from algorithms to kernels to system-level behavior.
  4. Publishing and presenting your work at conferences, in talks, and on blogs to educate and inspire the broader developer community.
  5. Influencing the design of future hardware architectures, system software, libraries, and programming models by collaborating closely with NVIDIA research, hardware, compiler, and tools teams.

Skills

Required

  • low-level parallel programming (e.g., CUDA, OpenACC, OpenMP, MPI, pthreads, TBB)
  • CPU/GPU architecture fundamentals
  • C/C++
  • algorithms and software design
  • improving the performance of large-scale computational applications on GPUs
  • linear algebra

Nice to have

  • inference optimization techniques
  • deploying optimized AI models in production
  • TensorRT, TensorRT-LLM, and cuTile
  • capital markets
  • systematic/algorithmic strategies
  • quantitative trading
  • parallelizing and optimizing machine learning methods
  • financial data models
  • pricing and risk simulation algorithms
  • portfolio optimization

What the JD emphasized

  • push the limits of performance
  • performance tuning
  • optimize and scale complex AI and HPC workloads
  • profiling and eliminating performance bottlenecks

Other signals

  • performance tuning
  • GPU optimization
  • parallel algorithms
  • AI workloads