Senior Deep Learning Algorithms Engineer - BioNeMo

NVIDIA · Semiconductors · Ho Chi Minh City, Vietnam +1

Senior Deep Learning Algorithms Engineer role at NVIDIA, optimizing biology and structural biology models (LLMs, VLMs) for inference performance on GPUs using TensorRT-LLM and related stacks. The focus is on low-latency, high-throughput inference, quantization, custom GPU kernels, and production deployment.

What you'd actually do

  1. Integrate TensorRT-LLM for BioNeMo models (Boltz1–2, OpenFold2–3) and upcoming structural biology models (RFdiffusion, DiffDock, ProteinMPNN, Evo2, ESM3).
  2. Optimize models for low-latency, high-throughput inference using parallelism, quantization (FP8/INT8), and sparsity/pruning.
  3. Profile and debug deep learning workloads on GPUs, resolving kernel/graph bottlenecks in training/inference, including custom operators.
  4. Develop and validate custom GPU kernels (CUDA, Triton) for hot paths, memory-bound ops, and non-standard blocks in structural biology models.
  5. Collaborate with research teams to align model architecture and training with deployment constraints for a smooth transition to production.

Skills

Required

  • MS/PhD in CS, EE, Comp. Eng., or equivalent practical experience.
  • 5+ years of professional experience in deep learning/applied ML.
  • Strong foundation in transformer/diffusion architectures.
  • Proficient in PyTorch (and/or TensorFlow) for production-grade model building, debugging, and deployment.
  • Strong Python/C++ skills.
  • Practical experience with TensorRT/TensorRT-LLM.
  • Familiarity with GPU performance engineering.

Nice to have

  • Experience with LLMs, VLMs, or large biology models (e.g., structure prediction).
  • Ability to read and modify performance-critical C++/CUDA code for inference stacks and custom ops.
  • Experience with profiling (Nsight), roofline analysis, and optimization of kernels and memory access.
  • Experience writing or extending custom GPU kernels for model hot paths.

What the JD emphasized

  • deploying optimized models/inference paths in production (not research prototypes)
  • required
  • experience with custom GPU kernels for model hot paths is required

Other signals

  • optimize models for inference
  • deploying optimized models/inference paths in production
  • TensorRT-LLM
  • GPU performance engineering