Senior Software Engineer, TensorRT-LLM

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

NVIDIA is seeking a Senior Software Engineer for its TensorRT-LLM team to develop and scale inferencing software for LLMs and Generative AI. The role involves crafting robust inferencing software, performing benchmarking and profiling for GPU applications, writing high-quality Python code for LLM inference, and improving the TensorRT-LLM library. Collaboration with software, research, and product teams is key.

What you'd actually do

  1. Craft and develop robust inferencing software that can be scaled to multiple platforms for functionality and performance.
  2. Perform benchmarking, profiling, and system-level programming for GPU applications.
  3. Provide code reviews, design docs, and tutorials to facilitate collaboration among the team.
  4. Conduct unit tests and performance tests for different stages of the inference pipeline.
  5. Write safe, scalable, modular, and high-quality (Python) code for our core backend software for LLM inference.

Skills

Required

  • Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
  • 4+ years of relevant software development experience
  • Excellent Python programming and software design skills, including debugging, performance analysis, and test design
  • Strong curiosity about artificial intelligence and awareness of the latest developments in deep learning, such as LLMs and generative and recommender models
  • Experience working with deep learning frameworks like TensorFlow and PyTorch
  • Self-starter who consistently takes initiative to drive projects forward

Nice to have

  • Prior experience with an LLM framework or a DL compiler in inference, deployment, algorithms, or implementation
  • Prior experience with performance modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application
  • Architectural knowledge of CPU and GPU
  • GPU programming experience (CUDA or OpenCL)

What the JD emphasized

  • excellent interpersonal skills are a must
  • excellent written and oral communication skills in English

Other signals

  • building the inferencing software which is foundational to product lines within NVIDIA and across the industry
  • craft and develop robust inferencing software that can be scaled to multiple platforms for functionality and performance
  • perform benchmarking, profiling, and system-level programming for GPU applications
  • write safe, scalable, modular, and high-quality (Python) code for our core backend software for LLM inference
  • improve the usability of the TensorRT-LLM library