AI Computing Development Engineer, TensorRT-LLM

NVIDIA · Semiconductors · Shanghai, China

NVIDIA is seeking software engineers to develop and optimize inferencing software for AI models, specifically focusing on TensorRT-LLM. This role involves performance analysis, tuning, and collaboration across teams to advance machine learning inferencing capabilities.

What you'd actually do

  1. Craft and develop robust inferencing software that can be scaled to multiple platforms for functionality and performance
  2. Analyze, optimize, and tune performance
  3. Closely follow academic developments in artificial intelligence and update TensorRT-LLM features accordingly
  4. Provide feedback on architecture and hardware design and development
  5. Collaborate across the company to guide the direction of machine learning inferencing, working with software, research and product teams

Skills

Required

  • C/C++ or Python programming
  • software design
  • debugging
  • performance analysis
  • test design
  • deep learning frameworks (PyTorch, TensorRT-LLM, SGLang, vLLM)
  • English communication

Nice to have

  • Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field
  • 2+ years of relevant software development experience
  • curiosity about artificial intelligence
  • awareness of the latest deep learning developments (LLMs, generative models)
  • proactive and able to work without supervision

What the JD emphasized

  • working in a fast-paced, delivery-focused team is required
  • excellent interpersonal skills are a must
  • Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
  • 2+ years of relevant software development experience
  • Excellent C/C++ or Python programming and software design skills, including debugging, performance analysis, and test design
  • Strong curiosity about artificial intelligence and awareness of the latest developments in deep learning, such as LLMs and generative models
  • Experience working with deep learning frameworks (PyTorch, TensorRT-LLM, SGLang, vLLM)
  • Proactive and able to work without supervision
  • Excellent written and oral communication skills in English

Other signals

  • building the inferencing software
  • performance analysis, optimization and tuning
  • guide the direction of machine learning inferencing