AI Computing Software Development Engineer, LLM Inference

NVIDIA · Semiconductors · Shanghai, China +1

Software Development Engineer at NVIDIA focused on LLM inference software (TensorRT LLM and TensorRT Edge LLM). The work involves building and scaling inference software for GPUs, along with performance analysis, optimization, and tuning. The role requires strong C/C++ skills, experience with deep learning frameworks, and collaboration across teams.

What you'd actually do

  1. Craft and develop robust inferencing software that can be scaled to multiple platforms for functionality and performance
  2. Analyze, optimize, and tune performance
  3. Closely follow academic developments in the field of artificial intelligence and large language models
  4. Provide feedback on architecture and hardware design and development
  5. Collaborate across the company to guide the direction of machine learning inferencing, working with software, research and product teams

Skills

Required

  • C/C++ programming
  • software design
  • debugging
  • performance analysis
  • test design
  • deep learning frameworks (TensorFlow, PyTorch)
  • software development experience

Nice to have

  • Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field
  • experience with TensorRT LLM
  • experience with TensorRT Edge LLM
  • experience with LLMs, ChatGPT, and generative AI
  • experience with recommender models

What the JD emphasized

  • ability to work on a fast-paced delivery-focused team is required
  • excellent interpersonal skills are a must
  • Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
  • 2+ years of relevant software development experience
  • Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design
  • Proactive and able to work without supervision
  • Excellent written and oral communication skills in English

Other signals

  • LLM inference software
  • TensorRT LLM
  • performance analysis and optimization
  • GPU acceleration