AI Computing Development Engineer, TensorRT and TensorRT-LLM AIGV

NVIDIA · Semiconductors · Shanghai, China +2

NVIDIA is seeking software engineers to develop and optimize inferencing software (TensorRT/TensorRT-LLM) for AI computing. The role involves performance analysis and tuning, integrating AI advancements, and collaborating across teams to shape machine learning inferencing on NVIDIA platforms. It requires strong programming skills, experience with deep learning frameworks, and a proactive approach.

What you'd actually do

Design and develop robust inferencing software (TensorRT/TensorRT-LLM) optimized for functionality and performance across platforms
Perform performance analysis, optimization, and tuning of deep learning inference workloads
Track academic and industry advancements in AI and update TensorRT/TensorRT-LLM features accordingly
Provide feedback on architecture and hardware design and development
Collaborate across hardware, software, and research teams to shape the direction of machine learning inferencing across NVIDIA platforms

Skills

Required

Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or related computing-focused field (or equivalent experience)
Strong Python or C/C++ programming and software design experience, including debugging, performance profiling, and test design
2+ years of work experience
Strong curiosity about artificial intelligence and familiarity with the latest developments in deep learning
Experience working with deep learning frameworks such as PyTorch, TensorRT/TensorRT-LLM, SGLang, or vLLM
Proactive, self-driven, and able to work independently
Excellent written and verbal communication skills in English
Demonstrated ability, commensurate with experience, to take technical ownership, solve complex problems, and contribute effectively in cross-functional environments

Nice to have

generative models
multimodal systems
large neural networks

What the JD emphasized

Ability to work in a fast-paced, delivery-focused environment is required
Excellent interpersonal skills are a must
Strong curiosity about artificial intelligence and familiarity with the latest developments in deep learning
Proactive, self-driven, and able to work independently
Excellent written and verbal communication skills in English
Demonstrated ability, commensurate with experience, to take technical ownership, solve complex problems, and contribute effectively in cross-functional environments

Other signals

inference software
performance analysis
optimization
tuning
deep learning frameworks
GPU-accelerated AI

● Active

Posted 8w ago · 53 days open

AI score: 8/10
Stage: Serve
Location: Shanghai, China; Beijing, China; Shenzhen, China
Role: Mid · Infra
Function: Engineering
Domain: general
Team: AI Computing
Maturity: Scaling

Skills

Applied ML Domains

Data Science, Recommendation Systems

Computer Vision & Multimodal

Computer Vision, Multimodal AI

Frameworks & Tools

PyTorch, SGLang, TensorRT, vLLM

General Experience & Skills

Complex Systems, Debugging, Performance Optimization, Software Engineering

Infrastructure & Systems

Computer Architecture, Inference Infrastructure, Model Serving

LLM & Foundation Models

Generative AI, Large Language Models (LLMs)

Languages

C++, Python

Leadership & Management

Cross-Functional Collaboration, Technical Leadership

ML Ops & Evaluation

A/B Testing, Fine-Tuning, Inference Optimization, Performance Profiling, Production ML Systems

ML Techniques

Machine Learning, Optimization Methods

Research & Credentials

Applied Mathematics

Speech & Audio

Speech & Audio Processing

Full job description

NVIDIA is hiring software engineers for its AI Computing team. Academic and commercial groups around the world are using GPUs to power a revolution in deep learning-powered AI, enabling breakthroughs in areas like generative AI, computer vision, speech recognition, recommender systems, and large-scale language and multimodal models. Join the team building the inferencing software (TensorRT/TensorRT-LLM) that will be used across our product lines. The ability to work in a fast-paced, delivery-focused environment is required, and excellent interpersonal skills are a must.

What you'll be doing:

Design and develop robust inferencing software (TensorRT/TensorRT-LLM) optimized for functionality and performance across platforms
Perform performance analysis, optimization, and tuning of deep learning inference workloads
Track academic and industry advancements in AI and update TensorRT/TensorRT-LLM features accordingly
Provide feedback on architecture and hardware design and development
Collaborate across hardware, software, and research teams to shape the direction of machine learning inferencing across NVIDIA platforms
Own and deliver technical work with scope based on experience, ranging from complex features to substantial parts of larger projects, with increasing independence and technical leadership over time
Publish key technical results at leading scientific and engineering conferences

What we need to see:

Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or related computing-focused field (or equivalent experience)
Strong Python or C/C++ programming and software design experience, including debugging, performance profiling, and test design
2+ years of work experience
Strong curiosity about artificial intelligence and familiarity with the latest developments in deep learning — including generative models, multimodal systems, and large neural networks
Experience working with deep learning frameworks such as PyTorch, TensorRT/TensorRT-LLM, SGLang or vLLM
Proactive, self-driven, and able to work independently
Excellent written and verbal communication skills in English
Demonstrated ability, commensurate with experience, to take technical ownership, solve complex problems, and contribute effectively in cross-functional environments

NVIDIA is widely considered to be one of technology’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. Does the idea of contributing to and pushing the boundaries of state-of-the-art AI and compute systems excite you? Interested in getting exposure to the entire deep learning software stack? Come join us and help build the GPU-accelerated AI platform used worldwide.