AI Hire Signal

Tracking AI hiring across 200+ US tech companies. Stage, salary, and stack signals on every role — refreshed weekly.

© 2026 AI Hire Signal·Not affiliated with companies shown

Currently tracking 440 active AI roles, down 53% versus the prior 4 weeks. Primary focus: Serve · Engineering. Salary range $100k–$575k (avg $262k).

Hiring: 440 / 623
Momentum (4w): ↓ -386 (-53%) · 340 opens last 4w · 726 prior 4w
Salary range: $100k–$575k · avg $262k (USD, disclosed roles only)
Tracked since: May '25 · last role 4w ago
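The momentum figure above follows from the two 4-week totals. The sketch below shows one way to reproduce it; the function name `momentum` and the choice to round to the nearest whole percent are assumptions for illustration, not the site's documented method.

```python
def momentum(current: int, prior: int) -> tuple[int, int]:
    """Return (absolute change, percent change rounded to a whole percent).

    Hypothetical helper: the rounding rule is assumed, not taken from the site.
    """
    if prior <= 0:
        raise ValueError("prior period must be positive")
    delta = current - prior
    pct = round(100 * delta / prior)
    return delta, pct


# 340 opens in the last 4 weeks versus 726 in the prior 4 weeks
print(momentum(340, 726))  # (-386, -53)
```

The same calculation applied to the 30-day counts quoted in the FAQ (218 → 110) gives a change of -108 roles, or roughly -50%.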
Hiring velocity (new roles per week, oldest first)
Dec 30: 1
Mar 10: 1
Mar 24: 1
Apr 28: 1
May 12: 4
May 19: 5
May 26: 3
Jun 2: 3
Jun 9: 2
Jun 16: 1
Jun 23: 2
Jun 30: 3
Jul 7: 4
Jul 14: 1
Jul 28: 2
Aug 11: 4
Aug 18: 6
Aug 25: 2
Sep 1: 3
Sep 15: 8
Sep 22: 3
Sep 29: 6
Oct 6: 2
Oct 13: 2
Oct 20: 3
Oct 27: 6
Nov 3: 9
Nov 10: 8
Nov 17: 8
Nov 24: 4
Dec 1: 11
Dec 8: 9
Dec 15: 14
Dec 22: 10
Dec 29: 8
Jan 5: 107
Jan 12: 22
Jan 19: 45
Jan 26: 32
Feb 2: 59
Feb 9: 64
Feb 16: 63
Feb 23: 83
Mar 2: 83
Mar 9: 88
Mar 16: 97
Mar 23: 72
Mar 30: 215
Apr 6: 158
Apr 13: 250
Apr 20: 199
Apr 27: 332
May 4: 304
May 11: 189
May 18: 131
May 25: 102
Jun 1: 129
Jun 8: 122
Jun 15: 49
Jun 22: 40

NVIDIA currently has 496 active AI-related job listings. The majority of these roles, 52%, are focused on serving infrastructure, with agents representing another significant segment at 23%. Engineering is the dominant function, with 441 positions. The United States leads hiring geographies with 287 roles, followed by China with 64. Frequent tech tags include model_serving, inference_infra, and agent_orchestration, suggesting a focus on deployment and management of AI models. Over the last 30 days, NVIDIA posted 214 new AI roles, a 27% decrease compared to the previous 30-day period.

Auto-generated from active job postings · last refreshed 2026-05-24

Frequently asked questions

  • What AI roles is NVIDIA hiring for?

    NVIDIA currently has 487 active AI-related roles in our index. The most common open titles are: Deep Learning Performance Architect (4), Senior Deep Learning Performance Architect (4), AI Research Scientist (3), Developer Technology Engineer - AI (3), Manager, Deep Learning Algorithms (3). Most positions are in Engineering and Research.

  • What stage of AI development does NVIDIA focus on?

    NVIDIA's active AI hiring is concentrated in: serving infrastructure (54%), agents (21%), application (8%). These categories follow a seven-stage AI lifecycle: data, pre-training, post-training, serving infrastructure, agents, evaluation, and application.

  • Where is NVIDIA hiring AI talent?

    NVIDIA is hiring AI talent in: United States (286 roles), China (59 roles), Israel (50 roles), Germany (21 roles).

  • What technologies does NVIDIA's AI team work with?

    Job postings at NVIDIA most frequently reference: model serving, inference infra, agent orchestration, llm observability, multimodal.

  • How many AI roles has NVIDIA posted recently?

    In the past 30 days, NVIDIA has posted 110 new AI-related roles. That is a 50% decrease versus the prior 30 days (218 → 110).
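The stage concentration quoted in the FAQ can be approximated from the per-stage counts shown in the Stage filter of the job list. The sketch below is illustrative only: the dictionary name `stage_counts` and the helper `stage_shares` are hypothetical, and mapping the "Ship" filter label to the "application" stage is an assumption. The summary, FAQ, and filter appear to reflect slightly different snapshots, so the computed shares may not match the quoted percentages exactly.

```python
# Per-stage counts as listed in the Stage filter (sum: 434 AI roles).
stage_counts = {
    "Data": 17,
    "Pretrain": 20,
    "Post-train": 28,
    "Serve": 236,
    "Agent": 95,
    "Eval Gate": 5,
    "Ship": 33,  # assumed to correspond to the "application" stage
}


def stage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each stage's share of all counted roles, in percent."""
    total = sum(counts.values())
    return {stage: 100 * n / total for stage, n in counts.items()}


shares = stage_shares(stage_counts)
# Print the three largest stages, biggest first.
for stage, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{stage}: {pct:.1f}%")
```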

Jobs (236)

434 AI · 1824 total active
Filtered by stage: Serve
Show: Active only · AI only (≥ 7)
Stage: Data · 17, Pretrain · 20, Post-train · 28, Serve · 236, Agent · 95, Eval Gate · 5, Ship · 33
Function: Engineering · 375, Research · 57, Product · 2
Country: United States · 259, China · 55, Israel · 43, Germany · 21, Switzerland · 18, United Kingdom · 14, India · 13, Poland · 12, Vietnam · 12, Canada · 10, Italy · 7, Netherlands · 6, Singapore · 6, France · 5, Taiwan · 4, Finland · 2, Spain · 2, Armenia · 1, Czech Republic · 1, Hungary · 1, Japan · 1, Romania · 1, South Korea · 1, Sweden · 1
Columns: Title | Stage | Function | Location | First seen | AI score
Research Scientist, ML Systems - PhD New College Grad 2026
Research Scientist role focusing on ML Systems, contributing to hardware, software, and infrastructure for training, fine-tuning, and serving ML models at scale. Requires a PhD and expertise in systems research areas.
Serve, Post-train | Research | Singapore, Singapore · Remote | Dec '25 | 9
Senior Software Architect, AI Networking
Senior Software Architect role focused on designing and optimizing large-scale LLM inference infrastructure on GPU clusters, involving system-level optimizations for latency, throughput, and cost-efficiency.
Serve | Engineering | Tel Aviv, Israel | Dec '25 | 9
Showing 51–100 of 236
Senior Software Research Architect, AI Networking
NVIDIA is seeking a Senior Software Research Architect to improve the framework for large-scale LLM learning and prediction. This role focuses on designing and optimizing systems for generative AI workloads on advanced GPU clusters, specifically leveraging the NVIDIA Spectrum-X Networking Platform to define deployment and scaling strategies. The architect will work on inter-node communication, compute scheduling, and system-level optimization, collaborating with engineers and researchers to enable generative AI technologies in real-world applications.
Serve, Pretrain | Research | Tel Aviv, Israel | Nov '25 | 9
AI Computing Software Development Engineer, TensorRT-LLM
NVIDIA is seeking a Software Development Engineer for its TensorRT-LLM team to develop and optimize LLM inference software for various platforms. The role involves performance analysis, tuning, and contributing to the architecture and hardware design, with a focus on scaling inference capabilities.
Serve | Engineering | Taipei, Taiwan +1 | Sep '25 | 9
AI Computing Software Development Engineer, LLM Inference
Software Development Engineer focused on LLM inference software (TensorRT LLM and TensorRT Edge LLM) at NVIDIA, involving crafting, scaling, performance analysis, optimization, and tuning of inferencing software for GPUs. The role requires strong C/C++ skills, experience with deep learning frameworks, and collaboration across teams.
Serve | Engineering | Shanghai, China +1 | 1w ago | 8
Senior Inference Engineer, AIConfigurator for Dynamo
Senior Inference Engineer role focused on optimizing LLM inference deployment configurations using AIConfigurator, integrating GPU systems, model serving, and performance modeling for NVIDIA platforms.
Serve | Engineering | Santa Clara, CA +1 · Remote | 2w ago | 8
AI Computing Software Development Engineer, TensorRT
NVIDIA is seeking an AI Computing Software Development Engineer for its TensorRT team to craft and develop robust, scalable inferencing software for GPUs. The role involves performance analysis, optimization, tuning, and collaborating with various teams to guide the direction of machine learning inferencing. Requires a Masters or higher degree, 2+ years of software development experience, strong C/C++ skills, and familiarity with deep learning frameworks.
Serve | Engineering | Shanghai, China | 2w ago | 8
AI Computing Development Engineer, TensorRT and TensorRT-LLM AIGV
NVIDIA is seeking software engineers to develop and optimize inferencing software (TensorRT/TensorRT-LLM) for AI computing. The role involves performance analysis, tuning, integrating AI advancements, and collaborating across teams to shape machine learning inferencing on NVIDIA platforms. Requires strong programming skills, experience with deep learning frameworks, and a proactive approach.
Serve | Engineering | Shanghai, China +2 | 2w ago | 8
DL System Software Engineer - AI Platform
NVIDIA is seeking a DL System Software Engineer to join their AI Platform team. The role involves developing and building solutions for scheduling large-scale AI training and inference workloads on GPU clusters, optimizing performance and efficiency for large models. The engineer will work on core infrastructure, resource management, and GPU scheduling, contributing to NVIDIA's AI platform.
Serve, Post-train | Engineering | Toronto, ON | 2w ago | 8
Software Engineer, AI Networking Architect
NVIDIA is seeking an AI Networking Architect to optimize AI workload performance by analyzing AI models, distributed training, and inference workloads, and translating research insights into software, hardware, and networking architecture requirements. The role involves building platforms and simulations to evaluate trade-offs and influence future NVIDIA product roadmaps.
Serve, Agent | Engineering | Tel Aviv, Israel +1 | 2w ago | 8
GPU Performance Engineer - Neural Reconstruction
GPU Performance Engineer focused on optimizing neural reconstruction and Gaussian Splatting workloads, involving PyTorch, CUDA, and GPU profiling to improve training and rendering performance.
Serve, Post-train | Engineering | Canada · Remote | 3w ago | 8
Developer Technology Engineer - AI
NVIDIA is seeking an AI Developer Technology Engineer to study and develop cutting-edge deep learning techniques, analyze and optimize performance on GPU architectures, and work with customers to provide AI solutions using GPUs. The role involves close collaboration with internal NVIDIA teams to influence future architectures and software platforms.
Serve | Engineering | Shanghai, China +2 | 3w ago | 8
Systems Performance Engineer, Agentic AI Workloads – New College Grad 2026
This role focuses on modeling, simulating, and analyzing the system-level performance of agentic AI workloads in datacenter environments. The engineer will develop simulators, characterize LLM serving traffic, identify performance bottlenecks, and provide architectural recommendations for next-generation AI systems. The role requires strong programming skills in C++ and Python, a solid understanding of queueing theory, traffic modeling, and statistics, as well as fundamentals of deep learning and LLM inference serving.
Serve, Agent | Engineering | Santa Clara, CA +2 | 3w ago | 8
Deep Learning Computer Architect - New College Grad 2026
NVIDIA is seeking a Deep Learning Computer Architect to design hardware accelerator and processor architectures for next-generation platforms, enabling state-of-the-art machine learning and data analytics. The role involves analyzing DL methods, proposing new features for acceleration, and studying their benefits, with a focus on LLM workloads and deep learning kernels.
Serve | Engineering | Santa Clara, CA +1 | 3w ago | 8
Manager, Deep Learning Algorithms
Manager to lead engineering activities for productizing Deep Learning models, focusing on implementing and optimizing state-of-the-art algorithms for GPU-accelerated platforms. The role involves leading a team, collaborating with internal partners on roadmap development, and deploying training and inference workloads.
Serve, Data | Engineering | Warsaw, Poland +1 · Remote | 4w ago | 8
Engineering Manager, Inference Benchmarking — AI Perf
Engineering Manager for NVIDIA's AIPerf platform, a standard for assessing LLM serving performance. The role involves leading a team to build and advance the platform, focusing on core infrastructure, accuracy of benchmark results, and advising on upstream engine integrations for various AI workloads (LLM, multimodal, diffusion, computer vision). Requires strong systems engineering, inference infrastructure, and open-source community experience.
Serve | Engineering | Santa Clara, CA +5 · Remote | 4w ago | 8
AI Computing Development Engineer, TensorRT and TensorRT-LLM
NVIDIA is seeking software engineers to develop and optimize AI inference software (TensorRT/TensorRT-LLM) for GPUs. The role involves performance analysis, tuning, integrating new advancements, and collaborating across teams to shape the future of machine learning inferencing.
Serve | Engineering | Shanghai, China | 4w ago | 8
GPU Performance Engineer - Neural Reconstruction
GPU Performance Engineer focused on optimizing neural reconstruction and Gaussian Splatting workloads. This role involves profiling, identifying bottlenecks, and improving performance in CUDA, PyTorch, and C++ for training and rendering, while ensuring reconstruction quality is maintained. It requires strong programming, GPU optimization, and performance analysis skills, with collaboration across research and engineering teams.
Serve, Data | Engineering | CA +5 · Remote | 4w ago | 8
Senior DGX Cloud AI Infrastructure Software Engineer
NVIDIA is seeking a Senior DGX Cloud AI Infrastructure Software Engineer to design, build, and maintain AI infrastructure for large-scale AI training and inferencing. The role involves optimizing efficiency and resiliency of AI workloads, developing scalable AI and Data infrastructure tools, and ensuring high availability of AI systems.
Serve, Data | Engineering | Shanghai, China | 5w ago | 8
Senior AI Infrastructure Software Engineer - DGX Cloud
NVIDIA is seeking a Senior AI Infrastructure Software Engineer to design, build, and maintain AI platforms for large-scale AI training, inferencing, fine-tuning, and Agentic AI in production. The role involves developing platform and tools for AI/ML workload efficiency, resiliency, and observability, with a focus on distributed systems and Kubernetes.
Serve | Engineering | Santa Clara, CA +3 · Remote | 6w ago | 8
Senior Performance Compiler Engineer - Triton
Senior Performance Compiler Engineer to work on the open-source Triton compiler project, focusing on using compilers to improve AI performance on NVIDIA GPUs for large language models, agents, and other AI applications. The role involves investigating GPU hardware, designing and implementing compiler technology using MLIR to optimize kernel descriptions for efficient GPU code generation, and collaborating with internal teams.
Serve | Engineering | Redmond, WA +5 · Remote | 7w ago | 8
Senior GPU System Architect
Seeking a Senior GPU System Architect to design multi-GPU scale-up and scale-out systems for AI and HPC datacenters. The role involves defining system architectures that integrate GPU compute, memory, and interconnects for optimal AI performance and scalability. Requires deep experience in system-level fabric/networking architecture and hardware-software co-design.
Serve | Engineering | Santa Clara, CA | 7w ago | 8
Senior Deep Learning Performance Architect
Senior Deep Learning Performance Architect at NVIDIA to design and evaluate hardware architectures for AI/HPC applications, focusing on LLM inference and training performance, and optimizing system bottlenecks.
Serve, Post-train | Engineering | Santa Clara, CA +1 | 7w ago | 8
Senior Data Center Performance Engineer - Benchmarking and Optimization
Senior Data Center Performance Engineer at NVIDIA focused on benchmarking and optimizing data center platforms for AI training, inference, and HPC workloads. Responsibilities include designing benchmarks, characterizing workloads, identifying bottlenecks, and driving performance improvements through system tuning and architectural recommendations.
Serve | Engineering | Santa Clara, CA +1 · Remote | 7w ago | 8
NCX Engineer, AI Accelerator
This role focuses on engineering and deploying AI infrastructure and solutions for strategic customers, optimizing large-scale training and inference workloads on NVIDIA's AI platform. It involves MLOps, Kubernetes, GPU scheduling, and performance tuning, with a strong emphasis on customer-facing technical support and collaboration.
Serve, Post-train | Engineering | Santa Clara, CA +1 | 7w ago | 8
Machine Learning Applications and Compiler Engineer, LPX - New College Grad 2026
NVIDIA is seeking engineers to develop algorithms and optimizations for their LPX inference and compiler stack, working at the intersection of large-scale systems, compilers, and deep learning to optimize neural network workloads on future NVIDIA platforms. The role involves building and maintaining high-performance runtime and compiler components, defining workload mappings, integrating with the SW ecosystem, benchmarking, profiling, and collaborating with hardware teams. It also includes prototyping new compilation techniques and publishing technical work.
Serve | Engineering | Toronto, ON +1 · Remote | 7w ago | 8
Senior Deep Learning Framework Communications Engineer
Senior Deep Learning Framework Communications Engineer at NVIDIA, focusing on integrating and optimizing communication libraries (NCCL, NVSHMEM) within AI frameworks (PyTorch, TRT-LLM, vLLM, JAX) to enhance performance for large-scale AI training and inference. The role involves deep analysis of AI workloads, compiler improvements, and kernel authoring for multi-GPU systems.
Serve | Engineering | Santa Clara, CA +4 · Remote | 7w ago | 8
Director, System Software Engineering - Metropolis Accelerated and Inferencing Software
NVIDIA is seeking a Director of System Software Engineering to lead teams responsible for the full lifecycle of Vision AI strategy, from model onboarding to production deployment. The role focuses on transforming foundation models into real-time, GPU-accelerated video intelligence systems, scaling multimodal reasoning, and enabling agentic development workflows. Key responsibilities include architecting and operationalizing inference acceleration, driving implementations of frameworks like TensorRT and VLLM, collaborating with partners on custom models, and ensuring performance benchmarking. The ideal candidate has extensive experience in deep learning, GPU optimization, and leading engineering teams in embedded and enterprise platforms.
Serve, Agent | Engineering | Santa Clara, CA | 8w ago | 8
Senior Software Architect - Deep Learning and HPC Communications
Senior Software Architect role at NVIDIA focused on designing and implementing next-generation data center platforms and scalable communication software for AI and HPC workloads. The role involves investigating performance bottlenecks, developing new communication technologies, exploring hardware/software co-design, and building proofs-of-concept to drive innovation in large-scale GPU clusters.
Serve | Engineering | Santa Clara, CA +4 · Remote | 8w ago | 8
Senior Software Engineer, Deep Learning Inference
Senior Software Engineer focused on optimizing deep learning inference for LLMs and omnimodal architectures on NVIDIA hardware, including GPU kernel tuning, distributed inference, and contributing to open-source libraries.
Serve | Engineering | Tel Aviv, Israel | 8w ago | 8
Senior Hardware Architect, Deep Learning GPU and System
Senior Hardware Architect role focused on designing next-generation GPUs and systems to advance the state of AI, analyzing deep learning workloads, and proposing new features for acceleration. Requires 8+ years of experience in performance, hardware architecture, and deep learning analysis.
Serve | Engineering | Yokneam, Israel | 8w ago | 8
Senior Software Engineer - VLM Microservices for Neural Reconstruction
Senior Software Engineer to design, build, and optimize containerized inference execution for 3D Vision Language Models (VLMs) for neural reconstruction, turning research into production-grade software (NIMs). The role involves developing benchmarks, releasing and maintaining models, contributing to open-source projects like vLLM, and collaborating with research and product teams. Requires experience with AI distributed systems, inference platforms, Python/C++, and software engineering fundamentals.
Serve, Post-train | Engineering | Santa Clara, CA +1 | 8w ago | 8
Principal AI and ML Infra Software Engineer, GPU Clusters
This role focuses on enhancing the efficiency of AI and ML research on GPU clusters by collaborating with researchers to identify and address infrastructure deficiencies. The engineer will optimize performance, monitor resource utilization, and contribute to the AI/ML infrastructure ecosystem, keeping up-to-date with the latest AI/ML technologies.
Serve | Engineering | Santa Clara, CA +1 | 8w ago | 8
Senior Deep Learning Software Engineer - Autonomous Vehicles
Senior Deep Learning Software Engineer focused on developing and productizing deep learning solutions for autonomous vehicles. The role involves training, fine-tuning, optimizing perception DNNs, applying quantization, improving DNN architectures, and enhancing inference speed and power consumption. It requires strong programming skills, experience with deep learning frameworks, computer vision tasks, and familiarity with CNNs and Transformer architectures. Experience with low precision inference, quantization, and NVIDIA software libraries is a plus.
Serve, Post-train | Engineering | Santa Clara, CA +3 · Remote | Apr 24 | 8
Compiler Engineer - AI Inference
NVIDIA is seeking an AI Compiler Engineer to optimize kernel generation and computational graph optimizations for AI inference and training workloads on next-generation GPUs. The role involves hands-on development, collaboration on hardware/software co-design, and scaling AI deployments in datacenters.
Serve, Post-train | Engineering | Santa Clara, CA | Apr 24 | 8
Senior Software Engineer, Metropolis Vision AI
Senior Software Engineer to develop and optimize high-performance Vision AI pipelines and large-scale distributed services for processing video, image, and 3D data. The role involves crafting real-time systems, developing multi-modal perception, using simulation/synthetic data, and profiling/tuning GPU-accelerated inference pipelines. Collaboration with research and platform teams is key, with an emphasis on bringing research into production at scale.
Serve, Post-train | Engineering | Santa Clara, CA | Apr 24 | 8
Senior Software Engineer, AI Networking
Senior Software Engineer role focused on building and productizing ML tools for optimizing AI workloads (LLM training/inference) across GPU/CPU clusters, with a focus on networking and system resource utilization. Involves distributed deep learning, ML-based optimization techniques, and performance analysis.
Serve, Agent | Engineering | Santa Clara, CA +1 | Apr 24 | 8
Senior Performance Engineer - LLM Inference Frameworks
NVIDIA is seeking a Senior Performance Engineer to optimize LLM inference infrastructure on GPUs, focusing on throughput, memory efficiency, and scalability. The role involves designing and implementing high-performance pipelines, profiling, tuning model execution, and innovating techniques like Speculative Decoding and quantization. Experience with deep learning frameworks and performance debugging is required.
Serve | Engineering | Yokneam, Israel +3 | Apr 20 | 8
AI Computing Development Engineer, TensorRT-LLM
NVIDIA is seeking software engineers to develop and optimize inferencing software for AI models, specifically focusing on TensorRT-LLM. This role involves performance analysis, tuning, and collaboration across teams to advance machine learning inferencing capabilities.
Serve | Engineering | Shanghai, China | Apr 17 | 8
Senior Software Engineer, JAX
Senior Software Engineer to develop NVIDIA's AI platform, focusing on performance optimizations in deep learning frameworks using JAX. The role involves designing and implementing JAX core components, driving performance on NVIDIA products, and building tools to increase efficiency for AI-based systems.
Serve | Engineering | Switzerland +4 · Remote | Apr 17 | 8
Senior Architect - Server Performance
NVIDIA is seeking architects to drive architectural performance for its next-generation AI server systems. This position demands a unique capability to bridge deep architectural knowledge, workload analysis, and hands-on silicon investigations. Candidates should be adept at working directly with silicon, high-level models, and simulators. Responsibilities include conducting performance investigations on both NVIDIA and competitive platforms, and developing targeted microbenchmarks to examine specific architectural aspects. The role does not heavily involve modeling tasks (functional or performance), though occasional focused assignments may arise.
Serve | Engineering | Bangalore, India +3 | Apr 16 | 8
Principal Deep Learning Communication Architect
NVIDIA is seeking a Principal Deep Learning Communication Architect to lead the technical roadmap for communication libraries across next-generation platforms, ensuring seamless scaling of models to massive clusters. The role involves designing and optimizing communication primitives for heterogeneous interconnects, co-designing with application developers and silicon architects, and developing analytical models for system behavior. Expertise in parallel computing, HPC/distributed deep learning, inference engines, and GPU architecture is required.
Serve, Agent | Engineering | Santa Clara, CA +2 · Remote | Apr 14 | 8
Developer Technology Engineer - AI
NVIDIA Developer Technology Engineer focused on optimizing AI workloads, particularly large language models (LLMs), on NVIDIA's GPU platform. The role involves deep dives into application performance, GPU kernel optimization, distributed training and inference, and collaboration with various internal teams and external developers. It requires strong software engineering skills, parallel programming expertise, and a focus on performance analysis and tuning.
Serve, Post-train | Engineering | Shanghai, China +2 | Apr 13 | 8
AI and FSI Developer Technology Engineer - New College Grad 2026
NVIDIA is seeking an AI and FSI Developer Technology Engineer to optimize AI and HPC workloads on NVIDIA GPUs and CPUs, focusing on performance tuning and eliminating bottlenecks for financial markets. The role involves research, development, analysis, and collaboration with experts to improve performance across the stack, from algorithms to kernels. The engineer will also publish and present their work and influence future hardware/software designs.
Serve | Engineering | Santa Clara, CA +3 · Remote | Apr 9 | 8
Deep Learning Algorithms Engineer - ACOT
NVIDIA is looking for an AI Acceleration & Optimization Engineer to optimize the performance, scalability, and efficiency of AI models (LLMs, VLMs, diffusion, multimodal) on NVIDIA GPU platforms. The role involves profiling, identifying bottlenecks, and applying optimization techniques like quantization and kernel fusion, using tools such as CUDA, TensorRT, and Nsight. Collaboration with various teams (algorithms, systems, hardware, research, CUDA, compiler, frameworks) is key to bringing models from research to production.
Serve, Post-train | Engineering | Ho Chi Minh City, Vietnam +1 | Apr 4 | 8
Senior Machine Learning Applications and Compiler Engineer, LPX
NVIDIA is seeking a Senior Machine Learning Applications and Compiler Engineer to develop algorithms and optimizations for their LPX inference and compiler stack, working at the intersection of large-scale systems, compilers, and deep learning to map neural network workloads onto future NVIDIA platforms.
Serve | Engineering | Toronto, ON +1 · Remote | Apr 4 | 8
Senior Machine Learning Applications and Compiler Engineer, LPX
Develops algorithms and optimizations for NVIDIA's LPX inference and compiler stack, focusing on mapping neural network workloads onto future NVIDIA platforms and optimizing end-to-end inference performance. Requires strong software engineering, compiler/runtime development, and deep learning framework experience.
Serve | Engineering | Santa Clara, CA +1 · Remote | Apr 4 | 8
Senior Software Engineer – TensorRT Edge-LLM
Senior Software Engineer to develop and optimize a state-of-the-art inference framework for Large Language, Vision-Language, and Multimodal models on edge and embedded platforms, focusing on real-time performance and constrained environments.
Serve | Engineering | Santa Clara, CA +2 · Remote | Apr 4 | 8
Senior Performance Engineer - Deep Learning
Senior Performance Engineer at NVIDIA focused on optimizing Deep Learning models and frameworks (PyTorch, JAX) for NVIDIA GPUs. The role involves building and supporting Transformer Engine, collaborating on systems research for performance improvements, implementing and benchmarking new DL models, contributing to MLPerf, and engaging with the open-source community and enterprise customers. It also involves influencing future hardware and software design.
Serve, Post-train | Engineering | Santa Clara, CA | Apr 4 | 8
Senior Software Engineer, Quantized Inference
Senior Software Engineer focused on optimizing quantized inference for LLMs by implementing recipes, developing kernels, and collaborating on inference engines like vLLM and TRT-LLM. The role involves model export pipelines, benchmarking, and data analysis tooling.
Serve | Engineering | Redmond, WA +1 | Apr 4 | 8