What you'd actually do

Develop efficient infrastructure and tools for automating complex software processes.

Implement advanced test harnesses, benchmarking frameworks, and analytical tools to rigorously characterize and optimize the performance and efficiency of our software and hardware platforms.

Apply deep knowledge of operating systems, kernel internals, device drivers, memory management, storage, networking, and high-speed interconnects to build and troubleshoot highly performant systems.

Work with engineering teams to understand needs, define requirements, and deliver efficient solutions.

Set performance goals, monitor feedback, analyze data, and make continuous improvements for system reliability.

Skills

Required

C++
Python
Go
operating system internals
device drivers
memory management
distributed systems
networking protocols
cluster management
high-performance interconnects
automation
CI/CD
performance engineering

Nice to have

AI/Machine Learning workload optimization
inference application optimization
containerization
Kubernetes
performance profiling tools

What the JD emphasized

5+ years of industry experience in software development, focusing on infrastructure, distributed systems, automation, and/or performance engineering.

Proven ability to develop robust tools and automation using programming languages such as C++, Python, or Go.

Experience with operating system internals, device drivers, memory management, and debugging performance issues in complex compute applications.

Experience in designing, building, and operating large-scale distributed systems, with knowledge of networking protocols, cluster management, and high-performance interconnects.

Experience building and maintaining automated testing, benchmarking, and continuous integration/continuous deployment pipelines.

Our team is building the foundational infrastructure that powers NVIDIA's cutting-edge innovations in AI and high-performance computing. We are seeking a Senior Software Engineer to design, build, and optimize highly scalable and reliable automation systems that ensure the peak performance and seamless deployment of our core software offerings across a diverse ecosystem. This is an opportunity to directly impact how AI models and complex applications are validated, tuned, and delivered globally, from cloud environments to on-premises data centers and specialized hardware.

What you’ll be doing

Develop efficient infrastructure and tools for automating complex software processes.
Drive Performance Optimization: Implement advanced test harnesses, benchmarking frameworks, and analytical tools to rigorously characterize and optimize the performance and efficiency of our software and hardware platforms.
Apply deep knowledge of operating systems, kernel internals, device drivers, memory management, storage, networking, and high-speed interconnects to build and troubleshoot highly performant systems.
Work with engineering teams to understand needs, define requirements, and deliver efficient solutions.
Set performance goals, monitor feedback, analyze data, and make continuous improvements for system reliability.
Influence Technical Strategy: Contribute to defining technical strategies and roadmaps for our platform automation initiatives, ensuring alignment with company-wide goals and standard methodologies.

What we need to see

Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent experience.
5+ years of industry experience in software development, focusing on infrastructure, distributed systems, automation, and/or performance engineering.
Expertise in System-Level Programming: Proven ability to develop robust tools and automation using programming languages such as C++, Python, or Go.
Deep Understanding of System Software: Experience with operating system internals, device drivers, memory management, and debugging performance issues in complex compute applications.
Distributed Systems: Experience in designing, building, and operating large-scale distributed systems, with knowledge of networking protocols, cluster management, and high-performance interconnects.
Automation and CI/CD Proficiency: Experience building and maintaining automated testing, benchmarking, and continuous integration/continuous deployment pipelines.
Problem-Solving and Analytical Skills: Outstanding analytical, problem-solving, and debugging skills, with a track record of resolving complex technical challenges.
Collaboration and Communication: Excellent interpersonal and communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively across teams.

Ways to stand out from the crowd

Experience optimizing performance for AI/Machine Learning workloads, especially inference applications, on diverse hardware platforms.
Prior experience building or contributing to large-scale compute infrastructure solutions in cloud environments or on-premises data centers.
Experience with containerization and orchestration technologies, such as Docker and Kubernetes.
Familiarity with performance profiling tools and methodologies for hardware and software systems.
Track record of driving significant efficiency gains or architectural improvements in large-scale systems.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer you and your family at www.nvidiabenefits.com.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Senior System Software Engineer - AI Data Platform - Inference Factory Optimization
