Senior System Software Engineer, AI Data Platform

NVIDIA · Semiconductors · Hanoi, Vietnam +1 · Remote

Senior System Software Engineer role focused on building and optimizing the foundational infrastructure behind NVIDIA's AI and high-performance computing work. The role involves designing, building, and optimizing scalable automation systems for the performance, tuning, and deployment of core software offerings, which in turn affects AI models and complex applications across various environments.

What you'd actually do

  1. Develop efficient infrastructure and tools for automating complex software processes.
  2. Drive performance optimization: implement advanced test harnesses, benchmarking frameworks, and analytical tools to rigorously characterize and optimize the performance and efficiency of NVIDIA's software and hardware platforms.
  3. Apply deep knowledge of operating systems, kernel internals, device drivers, memory management, storage, networking, and high-speed interconnects to build and troubleshoot highly performant systems.
  4. Work with engineering teams to understand needs, define requirements, and deliver efficient solutions.
  5. Set performance goals, monitor feedback, analyze data, and continuously improve system reliability.

Skills

Required

  • C++
  • Python
  • Go
  • operating system internals
  • device drivers
  • memory management
  • distributed systems
  • networking protocols
  • cluster management
  • high-performance interconnects
  • automated testing
  • benchmarking
  • continuous integration/continuous deployment (CI/CD) pipelines
  • analytical skills
  • problem-solving skills
  • debugging skills
  • collaboration
  • communication skills

Nice to have

  • performance optimization for AI/ML workloads
  • inference applications
  • containerization
  • orchestration technologies
  • Docker
  • Kubernetes
  • performance profiling tools
  • hardware and software systems
  • architectural improvements

What the JD emphasized

  • 5+ years of industry experience in software development, focusing on infrastructure, distributed systems, automation, and/or performance engineering.
  • Expertise in System-Level Programming
  • Deep Understanding of System Software
  • Distributed Systems
  • Automation and CI/CD Proficiency
  • Problem-Solving and Analytical Skills
  • Experience optimizing performance for AI/Machine Learning workloads, especially inference applications, on diverse hardware platforms.

Other signals

  • building foundational infrastructure for AI
  • optimizing performance of AI models and applications
  • automating deployment of core software offerings