Senior AI Infrastructure Software Engineer

NVIDIA · Semiconductors · Shanghai, China

Senior AI Infrastructure Software Engineer at NVIDIA, focusing on building and scaling infrastructure for AI agents and applications in chip design. The role involves designing, developing, and improving scalable infrastructure, driving performance and reliability improvements, and collaborating with research and hardware teams. Requires expertise in Python, distributed systems, microservices, and integrating LLMs/agent frameworks.

What you'd actually do

  1. Design, develop, and improve scalable infrastructure to support the next generation of AI applications, including copilots and agentic tools.
  2. Drive improvements in architecture, performance, and reliability, enabling teams to apply LLMs and advanced agent frameworks at scale.
  3. Collaborate across hardware, software, and research teams, mentoring and supporting peers while encouraging best engineering practices and a culture of technical excellence.
  4. Stay informed of the latest advancements in AI infrastructure and contribute to continuous innovation across the organization.

Skills

Required

  • Python
  • large-scale distributed systems
  • AI infrastructure
  • software engineering principles
  • OOP/functional programming
  • high-performance, maintainable code
  • scalable microservices
  • web apps
  • SQL
  • NoSQL databases
  • MongoDB
  • Redis
  • containers
  • Kubernetes
  • CI/CD
  • distributed messaging systems
  • Kafka
  • event-driven architectures
  • decoupled architectures
  • integrating and fine-tuning LLMs
  • agent frameworks
  • LangChain
  • LangGraph
  • AutoGen
  • OpenAI Functions
  • RAG
  • vector databases
  • prompt engineering
  • end-to-end ownership
  • architecture
  • development
  • deployment
  • integration
  • ongoing operations/support
  • communication skills

Nice to have

  • JavaScript

What the JD emphasized

  • Minimum of 5 years in large-scale distributed systems or AI infrastructure
  • Advanced expertise in Python (required)
  • Deep knowledge of software engineering principles
  • Demonstrated expertise in crafting scalable microservices
  • Solid experience with distributed messaging systems
  • Practical experience integrating and fine-tuning LLMs or agent frameworks
  • Demonstrated end-to-end ownership of engineering solutions

Other signals

  • building and maintaining the core infrastructure for deploying and running AI agents in production
  • integrating and fine-tuning LLMs or agent frameworks
  • scalable infrastructure to support the next generation of AI applications, including copilots and agentic tools