Senior Architect, AI Solutions Engineering

NVIDIA · Semiconductors · Santa Clara, CA

This role focuses on architecting and scaling internal AI solutions for NVIDIA's cloud infrastructure, managing tools for AI development, identifying performance bottlenecks, and guiding engineers in solving complex AI-related problems. The role involves a mix of building AI agentic workflows and optimizing the underlying inference infrastructure.

What you'd actually do

  1. Serve as an Architect developing internal AI systems used by thousands of NVIDIANs globally.
  2. Identify gaps and issues, and determine which ones are better suited to AI solutions than to conventional approaches.
  3. Further divide the AI category into 'buy versus build' options by researching available tools in the market.
  4. Align with teams across NVIDIA to establish overall AI system goals and break them down into specific objectives for each sub-system.
  5. Drive, motivate, convince, and mentor sub-system leads to achieve improvements with agility and speed.

Skills

Required

  • BS EE/CS or equivalent experience
  • 12+ years of systems software development
  • 1+ year of experience in developing/exploring AI
  • Development with Large Language Models (LLMs)
  • Retrieval-Augmented Generation (RAG)
  • Fine-Tuning LLMs
  • AI Agentic workflows
  • LangChain
  • LangGraph
  • Cascading models
  • Deploying in hybrid, multi-cloud architecture
  • Edge computing
  • Architecting and shipping large-scale distributed software systems
  • Identifying gaps and bottlenecks, and developing solutions to optimize performance
  • Programming and software development skills in Java, Python, and shell scripting
  • Understanding of distributed systems
  • Understanding of REST APIs
  • Experience working with SQL/NoSQL database systems (MySQL, Cassandra, MongoDB, Elasticsearch)
  • Experience with Docker containers
  • Experience with Virtual Machines
  • Cloud technologies (OpenStack, Docker, Kubernetes, Chef/Puppet, Hadoop/Ceph/SwiftStack, LXC, Git, Perforce, JFrog, Kafka)

Nice to have

  • MS or PhD in EE/CS
  • Depth in AI, Machine Learning and Deep Learning algorithms and techniques
  • Designing high-performance, scalable software systems with a strong focus on hardware cost optimization

What the JD emphasized

  • BS EE/CS or equivalent experience with 12+ years of systems software development with at least 1 year of experience in developing/exploring AI.
  • Development with Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Fine-Tuning LLMs, AI Agentic workflows, LangChain, LangGraph, and Cascading models.
  • Extensive experience architecting and shipping large-scale distributed software systems.
  • Ability to identify gaps and bottlenecks, and develop solutions to optimize performance.

Other signals

  • AI systems for internal cloud infrastructure
  • Scale-up of key AI solutions
  • Identify gaps and bottlenecks in AI development and testing systems
  • Develop and deploy AI solutions