Principal Engineer, AI Tooling and Workflows

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is looking for a Principal Engineer to lead the technical vision, architecture, and execution for AI-native developer tooling and workflow automation platforms. The role involves inventing and developing production-grade autonomous AI systems that reason over engineering workflows, driving the evolution of AI-assisted processes in software development, and defining platform-level standards for LLM-powered systems.

What you'd actually do

  1. Lead the technical vision, architecture, and execution for AI-native developer tooling and workflow automation platforms used across NVIDIA engineering.
  2. Invent and develop production-grade autonomous AI systems that can reason over engineering workflows: code, documentation, and CI/CD pipelines.
  3. Drive the evolution of AI-assisted processes in software development, including code understanding, requirements traceability, validation, tests, build and release automation, and security review.
  4. Define platform-level standards for reliability, evaluation, observability, safety, security, latency, cost efficiency, and human-in-the-loop controls for LLM-powered systems.
  5. Partner with engineering leaders and teams across products, infrastructure, security, and research to identify high-leverage opportunities and deliver solutions with broad impact.

Skills

Required

  • PhD or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience
  • 15+ years of software engineering experience
  • Experience in large-scale platforms, distributed systems, AI systems, or developer infrastructure used by demanding engineering teams
  • Deep hands-on expertise with LLM applications, agentic workflows, RAG, embeddings, vector search, tool use, prompt engineering, model evaluation, and AI system safety
  • Exceptional architecture judgment across APIs, services, data pipelines, Kubernetes, observability, reliability engineering, security, and production operations
  • Strong coding ability in Python and at least one major production language such as C++, Go, or Rust, with the judgment to build simple systems that scale
  • Technical leadership at Principal level: setting direction, aligning collaborators, guiding senior engineers, and raising the engineering bar across boundaries

Nice to have

  • Built AI tools, copilots, or autonomous agents that materially changed how large engineering organizations build, validate, or operate software
  • Understanding of the full stack of enterprise AI systems: MCPs, tool-using agents, skills, retrieval, knowledge graphs, fine-tuning, model serving, evaluation, governance
  • Optimizations in AI platforms for real-world scale, including latency, throughput, cost, GPU acceleration, TensorRT, Triton, quantization, batching, caching, or model routing
  • Domain depth in GPU computing, drivers, compilers, embedded systems, robotics, autonomous vehicles, or other hardware-software environments
  • Spotting step-function productivity opportunities and turning them into efficient platforms that engineers love and leaders trust

What the JD emphasized

  • 15+ years of software engineering experience
  • Deep hands-on expertise with LLM applications, agentic workflows, RAG, embeddings, vector search, tool use, prompt engineering, model evaluation, and AI system safety
  • Exceptional architecture judgment across APIs, services, data pipelines, Kubernetes, observability, reliability engineering, security, and production operations
  • Technical leadership at Principal level: setting direction, aligning collaborators, guiding senior engineers, and raising the engineering bar across boundaries

Other signals

  • AI-native developer tooling
  • agentic AI systems
  • intelligent workflow automation
  • large-scale platforms
  • LLM applications