The NVIDIA Networking Advanced Development Software group builds groundbreaking technologies that open new markets and deepen customer relationships. We focus on emerging areas at the intersection of networking and AI — including AI-driven development tools, high-performance networking for AI factories and data centers, and intelligent automation across the software lifecycle. Our work spans the full stack: from application-level analysis and architecture definition down to implementation, leveraging NVIDIA's world-leading networking devices. We collaborate with partners and key customers throughout the process and actively engage with open-source communities.

Within this group, our team is building an AI engineering platform: a cross-domain, distributed, multi-disciplinary system that enables agentic AI workflows to accelerate engineering development at every stage of the development cycle. The platform combines multiple industry-standard technologies and introduces new ways to handle complex tasks. Our technology stack is based on K8s, an AI agent harness, RAG, MCPs, and top-tier LLMs. It is designed to drive the next generation of fully autonomous, long-lived agentic workflows at scale, with industry-leading performance.

What you'll be doing:

Design and build the core platform that powers an autonomous AI agent — including its reasoning engine, tool orchestration, and the runtime infrastructure it operates on
Develop and evolve the microservices ecosystem that gives the agent its capabilities, from knowledge retrieval and log analysis to code execution and workflow automation
Own features end-to-end: from requirements analysis and architecture, through implementation, to production deployment and iteration based on real usage
Instrument, evaluate, and improve the platform's reliability — build observability, track quality, and feed signals back into the system to make the agent more effective over time
Collaborate with engineering teams across the organization to identify high-impact workflows and translate them into AI-assisted automation that boosts developer productivity
Work across the stack when the problem requires it — Python services, Kubernetes infrastructure, data stores, CI/CD pipelines, and developer-facing tools

What we need to see:

B.Sc. in Computer Science, Computer Engineering, or a related field
5+ years of relevant experience
Solid system-level understanding with experience designing and delivering production services
Ability to architect solutions, guide AI tools effectively, and reason about system behavior end-to-end
Familiarity with containerization and orchestration (Docker, Kubernetes)
Understanding of REST APIs, microservice architectures, and distributed systems
Ability to learn complex concepts in a fast-paced environment

Ways to stand out from the crowd:

Familiarity with Kubernetes operators, Helm charts, and cluster management
Experience with LLM application development — prompt engineering, agentic frameworks (ReAct, tool-use), or RAG pipelines
Hands-on experience with FastAPI, async Python, or similar modern Python web frameworks
Experience with vector databases, semantic search, or embedding models
Knowledge of the OAS (OpenAPI Specification), MCP (Model Context Protocol), and A2A (Agent-to-Agent) protocol ecosystems

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you! NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.