Senior Software Engineer, Applied AI

NVIDIA · Semiconductors · Munich, Germany

Senior Software Engineer, Applied AI Systems role focused on building production AI/ML and agentic solutions. Responsibilities include developing agents, workflow services, APIs, data pipelines, tool integrations, evaluation harnesses, and operational tooling. Requires strong Python skills, experience with LLMs, RAG, agentic AI, distributed systems, and system design. The role emphasizes turning ambiguous problems into durable software systems and shaping how production applied AI systems are built and measured.

What you'd actually do

Build and own production-grade applied AI systems for NVIDIA’s technical and solution development use cases, including agentic solutions where they materially improve the systems and software.
Design and build agentic workflows and the software around them: workflow services, APIs, retrieval, MCP/A2A-style tool integrations, agent harnesses, automation, telemetry, operational controls, and human oversight.
Design reliable services, APIs, workflow state, event-driven execution, and observability using systems such as Kafka, ClickHouse, and OTel-style patterns.
Translate complex technical and operational requirements into clear system designs, plans, interfaces, measurable outcomes, and pragmatic technical decisions through design reviews, code reviews, and clear communication.
Develop production software in Python and other relevant languages, with strong testing, observability, CI/CD, documentation, and operational practices.

Skills

Required

BS, MS, or PhD in Computer Science, Engineering, AI/ML, or equivalent experience
5+ years of professional software engineering experience owning production systems or meaningful platform components
Hands-on experience with LLM, generative AI, RAG, agentic AI, MCP or intelligent AI technologies beyond simple prompting or notebooks, including tool use, retrieval, evaluation, guardrails, orchestration, or human-in-the-loop control
Strong Python engineering skills
Practical experience with at least one additional production programming language such as C++, Go, Rust, or TypeScript
Demonstrated ability to develop and build distributed systems, backend services, data pipelines, workflow orchestration, APIs, or developer platforms using production environments like Kafka, ClickHouse, PostgreSQL, Redis, object storage, Kubernetes, or similar technologies
Strong system design and operational judgment, including reliability, latency, cost, security, privacy, scalability, debuggability, maintainability, performance analysis, benchmarking, profiling, or capacity evaluation
Excellent debugging and problem-solving skills across software, infrastructure, AI systems, and performance bottlenecks
Proven ownership of ambiguous, cross-team engineering work, with ability to collaborate with distributed teams spanning US Pacific, EMEA, and APAC timezones
Strong written and verbal communication skills in English

Nice to have

Experience building real-world AI implementations, agent tools, MCP-compatible modules, A2A-style bridges, agent frameworks, evaluation frameworks, or RAG systems used by real users
Familiarity with NVIDIA GPU and AI software technologies such as NVIDIA NIM, NeMo Agent Toolkit, and CUDA, and with agentic AI development frameworks
Open-source contributions, technical papers, patents, conference talks, engineering blogs, or major internal engineering artifacts

What the JD emphasized

production AI / ML and agentic solutions
hands-on senior engineer
turn ambiguous technical problems into durable software systems
build AI systems as real software systems
shape how production applied AI systems are built, measured, and reused
focus on reusable software capability rather than one-off delivery
drives execution across teams
production-grade applied AI systems
agentic solutions
agentic workflows
tool integrations
human oversight
reliable services
event-driven execution
observability
production software
strong testing
observability
CI/CD
documentation
operational practices
performance and benchmarking workflows
validation harnesses
regression tests
tracing
metrics
failure analysis
latency
throughput
reliability
resource usage
AI/inference behavior
standard solution patterns
codify repeated patterns
product gaps
field lessons
APIs
services
reference architectures
playbooks
test harnesses
shared engineering building blocks
debug and support production solutions
software
infrastructure
AI models
data pipelines
inference services
GPU-accelerated environments
recurring support patterns
product or platform improvements
5+ years of professional software engineering experience owning production systems or meaningful platform components
Hands-on experience with LLM, generative AI, RAG, agentic AI, MCP or intelligent AI technologies beyond simple prompting or notebooks
tool use
retrieval
evaluation
guardrails
orchestration
human-in-the-loop control
Strong Python engineering skills
practical experience with at least one additional production programming language
Demonstrated ability to develop and build distributed systems
backend services
data pipelines
workflow orchestration
APIs
developer platforms
production environments
Strong system design and operational judgment
reliability
latency
cost
security
privacy
scalability
debuggability
maintainability
performance analysis
benchmarking
profiling
capacity evaluation
Excellent debugging and problem-solving skills
software
infrastructure
AI systems
performance bottlenecks
Proven ownership of ambiguous, cross-team engineering work
collaborate with distributed teams
Required: Strong written and verbal communication skills in English
Experience building real-world AI implementations
agent tools
MCP-compatible modules
A2A-style bridges
agent frameworks
evaluation frameworks
RAG systems used by real users

Other signals

building production AI systems
agentic workflows
software engineering
distributed systems
performance engineering

Read full job description

We are looking for a Senior Software Engineer, Applied AI Systems, to build production AI / ML and agentic solutions. We need a hands-on senior engineer who can turn ambiguous technical problems into durable software systems and AI-enabled systems: agents, workflow services, APIs, data pipelines, tool integrations, evaluation and benchmarking harnesses, reference architectures, and operational tooling.

We work at the intersection of applied AI, agentic workflows, software engineering, distributed systems, performance engineering, accelerated computing, and data infrastructure. In this role, you will build AI systems as real software systems: write and review high-quality code, make architecture tradeoffs, benchmark behavior and performance, and own outcomes from prototype through validation, hardening, deployment, and ongoing support. This is an opportunity to shape how production applied AI systems are built, measured, and reused inside NVIDIA!

We partner across global teams and time zones for design reviews, planning, debugging, support for critical issues, and technical decision-making. We need an engineer who turns complex requirements into clear technical plans, keeps the focus on reusable software capability rather than one-off delivery, and drives execution across teams.

**What you will be doing:**

Build and own production-grade applied AI systems for NVIDIA’s technical and solution development use cases, including agentic solutions where they materially improve the systems and software.
Design and build agentic workflows and the software around them: workflow services, APIs, retrieval, MCP/A2A-style tool integrations, agent harnesses, automation, telemetry, operational controls, and human oversight.
Design reliable services, APIs, workflow state, event-driven execution, and observability using systems such as Kafka, ClickHouse, and OTel-style patterns.
Translate complex technical and operational requirements into clear system designs, plans, interfaces, measurable outcomes, and pragmatic technical decisions through design reviews, code reviews, and clear communication.
Develop production software in Python and other relevant languages, with strong testing, observability, CI/CD, documentation, and operational practices.
Build performance and benchmarking workflows for existing production solutions or products, including validation harnesses, regression tests, tracing, metrics, failure analysis, latency, throughput, reliability, resource usage, and AI/inference behavior where relevant.
Improve standard solution patterns alongside larger applied AI systems, working with NVIDIA engineering and solution teams to codify repeated patterns, product gaps, and field lessons into APIs, services, reference architectures, playbooks, test harnesses, and shared engineering building blocks.
Debug and support production solutions across software, infrastructure, AI models, data pipelines, inference services, and GPU-accelerated environments, turning recurring support patterns into product or platform improvements.

What we need to see:

BS, MS, or PhD in Computer Science, Engineering, AI/ML, or equivalent experience, with 5+ years of professional software engineering experience owning production systems or meaningful platform components.
Hands-on experience with LLM, generative AI, RAG, agentic AI, MCP or intelligent AI technologies beyond simple prompting or notebooks, including tool use, retrieval, evaluation, guardrails, orchestration, or human-in-the-loop control.
Strong Python engineering skills and practical experience with at least one additional production programming language such as C++, Go, Rust, or TypeScript.
Demonstrated ability to develop and build distributed systems, backend services, data pipelines, workflow orchestration, APIs, or developer platforms using production environments like Kafka, ClickHouse, PostgreSQL, Redis, object storage, Kubernetes, or similar technologies.
Strong system design and operational judgment, including reliability, latency, cost, security, privacy, scalability, debuggability, maintainability, performance analysis, benchmarking, profiling, or capacity evaluation.
Excellent debugging and problem-solving skills across software, infrastructure, AI systems, and performance bottlenecks.
Proven ownership of ambiguous, cross-team engineering work, with ability to collaborate with distributed teams spanning US Pacific, EMEA, and APAC timezones.
Required: Strong written and verbal communication skills in English.

**Ways to stand out from the crowd:**

Experience building real-world AI implementations, agent tools, MCP-compatible modules, A2A-style bridges, agent frameworks, evaluation frameworks, or RAG systems used by real users.
Familiarity with NVIDIA GPU and AI software technologies such as NVIDIA NIM, NeMo Agent Toolkit, and CUDA, and with agentic AI development frameworks.
Open-source contributions, technical papers, patents, conference talks, engineering blogs, or major internal engineering artifacts.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/
