What you'd actually do

Develop and deploy multi-agent systems for automated test generation, log analysis, failure triage, and bug-filing workflows

Build and maintain agent orchestration frameworks using tools such as Claude Code, MCP servers, and agent SDK patterns

Create autonomous pipelines that reduce cognitive load on engineers by routing failures, surfacing root causes, and generating actionable bug reports

Build evaluation systems to measure agent output quality — ensuring autonomous pipelines are reliable, not just fast

Establish observability and monitoring for agentic workflows so failures are transparent, debuggable, and recoverable

Skills

Required

Python engineering
AI-native development workflows
Claude Code
Cursor
LLM APIs
prompt engineering in production
building multi-agent or autonomous systems
understanding where LLMs fail
building mitigations into system design
evaluation frameworks for AI-generated outputs
test automation
CI/CD pipeline design
software quality engineering
failure analysis
test triage at scale
reason about test coverage strategically

Nice to have

MCP servers
agent SDK patterns
NVIDIA Omniverse
OpenUSD
complex platform SDKs
custom tool integrations
graceful failure recovery
retry logic
fallback chains
human-in-the-loop escalation

What the JD emphasized

high-agency engineers

autonomous agents

agentic infrastructure

multi-agent systems

agent orchestration frameworks

autonomous pipelines

evaluation systems

agent output quality

observability and monitoring for agentic workflows

AI-native development workflows

multi-agent or autonomous systems that have shipped and run without continuous supervision

where LLMs fail — hallucination, context degradation, tool misuse — and experience building mitigations into system design, including evaluation frameworks for AI-generated outputs

Built and shipped MCP servers, custom tool integrations, or multi-agent orchestrations that extend LLM capabilities in production

Designed evaluation harnesses or scoring systems that measure and enforce LLM output quality at scale

built agentic systems with graceful failure recovery

What does it look like to build infrastructure that thinks — that triages failures, files bugs, and surfaces root causes without waiting for a human to ask? What if the tools we build today become the foundation for how the whole industry does software quality tomorrow? We are building an engineering team where a small group of high-agency engineers, equipped with well-designed autonomous agents, can accomplish what previously required a much larger organisation. This role sits at the centre of that transformation. You will design and build the agentic infrastructure that powers our test automation and quality engineering workflows for the NVIDIA Omniverse platform. This isn’t about using AI tools to work faster — it’s about building the infrastructure that other engineers depend on to ship high-quality software with greater speed and confidence!

What you’ll be doing:

As a Senior Tools Development Engineer on our team, you will own end-to-end outcomes, work with significant autonomy, and have a direct impact on the reliability of one of NVIDIA's most strategic developer platforms. In this role you can expect to:

Build Agentic Test Pipelines

Develop and deploy multi-agent systems for automated test generation, log analysis, failure triage, and bug-filing workflows
Build and maintain agent orchestration frameworks using tools such as Claude Code, MCP servers, and agent SDK patterns
Create autonomous pipelines that reduce cognitive load on engineers by routing failures, surfacing root causes, and generating actionable bug reports

Own Infrastructure Quality

Build evaluation systems to measure agent output quality — ensuring autonomous pipelines are reliable, not just fast
Establish observability and monitoring for agentic workflows so failures are transparent, debuggable, and recoverable

Drive Team Adoption

Build internal tooling that is adoptable, not just technically impressive — with clear documentation and low onboarding friction
Collaborate with the broader QA team to identify automation opportunities and build the tools that accelerate them

What we need to see:

Core Technical Skills

Strong Python engineering — clean, testable, maintainable code with a systems-level perspective
Deep familiarity with AI-native development workflows — Claude Code, Cursor, LLM APIs, prompt engineering in production
Hands-on experience building multi-agent or autonomous systems that have shipped and run without continuous supervision
Clear understanding of where LLMs fail — hallucination, context degradation, tool misuse — and experience building mitigations into system design, including evaluation frameworks for AI-generated outputs

Test & Quality Engineering Foundation

A graduate degree in Computer Science Engineering or equivalent
5+ years in test automation, CI/CD pipeline design, or software quality engineering, including failure analysis and test triage at scale
Ability to reason about test coverage strategically across a complex, frequently releasing platform SDK

Mindset & Working Style

High agency — owns outcomes end-to-end, defines their own path in ambiguous problem spaces
The patience and communication skill to build systems that colleagues can trust and adopt
Intellectual honesty about where systems break, with a habit of building in recovery paths rather than hiding failures

Ways to stand out from the crowd:

Built and shipped MCP servers, custom tool integrations, or multi-agent orchestrations that extend LLM capabilities in production — with working examples to show
Designed evaluation harnesses or scoring systems that measure and enforce LLM output quality at scale, not just in prototypes
You've built agentic systems with graceful failure recovery — retry logic, fallback chains, human-in-the-loop escalation — rather than silent breakages
You have experience with NVIDIA Omniverse, OpenUSD, or similarly complex platform SDKs, and can reason about test strategy across them
You can point to infrastructure you've shipped that measurably reduced a team's manual triage or debugging burden — with clear documentation that let others extend your work without you in the room

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. Our elite engineering teams are growing rapidly. If you're creative with a real passion for technology, we want to hear from you.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform crucial job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.