Senior LLM Agents Architect

NVIDIA · Semiconductors · Yokneam, Israel +1

The Senior LLM Agents Architect at NVIDIA builds and deploys agentic systems that integrate LLMs with domain tools for HW/SW engineering workflows. The role focuses on developing end-to-end agent flows for simulation analysis, kernel optimization, and developer efficiency, and covers prototyping, integration, evaluation, and mentoring.

What you'd actually do

Develop innovative AI flows to improve hardware and software through collaboration with various engineering roles.
Facilitate co-creation workshops to transform SME rules of thumb into specific assignments, resources, cues, and guidelines. Define success measures, evaluation data, and feedback loops for agents to make a measurable difference in NVIDIA's HW simulation analysis and GPU-kernel optimization workflows.
Rapidly prototype and thoughtfully productize; integrate with internal services, utilize GPU capabilities, remove bottlenecks, and deliver fitting solutions.
Set up evaluation backbone using offline golden sets and online telemetry for confident iterations, cost control, and safe improvements.
Mentor and improve teams through insights in agent orchestration, prompting, RAG, and observability, and craft documentation and playbooks for NVIDIA's teams.

Skills

Required

7+ years in applied ML/AI or large-scale systems, with 3+ years crafting agentic or LLM-powered applications in production environments.
B.Sc. in Computer Science or Electrical Engineering.
Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production.
Strong software engineering skills in Python and one systems language (C++ or Rust preferred); experience integrating with GPUs, CUDA, or performance-critical services.
Proficient in PyTorch or TensorFlow; skilled in tool use, RAG pipelines, and model adaptation.
Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.
Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.
Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.
Proactive, independent, possessing strong analytical and problem-solving abilities; adept at handling uncertainty to provide practical, gradual benefits.

What the JD emphasized

Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production.
Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.

Full job description

Our team propels generative AI forward by building and deploying agentic systems that integrate innovative LLMs with domain tools to expedite HW and SW engineering workflows at scale. We are in search of a top-tier AI Agents Solution Architect to work closely with hardware architects, verification engineers, GPU performance experts, and software developers to develop end-to-end agent flows that drive significant enhancements in simulation analysis, kernel optimization, and developer efficiency.

What you'll be doing:

Develop innovative AI flows to improve hardware and software through collaboration with various engineering roles.
Facilitate co-creation workshops to transform SME rules of thumb into specific assignments, resources, cues, and guidelines. Define success measures, evaluation data, and feedback loops for agents to make a measurable difference in NVIDIA's HW simulation analysis and GPU-kernel optimization workflows.
Rapidly prototype and thoughtfully productize; integrate with internal services, utilize GPU capabilities, remove bottlenecks, and deliver fitting solutions.
Set up evaluation backbone using offline golden sets and online telemetry for confident iterations, cost control, and safe improvements.
Mentor and improve teams through insights in agent orchestration, prompting, RAG, and observability, and craft documentation and playbooks for NVIDIA's teams.

What we need to see:

7+ years in applied ML/AI or large-scale systems, with 3+ years crafting agentic or LLM-powered applications in production environments.
B.Sc. in Computer Science or Electrical Engineering.
Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production.
Strong software engineering skills in Python and one systems language (C++ or Rust preferred); experience integrating with GPUs, CUDA, or performance-critical services.
Proficient in PyTorch or TensorFlow; skilled in tool use, RAG pipelines, and model adaptation.
Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.
Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.
Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.
Proactive, independent, possessing strong analytical and problem-solving abilities; adept at handling uncertainty to provide practical, gradual benefits.

Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com.

#LI-Hybrid
