What you'd actually do

Develop and advance speech, audio, and multimodal LLMs that power next-generation voice-first agentic AI experiences.

Leverage AI-native and agentic workflows to accelerate research, development, evaluation, and deployment of AI systems.

Design and deploy AI models and platforms with strong consideration for AI safety, security, privacy, and Sovereign AI requirements.

Establish and drive benchmarking frameworks for voice agents, including speech quality, reasoning, tool use, latency, reliability, and user experience.

Lead technical initiatives, mentor engineers, and foster a One Team culture through close collaboration across research, engineering, product, and customer teams.

Skills

Required

Master's degree in Computer Science, AI, Electrical Engineering, or related field (or equivalent experience)
2+ years of experience building and deploying speech, multimodal AI, or LLM systems
Strong Python skills
Experience with PyTorch or TensorFlow
Hands-on experience with ASR, TTS, speech understanding, audio-language models, or multimodal LLMs
Experience building production ML pipelines and MLOps infrastructure
Proven technical leadership and mentoring experience

Nice to have

Hands-on experience with NVIDIA AI technologies such as NeMo, NeMo Agent Toolkit, Nemotron, Riva, NIM, and Voice Chat
Experience building voice-first agentic AI systems with reasoning, tool use, and multimodal capabilities
Strong expertise in speech AI, including ASR, TTS, speech-to-speech, and conversational AI
Experience benchmarking AI agents for quality, latency, reliability, safety, and user experience
Familiarity with Sovereign AI, enterprise AI deployment, and data governance requirements

Other signals

Developing spoken language and multimodal AI systems

Powering secure, scalable, and sovereign voice-first AI solutions

Leveraging AI-native and agentic workflows

Designing and deploying AI models and platforms with strong consideration for AI safety, security, privacy, and Sovereign AI requirements

Establishing and driving benchmarking frameworks for voice agents

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.

Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

At NVIDIA, we advance innovation and help customers build the next generation of Sovereign AI platforms. Our Speech team seeks a Senior Speech LLM Engineer. This role involves developing spoken language and multimodal AI systems that power secure, scalable, and sovereign voice-first AI solutions. It supports enterprises, governments, and regional ecosystems in deploying and operating AI systems while they maintain control over their data, models, and infrastructure and meet regulatory requirements.

What you'll be doing:

Develop and advance speech, audio, and multimodal LLMs that power next-generation voice-first agentic AI experiences.
Leverage AI-native and agentic workflows to accelerate research, development, evaluation, and deployment of AI systems.
Design and deploy AI models and platforms with strong consideration for AI safety, security, privacy, and Sovereign AI requirements.
Establish and drive benchmarking frameworks for voice agents, including speech quality, reasoning, tool use, latency, reliability, and user experience.
Lead technical initiatives, mentor engineers, and foster a One Team culture through close collaboration across research, engineering, product, and customer teams.

What we need to see:

Master's degree in Computer Science, AI, Electrical Engineering, or related field (or equivalent experience).
2+ years of experience building and deploying speech, multimodal AI, or LLM systems.
Strong Python skills and experience with PyTorch or TensorFlow.
Hands-on experience with ASR, TTS, speech understanding, audio-language models, or multimodal LLMs.
Experience building production ML pipelines and MLOps infrastructure.
Proven technical leadership and mentoring experience.
Strong problem-solving, communication, and teamwork skills.

Ways to stand out from the crowd:

Hands-on experience with NVIDIA AI technologies such as NeMo, NeMo Agent Toolkit, Nemotron, Riva, NIM, and Voice Chat.
Experience building voice-first agentic AI systems with reasoning, tool use, and multimodal capabilities.
Strong expertise in speech AI, including ASR, TTS, speech-to-speech, and conversational AI.
Experience benchmarking AI agents for quality, latency, reliability, safety, and user experience.
Familiarity with Sovereign AI, enterprise AI deployment, and data governance requirements.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/

Speech LLM Engineer, Voice-first Agentic AI
