Speech LLM Engineer, Voice-first Agentic AI

NVIDIA · Semiconductors · Ho Chi Minh City, Vietnam +1

NVIDIA is seeking a Senior Speech LLM Engineer to develop spoken language and multimodal AI systems for voice-first agentic AI solutions, focusing on security, scalability, and Sovereign AI. The role involves leveraging AI-native and agentic workflows, designing AI models with safety and privacy considerations, and establishing benchmarking frameworks for voice agents. The engineer will also lead technical initiatives and mentor others.

What you'd actually do

  1. Develop and advance speech, audio, and multimodal LLMs that power next-generation voice-first agentic AI experiences.
  2. Leverage AI-native and agentic workflows to accelerate research, development, evaluation, and deployment of AI systems.
  3. Design and deploy AI models and platforms with strong consideration for AI safety, security, privacy, and Sovereign AI requirements.
  4. Establish and drive benchmarking frameworks for voice agents, including speech quality, reasoning, tool use, latency, reliability, and user experience.
  5. Lead technical initiatives, mentor engineers, and foster a One Team culture through close collaboration across research, engineering, product, and customer teams.

Skills

Required

  • Master's degree in Computer Science, AI, Electrical Engineering, or related field (or equivalent experience)
  • 2+ years of experience building and deploying speech, multimodal AI, or LLM systems
  • Strong Python skills
  • Experience with PyTorch or TensorFlow
  • Hands-on experience with ASR, TTS, speech understanding, audio-language models, or multimodal LLMs
  • Experience building production ML pipelines and MLOps infrastructure
  • Proven technical leadership and mentoring experience

Nice to have

  • Hands-on experience with NVIDIA AI technologies such as NeMo, NeMo Agent Toolkit, Nemotron, Riva, NIM, and Voice Chat
  • Experience building voice-first agentic AI systems with reasoning, tool use, and multimodal capabilities
  • Strong expertise in speech AI, including ASR, TTS, speech-to-speech, and conversational AI
  • Experience benchmarking AI agents for quality, latency, reliability, safety, and user experience
  • Familiarity with Sovereign AI, enterprise AI deployment, and data governance requirements

What the JD emphasized

  • Sovereign AI
  • voice-first agentic AI
  • speech, audio, and multimodal LLMs
  • AI safety, security, privacy
  • benchmarking frameworks for voice agents
  • reasoning, tool use, latency, reliability, and user experience

Other signals

  • Developing spoken language and multimodal AI systems
  • Powering secure, scalable, and sovereign voice-first AI solutions
  • Leveraging AI-native and agentic workflows
  • Designing and deploying AI models and platforms with strong consideration for AI safety, security, privacy, and Sovereign AI requirements
  • Establishing and driving benchmarking frameworks for voice agents