Senior Deep Learning Scientist, Speech Synthesis

NVIDIA · Semiconductors · Ho Chi Minh City, Vietnam +1

NVIDIA is seeking a Senior Deep Learning Scientist to work on its Speech AI product, Riva. The role involves training speech synthesis models (mel-spectrogram and vocoder), measuring and analyzing model performance, maintaining the TTS evaluation system, and improving speech data processing and training set preparation. The ideal candidate has a Master's or PhD, 5+ years of ML/AI experience, strong Python and PyTorch skills, and hands-on experience training speech synthesis models.

What you'd actually do

  1. Train speech synthesis models (mel-spectrogram and vocoder).
  2. Measure, benchmark, and analyze model performance, accuracy, and bias; recommend improvements.
  3. Maintain the TTS model evaluation system and characterize quality metrics across platforms.
  4. Improve processes for speech data processing, augmentation, filtering, and TTS training set preparation.
  5. Build knowledge of TTS datasets for training and evaluation.

Skills

Required

  • Master’s degree (or equivalent experience) or PhD in Computer Science, Electrical Engineering, AI, Applied Math, Linguistics, or Computational Linguistics
  • 5+ years of experience in machine learning and AI model development
  • Strong Python programming skills
  • Solid fundamentals in software design and optimization
  • Strong knowledge of ML/DL techniques and tools, including CNNs, RNNs/LSTMs, and Transformers
  • Hands-on experience training speech synthesis models, including TTS, voice cloning, or speech-to-speech systems
  • Proficiency with PyTorch
  • Familiarity with DSP and feature extraction techniques (FFT, MFCC, Mel spectrograms)
  • Experience with Git, Gerrit, or GitLab
  • Strong collaboration skills

Nice to have

  • Experience with multilingual or code-switched TTS, voice cloning, or cross-lingual voice cloning
  • Familiarity with text normalization, inverse text normalization, and multilingual G2P systems
  • Interest in linguistics, phonetics, phonology, and language technologies
  • Strong C++ programming skills
  • Familiarity with CUDA, cuDNN, or TensorRT
  • Experience deploying ML models on data center, cloud, or embedded systems

What the JD emphasized

  • 5+ years of experience in machine learning and AI model development
  • Hands-on experience training speech synthesis models, including TTS, voice cloning, or speech-to-speech systems

Other signals

  • training speech synthesis models
  • measure, benchmark, and analyze model performance
  • maintain the TTS model evaluation system
  • improve processes for speech data processing, augmentation, filtering, and TTS training set preparation
  • build knowledge of TTS datasets for training and evaluation