What you'd actually do

Train Speech Synthesis mel-spectrogram and vocoder models.
Measure, benchmark, and analyze model performance, accuracy, and bias; recommend improvements.
Maintain the TTS model evaluation system and characterize quality metrics across platforms.
Improve processes for speech data processing, augmentation, filtering, and TTS training set preparation.
Build knowledge of TTS datasets for training and evaluation.

Skills

Required

Master’s degree (or equivalent experience) or PhD in Computer Science, Electrical Engineering, AI, Applied Math, Linguistics, or Computational Linguistics
5+ years of experience in machine learning and AI model development
Strong Python programming skills
Solid fundamentals in software design and optimization
Strong knowledge of ML/DL techniques and tools, including CNNs, RNNs/LSTMs, and Transformers
Hands-on experience training speech synthesis models, including TTS, voice cloning, or speech-to-speech systems
Proficiency with PyTorch
Familiarity with DSP and feature extraction techniques (FFT, MFCC, Mel spectrograms)
Experience with Git, Gerrit, or GitLab
Strong collaboration skills

Nice to have

Experience with multilingual or code-switched TTS, voice cloning, or cross-lingual voice cloning
Familiarity with text normalization, inverse text normalization, and multilingual G2P systems
Interest in linguistics, phonetics, phonology, and language technologies
Strong C++ programming skills
Familiarity with CUDA, cuDNN, or TensorRT
Experience deploying ML models on data center, cloud, or embedded systems

NVIDIA is a global leader in AI, high-performance computing, and visualization, with GPU technology powering everything from modern computers to robots and autonomous systems. As a pioneer in AI computing, NVIDIA is shaping the future of conversational AI.

NVIDIA is looking for Speech Data Scientists to develop the high‑impact, high‑visibility Speech AI product Riva and improve the experience of millions of customers. If you're creative and passionate about solving real‑world conversational AI problems, come join our Riva Product Engineering team.

More details: https://developer.nvidia.com/riva

What you'll be doing:

Train Speech Synthesis mel-spectrogram and vocoder models.
Measure, benchmark, and analyze model performance, accuracy, and bias; recommend improvements.
Maintain the TTS model evaluation system and characterize quality metrics across platforms.
Improve processes for speech data processing, augmentation, filtering, and TTS training set preparation.
Build knowledge of TTS datasets for training and evaluation.
Collaborate with cross-functional teams on new features, improvements, and issue triage.
Participate in code reviews, design reviews, use case reviews, and test plan reviews.

What we need to see:

Master’s degree (or equivalent experience) or PhD in Computer Science, Electrical Engineering, AI, Applied Math, Linguistics, or Computational Linguistics.
5+ years of experience in machine learning and AI model development.
Strong Python programming skills, with solid fundamentals in software design and optimization.
Strong knowledge of ML/DL techniques and tools, including CNNs, RNNs/LSTMs, and Transformers.
Hands-on experience training speech synthesis models, including TTS, voice cloning, or speech-to-speech systems.
Proficiency with PyTorch and familiarity with DSP and feature extraction techniques (FFT, MFCC, Mel spectrograms).
Experience with Git, Gerrit, or GitLab, and strong collaboration skills.

Ways to stand out from the crowd:

Experience with multilingual or code-switched TTS, voice cloning, or cross-lingual voice cloning.
Familiarity with text normalization, inverse text normalization, and multilingual G2P systems.
Interest in linguistics, phonetics, phonology, and language technologies.
Strong C++ programming skills and familiarity with CUDA, cuDNN, or TensorRT.
Experience deploying ML models on data center, cloud, or embedded systems.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. We do not discriminate based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family: www.nvidiabenefits.com/

Senior Deep Learning Scientist, Speech Synthesis
