Senior Scientist, Synthetic Data and Pr… at NVIDIA

What you'd actually do

Build and implement advanced pipelines for generating synthetic datasets using innovative LLM-based methodologies and automated quality evaluation frameworks.

Research and implement privacy-preserving techniques such as differentially private training (DP-SGD), NER-based detection and replacement of sensitive information, and protection against membership inference attacks.

Design and maintain open-source software libraries and SDKs with clean APIs and developer-facing documentation, applying robust software design patterns.

Drive software excellence through modern development tooling, configuration-driven architecture, and professional Git and CI/CD workflows.

Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.

Skills

Required

PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience
5+ years of research experience in synthetic data generation, data privacy, differential privacy, federated learning, or trustworthy machine learning
Proven track record of developing or maintaining software libraries used by a broad developer community
Deep technical understanding of PyTorch
Deep technical understanding of HuggingFace Transformers ecosystem including PEFT and LoRA
Technical familiarity with LLM inference frameworks such as vLLM or TGI
Strong publication record at premier venues

Nice to have

Active contributions to open-source projects
Specialized expertise with differential privacy concepts and tools such as Opacus
Ability to build and optimize scalable data processing pipelines for large-scale models
Proficiency with NER-based PII detection and advanced anonymization techniques
Functional knowledge of global privacy regulations such as GDPR or CCPA

NVIDIA is at the forefront of the AI revolution, and our research is shaping the future of large language models. We are looking for a Senior Scientist to join our team and help advance our capabilities in generating synthetic datasets and privacy-preserving AI. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that enable high-quality synthetic data generation at scale while ensuring data privacy. This role combines hands-on software engineering with research in privacy-enhancing methods, and you will collaborate with research, engineering, product teams, and external labs.

What you'll be doing:

Build and implement advanced pipelines for generating synthetic datasets using innovative LLM-based methodologies and automated quality evaluation frameworks.
Research and implement privacy-preserving techniques such as differentially private training (DP-SGD), NER-based detection and replacement of sensitive information, and protection against membership inference attacks.
Design and maintain open-source software libraries and SDKs with clean APIs and developer-facing documentation, applying robust software design patterns.
Drive software excellence through modern development tooling, configuration-driven architecture, and professional Git and CI/CD workflows.
Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.
Mentor interns and junior researchers to support their technical growth within the team.

What we need to see:

PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience.
5+ years of research experience in synthetic data generation, data privacy, or related areas such as differential privacy, federated learning, or trustworthy machine learning, or comparable experience.
Proven track record of developing or maintaining software libraries used by a broad developer community.
Deep technical understanding of PyTorch and the HuggingFace Transformers ecosystem including PEFT and LoRA.
Technical familiarity with LLM inference frameworks such as vLLM or TGI.
Strong publication record at premier venues such as NeurIPS, ICML, ICLR, ACL, or similar.

Ways to stand out from the crowd:

Active contributions to open-source projects, particularly in ML, security, or privacy domains.
Specialized expertise with differential privacy concepts and tools such as Opacus.
Ability to build and optimize scalable data processing pipelines for large-scale models.
Proficiency with NER-based PII detection and advanced anonymization techniques.
Functional knowledge of global privacy regulations such as GDPR or CCPA.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and talented people in the world working with us. If you are creative, autonomous, and passionate about building open-source tools that make AI safer and more private, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 192,000 USD - 304,750 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 3, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Senior Scientist, Synthetic Data and Privacy
