What you'd actually do

Lead the research, design, and implementation of state-of-the-art machine learning algorithms for speech processing, voice transfer, source separation, and upmixing in media post-production environments.

Drive the architecture and deployment of scalable model training pipelines using PyTorch and distributed computing frameworks.

Develop novel generative audio models, including latent diffusion, flow-based models, variational autoencoders, and neural vocoders, optimized for professional soundtrack production.

Own end-to-end model lifecycle management: pretraining, fine-tuning, validation, inference optimization, and CI/CD integration.

Guide the development of personalized model adaptation workflows to support per-user tuning, cross-project continuity, and flexible deployment.

Skills

Required

MSc or PhD in Computer Science, Electrical Engineering, Applied Math, or a related field with a focus on AI/ML and multi-modal signal processing
5 years of professional experience in applied ML
Expertise in building and scaling models using PyTorch
Fluency in training, fine-tuning, and inference for deep neural networks
Demonstrated experience developing generative models such as VAE, GAN, diffusion models, or neural vocoders
Deep understanding of audio-specific ML domains, including source separation, speech enhancement, music processing, and cross-modal tasks
Experience with MLOps tooling (e.g., Weights & Biases, MLflow, Datachain)
Docker-based containerization
Scalable infrastructure for distributed training
Fluency in audio signal processing fundamentals
Integration of DSP into ML pipelines
Proven ability to contribute to architectural planning, research strategy, and production deployment

Nice to have

Familiarity with audio/text/video multi-modal frameworks and cross-domain representations
Experience implementing real-time or near-real-time inference pipelines in cloud or edge environments
Working knowledge of latent diffusion audio models
Strong knowledge of industry-standard audio datasets and benchmarks
Experience optimizing inference pipelines for creative applications or interactive use
Proficiency in lower-level audio frameworks (C / C++)
Contributions to published research at top-tier conferences and/or open-source ML frameworks

What the JD emphasized

Deep focus on audio-centric AI/ML research and deployment

Expertise in building and scaling models using PyTorch

Demonstrated experience developing generative models

Deep understanding of audio-specific ML domains

Proven ability to contribute to architectural planning, research strategy, and production deployment

Job Posting Title:

Sr Staff R&D Engineer

Req ID:

10127968

Job Description:

The Skywalker Sound Development Group is seeking a highly accomplished Sr Staff R&D Engineer (AI/ML) to lead the development of transformative audio intelligence technologies for global media production. This senior-level role is central to advancing our next-generation soundtrack platform, with a focus on speech processing, style transfer, upmixing, source separation, and generative audio synthesis.

You will architect, build, and optimize cutting-edge machine learning systems at scale—leveraging foundational models, neural vocoders, latent diffusion models, and advanced retraining workflows. As a core member of our applied R&D team, you will contribute to technical direction, collaborate across product and engineering, and deliver production-ready solutions that integrate seamlessly into creative and operational workflows for elite content creators worldwide.

This role is considered Hybrid, which means the employee will work onsite in our Nicasio, CA office and occasionally from home.

What You’ll Do

Lead the research, design, and implementation of state-of-the-art machine learning algorithms for speech processing, voice transfer, source separation, and upmixing in media post-production environments.
Drive the architecture and deployment of scalable model training pipelines using PyTorch and distributed computing frameworks.
Develop novel generative audio models, including latent diffusion, flow-based models, variational autoencoders, and neural vocoders, optimized for professional soundtrack production.
Own end-to-end model lifecycle management: pretraining, fine-tuning, validation, inference optimization, and CI/CD integration.
Guide the development of personalized model adaptation workflows to support per-user tuning, cross-project continuity, and flexible deployment.
Collaborate with product, platform, and engineering leads to define integration strategies within a secure, cloud-optimized SaaS environment.
Stay at the forefront of generative audio, multi-modal modeling, and self-supervised learning—translating emerging research into applied innovation.
Contribute to internal tooling and infrastructure that improves iteration speed, reproducibility, and explainability of deployed models.
Mentor junior researchers and engineers, and contribute to a culture of rigorous experimentation, collaboration, and continuous improvement.

What We’re Looking For

MSc or PhD in Computer Science, Electrical Engineering, Applied Math, or a related field with a focus on AI/ML and multi-modal signal processing.
5 years of professional experience in applied ML, with a deep focus on audio-centric AI/ML research and deployment.
Expertise in building and scaling models using PyTorch, with fluency in training, fine-tuning, and inference for deep neural networks.
Demonstrated experience developing generative models such as VAE, GAN, diffusion models, or neural vocoders (e.g., HiFi-GAN, WaveNet).
Deep understanding of audio-specific ML domains, including source separation, speech enhancement, music processing, and cross-modal tasks.
Experience with MLOps tooling (e.g., Weights & Biases, MLflow, Datachain), Docker-based containerization, and scalable infrastructure for distributed training.
Fluency in audio signal processing fundamentals and the integration of DSP into ML pipelines.
Proven ability to contribute to architectural planning, research strategy, and production deployment in complex, multi-stakeholder environments.

Preferred Qualifications

Familiarity with audio/text/video multi-modal frameworks and cross-domain representations.
Experience implementing real-time or near-real-time inference pipelines in cloud or edge environments (e.g., AWS, GCP, on-prem GPUs).
Working knowledge of latent diffusion audio models (e.g., Stable Audio, AudioLDM, AudioGen).
Strong knowledge of industry-standard audio datasets and benchmarks (LibriSpeech, VCTK, MUSDB, etc.).
Experience optimizing inference pipelines for creative applications or interactive use.
Proficiency in lower-level audio frameworks (C/C++, etc.).
Contributions to published research at top-tier conferences (NeurIPS, ICASSP, ICLR, Interspeech) and/or open-source ML frameworks.

The hiring range for this position in Nicasio, CA is $206,400 to $276,700 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

Job Posting Segment:

Skywalker Sound

Job Posting Primary Business:

Skywalker Sound-Engineering

Primary Job Posting Category:

Software Engineer

Employment Type:

Full time

Primary City, State, Region, Postal Code:

Nicasio, CA, USA

Date Posted:

2025-08-19