Staff Data Engineer (Audio/ML)

Disney · Media · Nicasio, CA +1

Staff Data Engineer role focused on building and optimizing data pipelines for audio/ML research, specifically for training, retraining, and evaluating machine learning models for immersive and multichannel audio applications.

What you'd actually do

  1. Design, implement, and maintain scalable, automated data pipelines for the ingestion, preprocessing, and transformation of large-scale audio datasets.
  2. Ensure pipelines support efficient model training and retraining workflows, enabling continuous improvement of AI/ML models.
  3. Collaborate with AI/ML researchers to define data requirements and integrate feedback to improve data pipeline functionality.
  4. Develop advanced preprocessing techniques for immersive and multichannel audio formats (e.g., Dolby Atmos, higher-order Ambisonics).
  5. Automate data cleaning, normalization, and augmentation processes to prepare datasets for various model architectures, including foundation models and transformers.

Skills

Required

  • Python
  • Pandas
  • NumPy
  • PyTorch data utilities
  • Librosa
  • FFmpeg
  • SoX
  • GitLab
  • Apache Spark
  • Airflow
  • Luigi
  • Docker
  • Kubernetes
  • AWS S3
  • Redshift
  • Google BigQuery

Nice to have

  • PhD in Data Engineering/Science, Computer Science, Signal Processing, or a related field
  • active learning
  • model retraining
  • audio-specific datasets and metadata management
  • machine learning principles
  • distributed training pipelines
  • large-scale dataset processing
  • open-source contributions
  • published research in data science or audio processing
  • Tableau
  • Matplotlib
  • AI/ML model monitoring

What the JD emphasized

  • 8+ years of experience in data engineering or data science with a focus on building pipelines for AI/ML applications
  • Hands-on experience with audio processing libraries and tools (e.g., Librosa, FFmpeg, SoX) for handling complex audio formats
  • Experience with immersive and multichannel audio formats

Other signals

  • data pipelines for AI/ML research
  • audio datasets for ML models
  • preprocessing for speech processing, style transfer, and source separation