Research Scientist Intern, Audio Quality With AI (PhD)

Meta · Big Tech · Redmond, WA

Research Scientist Intern focused on speech perception and audio quality, investigating phonemic errors and their link to human judgments using multimodal LLMs as a tool. The role involves dataset curation, model comparison, and relating findings to perceptual data.

What you'd actually do

  1. Investigate systematic phonemic errors as causal factors in perceived speech quality degradation, and link them to human perceptual judgments
  2. Build and curate datasets and benchmarks of speech for phoneme-level analysis
  3. Explore and compare the capabilities of audio and video (multimodal) LLMs as tools to support this analysis
  4. Relate findings to human perceptual data (quality preference and intelligibility) and translate them into actionable insights for research and engineering teams
  5. Where appropriate, adapt multimodal models to the task in a supporting capacity

Skills

Required

  • Python
  • Matlab
  • PyTorch
  • TensorFlow
  • speech perception
  • psychoacoustics
  • acoustic phonetics
  • audio computational models
  • LLMs
  • software engineering
  • open source contributions
  • audio processing
  • speech quality assessment
  • multichannel audio processing
  • visual and acoustic scene analysis
  • large scale data analysis
  • theoretical and empirical research
  • cross-functional communication

Nice to have

  • Generative AI
  • Transformer Models
  • Computer vision
  • multimodal models
  • video LLMs

What the JD emphasized

  • PhD degree
  • 3+ years experience with Python, Matlab, or similar
  • 3+ years experience with machine learning software platforms such as PyTorch, TensorFlow, etc.
  • Background in speech perception, psychoacoustics, or acoustic phonetics
  • Experience deploying novel audio computational models and LLMs
  • Demonstrated software engineering experience via an internship, work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)
  • Experience in advancing AI techniques, including core contributions to open source libraries and frameworks in computer vision or audio processing
  • Experience with audio and speech quality assessment
  • Experience with multichannel audio processing
  • Experience in visual and acoustic scene analysis
  • Experience manipulating and analyzing complex, large scale, high-dimensionality data from varying sources
  • Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as first-authored publications at leading workshops or top computer vision and machine learning conferences such as ARO, ASA, NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, ICCV, ECCV, ICASSP, InterSpeech or similar
  • Experience in utilizing theoretical and empirical research to solve problems
  • Experience working and communicating cross functionally in a team environment

Other signals

  • speech perception
  • audio quality
  • phonemic errors
  • multimodal LLMs
  • human quality and intelligibility judgments