Gemini Audio Research Scientist, DeepMind

Google · Big Tech · New York, NY +2

Research Scientist focused on advancing audio capabilities, particularly speech translation, by improving model quality for understanding and generation, exploring RL algorithms, and developing better evaluation methods. The role involves working with audio and visual representations and interactions, and contributing to the wider AI/ML community through publications.

What you'd actually do

  1. Unlock new audio capabilities, with a focus on speech translation.
  2. Improve the quality of models for understanding and generation, with a focus on streaming audio interactions and speech translation. This includes research to improve our RL algorithms, developing better techniques for generation quality, and exploring joint audio and visual representations and interactions.
  3. Develop better evaluation methods (human and automated metrics) to measure quality.
  4. Help grow the research business by sharing research trends and best practices within the community.
  5. Identify new and upcoming research areas by interacting with potential external and internal collaborators. Help develop a long-term research strategy and plans to expand the impact of Google research.

Skills

Required

  • PhD degree in Computer Science, Computer Engineering, a similar technical field, or equivalent practical experience.
  • 2 years of experience in programming and machine learning with speech or natural language processing.
  • Experience training or evaluating large language models (LLMs).
  • Experience with text, image, video, or audio generation.

Nice to have

  • A credible presence in the AI/ML community, demonstrated through publications or open-source contributions.
  • A deep passion for AI technology and all of its possibilities.

What the JD emphasized

  • speech translation
  • streaming audio interactions
  • RL algorithms
  • audio and visual representations
  • evaluation methods