Applied Scientist, Prime Video - Generative AI

Amazon · Big Tech · Sunnyvale, CA · Machine Learning Science

Applied Scientist role focused on Generative AI for Prime Video, involving research and development of generative models for synthesis across images, video, and multimedia. The work includes innovating in diffusion and flow-based methods, advancing visual grounding and 3D estimation, and designing multimodal GenAI workflows, including agentic pipelines.

What you'd actually do

  1. Research and develop generative models for controllable synthesis across images, video, vector graphics, and multimedia.
  2. Innovate in advanced diffusion and flow-based methods (e.g., inverse flow matching, parameter-efficient training, guided sampling, test-time adaptation) to improve efficiency, controllability, and scalability.
  3. Advance visual grounding, depth and 3D estimation, segmentation, and matting for integration into pre-visualization, compositing, VFX, and post-production pipelines.
  4. Design multimodal GenAI workflows, including visual-language model tooling, structured prompt orchestration, and agentic pipelines.

Skills

Required

  • PhD, or Master's degree and 4+ years of CS, CE, ML or related field experience
  • Experience programming in Java, C++, Python, or a related language
  • 3+ years of experience building models for business applications
  • Ph.D. in computer science, engineering, mathematics or equivalent, or experience with machine learning and deep learning toolkits such as MXNet, TensorFlow, Caffe and PyTorch
  • Experience in generative models (diffusion, flow, transformers)
  • Hands-on experience with image/video synthesis and editing techniques

Nice to have

  • Publications in top-tier AI/ML/graphics conferences (CVPR, ICCV/ECCV, SIGGRAPH, NeurIPS, ICLR)
  • Experience with controllable generation methods, including emerging approaches (familiarity with LoRA/ControlNet, parameter-efficient tuning, or test-time training a plus)
  • Expertise in one or more of: harmonization, relighting, style transfer, lip-sync, segmentation, matting, depth estimation, 3D camera/scene modeling

What the JD emphasized

  • end-to-end ownership
  • production-ready systems
  • Amazon scale

Other signals

  • Generative AI
  • multimedia understanding
  • content personalization
  • controllable synthesis
  • diffusion and flow-based methods
  • visual grounding
  • depth and 3D estimation
  • segmentation
  • matting
  • multimodal GenAI workflows
  • visual-language model tooling
  • structured prompt orchestration
  • agentic pipelines