AI Applied Scientist - PhD Intern, Evaluation Systems and Metrics

Zillow · Consumer · United States · Remote

Zillow is seeking a remote PhD intern to develop cutting-edge evaluation methodologies for AI systems, focusing on creating robust, scalable metrics and frameworks for generative models across multiple modalities. The role involves research into novel metrics, self-improving assessment systems, privacy-preserving evaluation, and ethical fair housing evaluation for agentic systems.

What you'd actually do

  1. Develop cutting-edge evaluation methodologies for AI systems
  2. Create robust, scalable metrics and frameworks to assess the quality, consistency, and performance of generative models across multiple modalities
  3. Novel Evaluation Metrics: Develop innovative assessment methodologies for emerging AI capabilities, focusing on consistency and quality across complex multi-modal outputs
  4. Self-Improving Assessment: Design evaluation systems that learn and adapt from feedback, automatically discovering new evaluation criteria and improving assessment quality over time
  5. Privacy-Preserving Evaluation: Design frameworks that incorporate domain-specific implementations of differential privacy to protect sensitive user information while maintaining utility for model training and assessment
  6. Ethical Fair Housing Evaluation: Develop scalable methodologies for assessing agentic systems, ensuring compliance with fair housing standards and promoting ethical, responsible AI deployment

Skills

Required

  • PhD student in computer science, machine learning, computer vision, or a related field
  • Evaluation methodologies for AI/ML systems
  • Computer vision metrics and 3D consistency assessment
  • Generative model evaluation (text, image, video, 3D)
  • Multi-modal assessment and automated feedback systems
  • Knowledge of data privacy methods (e.g., differential privacy, federated learning, secure ML) and their application
  • Single-agent or multi-agent system evaluation
  • Modern deep learning frameworks (e.g., PyTorch, Hugging Face Transformers)
  • Strong research mindset
  • Motivation to publish

Nice to have

  • A record of publication in conferences, workshops, or journals

What the JD emphasized

  • strong publication record
  • publication track record

Other signals

  • developing evaluation methodologies
  • designing evaluation frameworks
  • assessing AI capabilities
  • scalable metrics