Internship - Machine Learning Research Engineer

at Perplexity · AI Frontier · Berlin, Germany · Search

Internship role focused on advancing search quality through research and engineering of large-scale deep learning models, representation learning, and RAG pipelines. Involves distributed training, optimization, and evaluation of retrieval and ranking models.

Skills

Required

  • PyTorch
  • distributed training
  • performance optimization
  • representation learning
  • contrastive learning
  • multilingual modeling
  • multimodal modeling
  • search and retrieval systems
  • quality evaluation principles and metrics

Nice to have

  • DeepSpeed
  • FSDP
  • dense & sparse vector representations
  • representation fusion
  • cross-lingual representation alignment
  • training data optimization
  • robust evaluation

What the JD emphasized

  • publication record in AI/ML conferences or workshops

Other signals

  • push search quality
  • train and optimize large-scale deep learning models
  • conduct research in representation learning
  • build and optimize RAG pipelines

Internship Program Berlin

Internship program: 12-24 weeks, full-time, in person in the Berlin office.

Responsibilities

  • Relentlessly push search quality forward through models, data, tools, or any other available leverage.
  • Train and optimize large-scale deep learning models in frameworks such as PyTorch, using distributed training (e.g., PyTorch Distributed, DeepSpeed, FSDP) and hardware acceleration, with a focus on retrieval and ranking models (a minimal FSDP sketch follows this list).
  • Conduct research in representation learning, including contrastive learning, multilingual and multimodal modeling, and evaluation for search and retrieval (a contrastive-loss sketch follows below).
  • Build and optimize RAG pipelines for grounding and answer generation (a toy RAG sketch follows below).
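
To make the distributed-training item concrete: a minimal sketch of sharding a toy scoring model with PyTorch FSDP, assuming a `torchrun` launch. The model, dimensions, and hyperparameters are illustrative placeholders, not details from this role.

```python
# Hedged sketch: wrap a toy ranking head in PyTorch FSDP.
# Assumes launch via `torchrun --nproc_per_node=N train.py`, which sets
# RANK/WORLD_SIZE/LOCAL_RANK for each process.
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Toy scoring model; real retrieval/ranking models are far larger.
    model = nn.Sequential(nn.Linear(768, 2048), nn.ReLU(), nn.Linear(2048, 1)).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks.
    model = FSDP(model)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # stand-in training loop with random data
        x = torch.randn(32, 768, device="cuda")
        target = torch.randn(32, 1, device="cuda")
        loss = nn.functional.mse_loss(model(x), target)
        loss.backward()
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```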
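For the representation-learning item, a common starting point is an in-batch contrastive (InfoNCE-style) loss over paired query/document embeddings. This sketch is illustrative; the embedding dimension and temperature are arbitrary choices, not values from the role.

```python
# Hedged sketch: in-batch contrastive loss for retrieval embeddings.
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    """query_emb, doc_emb: (batch, dim); row i of doc_emb is the positive
    for row i of query_emb, and every other row is an in-batch negative."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)   # diagonal = positives
    return F.cross_entropy(logits, labels)

# Usage with random stand-in embeddings:
loss = info_nce(torch.randn(32, 768), torch.randn(32, 768))
```

Because every other row in the batch serves as a negative, larger batches yield harder training signals, which is one reason distributed training and contrastive representation learning appear together in this posting.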
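And for the RAG item, a runnable schematic of the retrieve-then-generate loop, with a toy word-overlap retriever and a stubbed generator standing in for a real index and LLM. None of this reflects Perplexity's actual pipeline.

```python
# Hedged sketch: retrieve-then-generate (RAG) loop with toy components.
from collections import Counter

CORPUS = {
    "d1": "Berlin is the capital of Germany.",
    "d2": "PyTorch supports distributed training via FSDP.",
}

def score(query: str, doc: str) -> float:
    # Word overlap as a stand-in for dense/sparse retrieval scoring.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum((q & d).values()))

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    ranked = sorted(CORPUS.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    # Stub: a real pipeline would call an LLM here.
    return f"(LLM answer grounded in a prompt of {len(prompt)} chars)"

def rag_answer(query: str) -> str:
    hits = retrieve(query)
    # Grounding: pack retrieved evidence into the prompt so the generator
    # can cite sources rather than rely on parametric memory alone.
    context = "\n".join(f"[{i+1}] ({d}) {t}" for i, (d, t) in enumerate(hits))
    prompt = f"Sources:\n{context}\n\nQuestion: {query}\nAnswer with citations:"
    return generate(prompt)

print(rag_answer("What is the capital of Germany?"))
```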

Qualifications

  • Understanding of search and retrieval systems, including quality evaluation principles and metrics (a sketch of NDCG and MRR follows this list).
  • Strong proficiency with PyTorch, including experience in distributed training techniques and performance optimization for large models.
  • Interest in representation learning, including contrastive learning, dense and sparse vector representations, representation fusion, cross-lingual representation alignment, training data optimization, and robust evaluation.
  • Publication record in AI/ML conferences or workshops (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, SIGIR).
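
As a concrete reference for the evaluation qualification, a minimal sketch of two standard retrieval-quality metrics, NDCG@k and MRR. The relevance judgments below are toy values for illustration only.

```python
# Hedged sketch: NDCG@k and MRR on toy relevance judgments.
import math

def ndcg_at_k(rels: list[float], k: int) -> float:
    """rels: graded relevance of results, in ranked order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = sorted(rels, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(first_relevant_ranks: list[int]) -> float:
    """One 1-based rank per query; use 0 when nothing relevant was returned."""
    return sum(1.0 / r for r in first_relevant_ranks if r > 0) / len(first_relevant_ranks)

print(ndcg_at_k([3, 0, 2, 1], k=4))  # graded judgments for one query
print(mrr([1, 3, 0]))                # first relevant hit at ranks 1, 3, none
```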