Internship - Machine Learning Research Engineer

at Perplexity · AI Frontier · Berlin, Germany · Search

Internship role focused on advancing search quality through research and engineering of large-scale deep learning models, representation learning, and RAG pipelines. Involves distributed training, optimization, and evaluation of retrieval and ranking models.

Skills

Required

  • PyTorch
  • distributed training
  • performance optimization
  • representation learning
  • contrastive learning
  • multilingual modeling
  • multimodal modeling
  • search and retrieval systems
  • quality evaluation principles and metrics

Nice to have

  • DeepSpeed
  • FSDP
  • dense & sparse vector representations
  • representation fusion
  • cross-lingual representation alignment
  • training data optimization
  • robust evaluation

What the JD emphasized

  • publication record in AI/ML conferences or workshops

Other signals

  • push search quality
  • train and optimize large-scale deep learning models
  • conduct research in representation learning
  • build and optimize RAG pipelines

Internship Program Berlin

Internship program: 12-24 weeks, full-time, in person in the Berlin office.

Responsibilities

  • Relentlessly push search quality forward through models, data, tools, or any other available leverage.
  • Train and optimize large-scale deep learning models in frameworks such as PyTorch, using distributed training (e.g., PyTorch Distributed, DeepSpeed, FSDP) and hardware acceleration, with a focus on retrieval and ranking models (a minimal FSDP sketch follows this list).
  • Conduct research in representation learning, including contrastive learning, multilingual and multimodal modeling, and evaluation for search and retrieval (a contrastive-loss sketch follows below).
  • Build and optimize RAG pipelines for grounding and answer generation (a toy RAG sketch follows below).
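
To make the distributed-training item concrete: a minimal sketch of sharding a toy scoring model with PyTorch FSDP, assuming a `torchrun` launch. The model, dimensions, and hyperparameters are illustrative placeholders, not details from this role.

```python
# Hedged sketch: wrap a toy ranking head in PyTorch FSDP.
# Assumes launch via `torchrun --nproc_per_node=N train.py`, which sets
# RANK/WORLD_SIZE/LOCAL_RANK for each process.
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Toy scoring model; real retrieval/ranking models are far larger.
    model = nn.Sequential(nn.Linear(768, 2048), nn.ReLU(), nn.Linear(2048, 1)).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks.
    model = FSDP(model)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # stand-in training loop with random data
        x = torch.randn(32, 768, device="cuda")
        target = torch.randn(32, 1, device="cuda")
        loss = nn.functional.mse_loss(model(x), target)
        loss.backward()
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```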
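For the representation-learning item, a common starting point is an in-batch contrastive (InfoNCE-style) loss over paired query/document embeddings. This sketch is illustrative; the embedding dimension and temperature are arbitrary choices, not values from the role.

```python
# Hedged sketch: in-batch contrastive loss for retrieval embeddings.
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    """query_emb, doc_emb: (batch, dim); row i of doc_emb is the positive
    for row i of query_emb, and every other row is an in-batch negative."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)   # diagonal = positives
    return F.cross_entropy(logits, labels)

# Usage with random stand-in embeddings:
loss = info_nce(torch.randn(32, 768), torch.randn(32, 768))
```

Because every other row in the batch serves as a negative, larger batches yield harder training signals, which is one reason distributed training and contrastive representation learning appear together in this posting.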
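And for the RAG item, a runnable schematic of the retrieve-then-generate loop, with a toy word-overlap retriever and a stubbed generator standing in for a real index and LLM. None of this reflects Perplexity's actual pipeline.

```python
# Hedged sketch: retrieve-then-generate (RAG) loop with toy components.
from collections import Counter

CORPUS = {
    "d1": "Berlin is the capital of Germany.",
    "d2": "PyTorch supports distributed training via FSDP.",
}

def score(query: str, doc: str) -> float:
    # Word overlap as a stand-in for dense/sparse retrieval scoring.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum((q & d).values()))

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    ranked = sorted(CORPUS.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    # Stub: a real pipeline would call an LLM here.
    return f"(LLM answer grounded in a prompt of {len(prompt)} chars)"

def rag_answer(query: str) -> str:
    hits = retrieve(query)
    # Grounding: pack retrieved evidence into the prompt so the generator
    # can cite sources rather than rely on parametric memory alone.
    context = "\n".join(f"[{i+1}] ({d}) {t}" for i, (d, t) in enumerate(hits))
    prompt = f"Sources:\n{context}\n\nQuestion: {query}\nAnswer with citations:"
    return generate(prompt)

print(rag_answer("What is the capital of Germany?"))
```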

Qualifications

  • Understanding of search and retrieval systems, including quality evaluation principles and metrics (a sketch of NDCG and MRR follows this list).
  • Strong proficiency with PyTorch, including experience in distributed training techniques and performance optimization for large models.
  • Interest in representation learning, including contrastive learning, dense and sparse vector representations, representation fusion, cross-lingual representation alignment, training data optimization, and robust evaluation.
  • Publication record in AI/ML conferences or workshops (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, SIGIR).
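
As a concrete reference for the evaluation qualification, a minimal sketch of two standard retrieval-quality metrics, NDCG@k and MRR. The relevance judgments below are toy values for illustration only.

```python
# Hedged sketch: NDCG@k and MRR on toy relevance judgments.
import math

def ndcg_at_k(rels: list[float], k: int) -> float:
    """rels: graded relevance of results, in ranked order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = sorted(rels, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(first_relevant_ranks: list[int]) -> float:
    """One 1-based rank per query; use 0 when nothing relevant was returned."""
    return sum(1.0 / r for r in first_relevant_ranks if r > 0) / len(first_relevant_ranks)

print(ndcg_at_k([3, 0, 2, 1], k=4))  # graded judgments for one query
print(mrr([1, 3, 0]))                # first relevant hit at ranks 1, 3, none
```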