Research Internship, Reinforcement Learning (Summer)

Cohere · AI Frontier · Paris, France · Internships

Research internship focused on combining self-distillation and reinforcement learning for LLMs, with applications to code generation and agentic tasks. The work investigates mechanisms for handling extremely large rollouts in RLVR (reinforcement learning with verifiable rewards), with the aim of advancing the state of the art in LLM training.

What you'd actually do

  1. Conduct literature reviews and implement state-of-the-art algorithms in RL and self-distillation.
  2. Design and execute experiments to evaluate the effectiveness of proposed methods on code generation and agentic tasks.
  3. Develop and maintain codebases for both theoretical modeling and practical implementations.
  4. Collaborate with researchers to analyze results, refine methodologies, and prepare findings for publication.
  5. Contribute to the design of mechanisms for handling large rollouts, such as summarization and hierarchical sub-agents.

Skills

Required

  • Python
  • PyTorch
  • TensorFlow
  • reinforcement learning
  • deep learning
  • LLMs

Nice to have

  • coding tasks
  • unit testing
  • compiler tools
  • RLVR
  • self-distillation
  • large-scale ML experiments

What the JD emphasized

  • publication

Other signals

  • reinforcement learning
  • large language models
  • self-distillation
  • code generation
  • agentic tasks
  • large rollouts
  • publication