UK Internship Program

Perplexity · AI Frontier · London, United Kingdom · AI

Internship program focused on improving the AI inference engine and serving stack for Perplexity products, optimizing latency and throughput on large GPU clusters, and supporting new models and inference optimizations.

What you'd actually do

  1. Work with the inference team to improve serving latency and throughput
  2. Bring up support for new models and state-of-the-art inference optimizations or quantization schemes
  3. Optimize inference across the entire stack, from GPU kernels to serving endpoints

Skills

Required

  • Master's or PhD in Computer Science
  • Performance-related subjects (HPC, compilers, distributed systems)
  • ML frameworks (Torch, JAX)
  • GPU programming (CUDA, Triton)
  • High-Performance Computing (OpenMPI)
  • Multi-threaded programming
  • Networking
  • Compilation
  • Systems programming

What the JD emphasized

  • AI Inference team
  • inference engine
  • serving stack
  • GPU clusters
  • latency and throughput
  • GPU kernels
  • serving endpoints
  • ML frameworks
  • GPU programming
  • High-Performance Computing
