Systems Research Engineer, GPU Programming

Together AI · Data AI · San Francisco, CA · Research

This role focuses on optimizing and developing GPU-accelerated kernels and algorithms for ML/AI applications, requiring expertise in GPU programming (CUDA, Triton) and performance profiling. The engineer will collaborate with modeling, hardware, and software teams to enhance AI system efficiency and co-design GPU architectures.

What you'd actually do

  1. Optimize and fine-tune GPU code to achieve better performance and scalability
  2. Collaborate with cross-functional teams to integrate GPU-accelerated solutions into existing software systems
  3. Stay up to date with the latest advancements in GPU programming techniques and technologies

Skills

Required

  • GPU programming (CUDA, Triton)
  • Parallel computing
  • ML/AI applications knowledge
  • Performance profiling
  • GPU optimization tools

Nice to have

  • Co-design GPU kernels and model architecture
  • Experience with FlashAttention, Hyena, FlexGen, RedPajama

What the JD emphasized

  • GPU programming
  • ML/AI applications
  • performance profiling
  • optimization tools
  • GPU programming techniques

Other signals

  • GPU programming
  • ML/AI applications
  • performance optimization
  • kernels
  • algorithms