Senior Research Scientist - Machine Learning System

ByteDance · Big Tech · San Jose, CA · R&D

Develop and optimize large-scale distributed ML training and inference systems, with a focus on LLM inference frameworks and GPU/CUDA performance tuning for a high-performance LLM inference engine.

What you'd actually do

  1. Develop and optimize the LLM inference framework.
  2. Drive GPU and CUDA performance optimization to build an industry-leading, high-performance LLM inference engine.

Skills

Required

  • C/C++
  • algorithms and data structures
  • Python
  • deep learning algorithms
  • neural networks
  • PyTorch

Nice to have

  • GPU high-performance computing optimization
  • CUDA
  • computer architecture
  • parallel computing optimization
  • memory access optimization
  • low-bit computing
  • TensorRT-LLM
  • ORCA
  • vLLM
  • LLM models
  • LLM model acceleration and optimization

What the JD emphasized

  • LLM inference framework
  • GPU and CUDA performance optimization
  • high-performance LLM inference engine
