Model Research, Optimization, and Training

Tenstorrent · Semiconductors · Boston, MA +1 · ML Models

Research role focused on training and optimizing large language models on custom AI accelerators, using techniques such as speculative decoding and quantization, and on translating research into production-ready systems.

What you'd actually do

  1. Lead research and development efforts focused on LLM training and inference optimization.
  2. Train, evaluate, and optimize state-of-the-art AI models on Tenstorrent hardware.
  3. Improve performance through techniques such as speculative decoding, quantization, kernel fusion, flash attention, and distributed training.
  4. Investigate system bottlenecks and collaborate cross-functionally to drive performance improvements.
  5. Translate cutting-edge ML research into scalable, production-ready solutions.

Skills

Required

  • Python
  • PyTorch
  • Deep understanding of ML architectures
  • LLM training
  • Inference optimization
  • Hands-on experience training large-scale machine learning models

Nice to have

  • Speculative decoding
  • Quantization
  • Kernel fusion
  • Flash attention
  • Distributed training

What the JD emphasized

  • 4+ years of industry and/or academic experience in ML research and LLM development
  • PhD, published research, or experience with speculative decoding is highly valued

Other signals

  • LLM training
  • Inference optimization
  • Custom AI accelerators
  • ML research