Sr Engineer, Server Inference

Tenstorrent · Semiconductors · Belgrade, Serbia · Product Software Engineering

The role focuses on developing software for state-of-the-art AI inference on Tenstorrent's hardware: designing APIs, deploying workloads, and benchmarking end-to-end inference speed. It involves optimizing end-to-end ML inference on custom silicon and building scalable software interfaces.

What you'd actually do

  1. Develop software that powers state-of-the-art AI inference on Tenstorrent’s cutting-edge hardware
  2. Design APIs
  3. Deploy workloads
  4. Benchmark end-to-end inference speed
  5. Shape how developers consume and scale model execution on Tenstorrent’s stack
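Benchmarking end-to-end inference speed typically means measuring wall-clock latency per request and reporting percentiles rather than a single average. A minimal sketch of that idea in Python, with `run_inference` as a hypothetical stand-in for a real model call (the actual API is not specified in the posting):

```python
import time
import statistics

def run_inference(prompt: str) -> str:
    # Hypothetical stub standing in for a real model call on accelerator hardware.
    return prompt[::-1]

def benchmark(prompts, warmup=2, runs=10):
    """Measure end-to-end latency per request, in milliseconds."""
    for p in prompts[:warmup]:          # warm-up calls, excluded from timing
        run_inference(p)
    latencies = []
    for _ in range(runs):
        for p in prompts:
            start = time.perf_counter()
            run_inference(p)
            latencies.append((time.perf_counter() - start) * 1e3)
    return {
        "p50_ms": statistics.median(latencies),
        "p99_ms": statistics.quantiles(latencies, n=100)[98],
        "mean_ms": statistics.fmean(latencies),
    }

stats = benchmark(["hello", "world", "tenstorrent"])
print(stats)
```

Reporting p50/p99 instead of only a mean surfaces tail latency, which is usually what matters for serving SLAs.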

Skills

Required

  • Backend engineering
  • Diagnosing and resolving performance bottlenecks
  • Scaling infrastructure
  • Web technologies and protocols
  • System design
  • Python
  • Docker
  • Linux-based environments
  • Strong coding practices
  • Ability to break down complex problems into high-quality, maintainable code

Nice to have

  • Batching
  • Caching
  • Model parallelism
  • Clean software architecture
  • Effective abstraction layers
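Of the techniques above, batching is the most characteristic of inference serving: grouping requests into a single forward pass amortizes per-call overhead on the accelerator. A minimal synchronous sketch (all names hypothetical; a production server would batch asynchronously with a size/timeout trigger):

```python
from collections import deque

class MicroBatcher:
    """Collect requests and process them in fixed-size micro-batches."""

    def __init__(self, batch_fn, max_batch=8):
        self.batch_fn = batch_fn    # runs one batched "forward pass"
        self.max_batch = max_batch
        self.queue = deque()

    def submit(self, request):
        self.queue.append(request)

    def flush(self):
        """Drain the queue in batches of at most max_batch requests."""
        results = []
        while self.queue:
            n = min(self.max_batch, len(self.queue))
            batch = [self.queue.popleft() for _ in range(n)]
            results.extend(self.batch_fn(batch))
        return results

# Toy "model": doubles each request; stands in for a batched inference call.
batcher = MicroBatcher(batch_fn=lambda xs: [x * 2 for x in xs], max_batch=4)
for i in range(10):
    batcher.submit(i)
out = batcher.flush()
print(out)
```

The design choice here is a hard batch-size cap; real servers also flush on a latency deadline so a lone request is not stuck waiting for a full batch.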

What the JD emphasized

  • Optimize end-to-end ML inference on custom silicon
  • Shape the experience developers have when using Tenstorrent’s hardware for AI workloads
  • Eligibility to access U.S. export-controlled technology

Other signals

  • Inference
  • Serving
  • APIs
  • Performance