Software Engineer, TT-Distributed

Tenstorrent · Semiconductors · Santa Clara, CA · Scale Out

Software Engineer role focused on developing and optimizing distributed software systems for AI and HPC clusters, specifically distributed inference and training infrastructure. Requires strong C/C++ systems programming skills, a solid grasp of distributed computing principles, and experience with MPI-based technologies.

What you'd actually do

  1. Architect, implement, and optimize distributed software systems that coordinate computation and communication across clusters of AI accelerators and CPUs.
  2. Design and build distributed APIs enabling data-parallel and tensor-parallel AI workloads.
  3. Leverage MPI-based technologies and related frameworks to scale programming models across multiple hosts and compute nodes.
  4. Develop robust systems using IPC, inter-node sockets, and distributed communication primitives to ensure reliability and high performance.
  5. Build and maintain testing, debugging, profiling, and monitoring tools for large-scale distributed workloads, and collaborate with model and systems teams on cluster bring-up.

Skills

Required

  • C++
  • systems programming
  • operating systems
  • distributed systems principles
  • IPC
  • socket programming
  • cluster resource coordination
  • MPI

Nice to have

  • AI accelerators
  • HPC clusters
  • data-parallel AI workloads
  • tensor-parallel AI workloads
  • profiling
  • monitoring

What the JD emphasized

  • access to U.S. export-controlled technology
  • distributed inference and training infrastructure

Other signals

  • distributed systems
  • AI accelerators
  • inference and training infrastructure