Software Engineer, Kernel Development and Optimization

Tenstorrent · Semiconductors · Warsaw, Poland · OPs

This software engineering role focuses on developing and optimizing performance-critical kernels for AI hardware, targeting ML and HPC workloads. It involves C++ systems engineering, low-level optimization, and close collaboration with hardware and software teams.

What you'd actually do

  1. Design, implement, and optimize GPU-style kernels such as matrix multiplication, attention primitives, and data-movement operations.
  2. Own performance end to end, from identifying bottlenecks to delivering measurable throughput improvements.
  3. Contribute to host-side orchestration code and parallelization strategies.
  4. Develop micro-benchmarks, regression tests, and tooling to ensure correctness and sustained performance gains.
  5. Collaborate closely with compiler, runtime, ML, and hardware teams to integrate kernels into production systems.

Skills

Required

  • C++ systems engineering
  • performance-critical software development
  • low-level software development
  • concurrency
  • synchronization
  • latency hiding
  • compute vs memory trade-offs
  • profiling
  • benchmarking
  • debugging complex runtime or kernel-level issues
  • design, implementation, and optimization of GPU-style kernels
  • matrix multiplication
  • attention primitives
  • data-movement operations
  • host-side orchestration code
  • parallelization strategies
  • micro-benchmarks
  • regression tests
  • tooling for performance
  • collaboration with compiler, runtime, ML, and hardware teams

Nice to have

  • CUDA
  • AI-assisted and agentic workflows for kernel generation, debugging, and optimization

What the JD emphasized

  • performance-critical
  • low-level software
  • performance
  • optimization
  • kernel-level issues
  • performance problems
  • optimize GPU-style kernels
  • performance, from identifying bottlenecks
  • sustained performance gains
  • performance intuition

Other signals

  • performance-critical kernels
  • ML workloads
  • AI hardware