Staff Infrastructure Engineer - Models

Tenstorrent · Semiconductors · Belgrade, Serbia · Product SWE - Models Infrastructure

This role focuses on building and operating Kubernetes-native applications and services for large-scale AI workloads, including inference and training. It involves developing operators, APIs, and automation to improve the deployment, scaling, monitoring, and reliability of AI infrastructure.

What you'd actually do

  1. Design, build and operate Kubernetes-native applications, services and workloads for large-scale AI infrastructure.
  2. Develop operators, controllers, APIs and automation that make complex workloads easier to deploy, scale, monitor and operate.
  3. Define workload patterns for inference, training, CI/CD, internal development workflows and platform services.
  4. Improve reliability, observability and operational maturity of applications running on Kubernetes.
  5. Partner with SRE, infrastructure, deployment and engineering teams to support internal and customer-facing environments.

Skills

Required

  • Strong backend, infrastructure, or platform engineering experience
  • Deep experience designing and running production workloads on Kubernetes
  • Strong understanding of Kubernetes-native application design, workload orchestration, scaling, reliability and production debugging
  • Experience building platform services, APIs, automation, operators, or controllers using Go or Python
  • Collaborative and adaptable, able to work across engineering, infrastructure, SRE and deployment teams

Nice to have

  • Experience with AI, ML, HPC, training, or inference workloads

What the JD emphasized

  • Kubernetes-native application design
  • workload orchestration
  • scaling
  • reliability
  • production debugging
  • AI, ML, HPC, training, or inference workloads

Other signals

  • Kubernetes-native applications
  • large-scale AI workloads
  • inference, training, CI/CD