Lead/Manager, Together Cloud Infrastructure

Together AI · Data AI · Amsterdam, Netherlands · Engineering

Lead/Manager for Together Cloud Infrastructure in Amsterdam, focusing on building and managing a team to develop and operate a global, high-performance cloud platform for AI workloads, including GPU scheduling, management plane, and customer-facing services.

What you'd actually do

  1. Lead and manage a team of 8 Together Cloud infrastructure engineers in Amsterdam
  2. Identify, design, and develop foundational backend services that power Together’s commerce platform
  3. Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  4. Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  5. Write clear, well-tested, and maintainable software and IaC for both new and existing systems

Skills

Required

  • leading an infrastructure team
  • acquiring and retaining talent
  • building large-scale, fault-tolerant distributed systems and API microservices
  • designing, analyzing, and improving the efficiency, scalability, and stability of various system resources
  • building and operating high-performance and/or globally distributed microservice architectures
  • systems knowledge across compute, networking, and storage
  • developing against and managing a relational database
  • Expert-level programming in one or more languages (Golang preferred)
  • Proficiency in version control practices and integrating IaC with CI/CD pipelines
  • Experience with Kubernetes and containers
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience

Nice to have

  • Experience building and operating data infrastructure (Kinesis, Airflow, Kafka, etc.)

What the JD emphasized

  • building large-scale, fault-tolerant distributed systems and API microservices
  • building and operating high-performance and/or globally distributed microservice architectures
  • building and operating data infrastructure

Other signals

  • building the Together cloud platform engineering team
  • virtualizes cutting-edge ML hardware
  • self-serve AI cloud services
  • global management plane for managing our data center compute, networking, and storage
  • customer-facing cloud platform services
  • enterprise AI cloud features