Senior Software Engineer - Together Cloud Platform

Together AI · Data AI · San Francisco, CA · Engineering

Senior Backend Engineer role focused on building and scaling the AI Acceleration Cloud platform, which virtualizes ML hardware and provides self-serve AI cloud services for ML practitioners. Responsibilities include developing distributed GPU scheduling systems, global management planes, and customer-facing cloud platform services while ensuring high availability and performance.

What you'd actually do

  1. Identify, design, and develop foundational backend services that power Together’s cloud platform
  2. Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  3. Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  4. Write clear, well-tested, and maintainable software and infrastructure as code (IaC) for both new and existing systems
  5. Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance

Skills

Required

  • building large-scale, fault-tolerant distributed systems and API microservices
  • designing, analyzing, and improving the efficiency, scalability, and stability of various system resources
  • building and operating high-performance and/or globally distributed microservice architectures
  • systems knowledge across compute, networking, and storage
  • developing against and managing a relational database
  • Golang
  • version control practices
  • integrating IaC with CI/CD pipelines

Nice to have

  • Kubernetes and containers
  • building and operating data infrastructure (Kinesis, Airflow, Kafka, etc.)

What the JD emphasized

  • building large-scale, fault-tolerant distributed systems and API microservices
  • building and operating high-performance and/or globally distributed microservice architectures