Customer Support Engineer (GPU Cluster), India

Together AI · Data AI · Remote · Customer Success

The Customer Support Engineer for GPU Clusters at Together AI resolves technical challenges for customers building training, fine-tuning, and inference solutions. The role involves being a product expert, collaborating with engineering and product teams, and turning customer insights into product improvements. It requires experience in customer-facing technical roles, with AI/ML/GPU technologies, and with infrastructure services such as Kubernetes.

What you'd actually do

  1. Engage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge Kubernetes GPU clusters; ensure swift and effective solutions every time.
  2. Become a product expert in our GPU Cluster service, serving as the last line of technical defense before issues are escalated to Engineering and Product teams.
  3. Collaborate seamlessly across Engineering, Research, and Product teams to address customer concerns; work with senior leaders both internally and externally to ensure the highest levels of customer satisfaction.
  4. Transform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together’s roadmap (e.g., future models to support).
  5. Maintain detailed documentation of system configurations, procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with the team and customers.

Skills

Required

  • 3+ years of experience in a customer-facing technical role
  • at least 1 year in a support function in AI or supporting a mission-critical API in SaaS
  • Strong technical background
  • knowledge of AI, ML, GPU technologies and their integration into high-performance computing (HPC) environments
  • Familiarity with infrastructure services (e.g., Kubernetes, SLURM)
  • infrastructure as code solutions (e.g., Ansible)
  • high-performance network fabrics
  • NFS-based storage management
  • container infrastructure
  • scripting and programming languages
  • Foundational understanding of the installation, configuration, administration, troubleshooting, and securing of compute clusters
  • Complex technical problem solving and troubleshooting
  • proactive approach to issue resolution
  • Ability to work cross-functionally with teams such as Sales, Engineering, Support, Product and Research
  • Strong sense of ownership
  • willingness to learn new skills
  • Excellent communication and interpersonal skills
  • ability to explain complex technical concepts to non-technical stakeholders
  • Ability to operate in dynamic environments
  • adept at managing multiple projects
  • comfortable with frequent context switching and prioritization

What the JD emphasized

  • customer-facing technical role
  • AI or supporting a mission-critical API in SaaS
  • AI, ML, GPU technologies
  • Kubernetes
  • complex technical problem solving and troubleshooting