Senior Cloud Support Engineer

Weights & Biases · Data AI · Bellevue, WA +3 · Technology - COR

Senior Cloud Support Engineer role focused on supporting AI workloads on Kubernetes-based HPC cloud infrastructure. Responsibilities include troubleshooting, mentoring, training, and contributing to the product roadmap. Requires experience with Kubernetes, networking, storage, observability, and Linux system administration.

What you'd actually do

  1. Guide and mentor team members in developing their technical skills and troubleshooting capabilities across all disciplines supported by CoreWeave.
  2. Provide real-time feedback and coaching, and review tickets to identify opportunities for improvement and maintain quality assurance (QA).
  3. Develop and deliver training sessions to improve the team’s proficiency and efficiency in resolving customer issues.
  4. Use technical expertise to investigate, debug, and resolve customer-impacting issues with the curiosity required to uncover and understand root causes.
  5. Maintain high customer satisfaction through swift, accurate, and empathetic high-touch support communications, following established best practices.

Skills

Required

  • Bachelor’s degree in Information Science / Information Technology, Data Science, Computer Science, Engineering, Mathematics, Physics, or a related field, OR equivalent experience in a technical position
  • 5+ years of experience in cloud support, systems administration, or related technical support roles
  • Proven hands-on work experience with Kubernetes
  • Experience with networking, load balancing, storage volumes, observability, node management, High-Performance Computing (HPC), and Linux system administration
  • Proven ability to mentor team members, foster technical growth, and improve team-wide capabilities through guidance and feedback
  • Experience with observability tools such as Grafana
  • Strong troubleshooting skills, with experience resolving complex customer issues and driving quality assurance through ticket reviews or similar processes
  • Demonstrated success collaborating with cross-functional teams to refine workflows, implement best practices, and advocate for necessary tools or process changes
  • Excellent written and verbal communication skills, with a track record of simplifying complex concepts for diverse audiences
  • Strong technical presentation skills, with experience delivering precise, engaging presentations that explain complex concepts and solutions to technical and non-technical audiences

Nice to have

  • Certified Kubernetes Administrator (CKA) certification
  • Demonstrated experience with training, coaching, and creating onboarding materials
  • Experience operating in a fast-paced, global, 24/7 support team environment
  • Ability to collaborate across different time zones
  • On-site, hybrid, or remote work options, depending on location
  • Willingness to travel up to 10% of the time (~25 days/year)

What the JD emphasized

  • 24/7/365 team
  • mission-critical applications
  • cutting-edge AI training workloads
  • high-priority escalations
  • complex customer challenges
  • complex customer issues
  • fast-paced, global, 24/7 support team environment