Solutions Architect - Kubernetes

Weights & Biases · Data AI · Bellevue, WA +3 · Technology - COR

Solutions Architect focused on Kubernetes and HPC for AI/ML workloads, supporting customers in their use of CoreWeave's cloud infrastructure. The role involves direct technical customer contact, solution design, proofs of concept, and providing feedback to product teams. Experience with NVIDIA GPUs and AI/ML training/inference is required.

What you'd actually do

  1. Serve as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave's cloud infrastructure offerings, with a focus on Kubernetes solutions in high-performance computing (HPC) environments.
  2. Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements.
  3. Lead proof of concept initiatives to showcase the value and viability of CoreWeave's solutions within specific environments.
  4. Drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise.
  5. Act as a virtual member of CoreWeave's Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.

Skills

Required

  • B.S. in Computer Science or a related technical discipline, or equivalent experience
  • 7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure
  • Hands-on experience designing and implementing cloud solutions
  • Proven track record of building customer relationships
  • Clear communication skills, including the ability to break down complex technical concepts for both technical and non-technical audiences
  • Familiarity with NVIDIA GPUs typically used in AI/ML applications and associated technologies such as InfiniBand and the NVIDIA Collective Communications Library (NCCL)
  • Experience running large-scale Artificial Intelligence/Machine Learning (AI/ML) training and inference workloads on technologies such as Slurm and Kubernetes

Nice to have

  • Code contributions to open-source inference frameworks
  • Experience with scripting and automation related to Kubernetes clusters and workloads
  • Experience with building solutions across multi-cloud environments
  • Client or customer-facing publications/talks on latency, optimization, or advanced model-server architectures

What the JD emphasized

  • Kubernetes solutions within high-performance compute (HPC) environments
  • Building distributed systems or HPC/cloud services, with expertise in scalable Kubernetes solutions
  • Running large-scale AI/ML training and inference workloads on technologies such as Slurm and Kubernetes