Solutions Architect - Kubernetes

Weights & Biases · Data AI · Singapore · Technology - COR

Solutions Architect focused on Kubernetes and HPC environments for AI/ML workloads, supporting customers in building, deploying, and optimizing their solutions on CoreWeave's cloud infrastructure.

What you'd actually do

  1. Serve as the primary technical point of contact for customers, building strong technical relationships and ensuring their success with CoreWeave's cloud infrastructure offerings, with a focus on Kubernetes solutions in high-performance computing (HPC) environments.
  2. Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements.
  3. Lead proof-of-concept initiatives that demonstrate the value and viability of CoreWeave's solutions in specific customer environments.
  4. Provide technical leadership and direction during customer meetings, presentations, and workshops, addressing technical questions and concerns as they arise.
  5. Act as a virtual member of CoreWeave's Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.

Skills

Required

  • B.S. in Computer Science or a related technical discipline, or equivalent experience
  • 7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure
  • Expertise in scalable Kubernetes solutions
  • Fluency in cloud computing concepts, architecture, and technologies with hands-on experience in designing and implementing cloud solutions
  • Proven track record of building customer relationships and communicating clearly, including the ability to break down complex technical concepts for both technical and non-technical audiences
  • Familiarity with NVIDIA GPUs commonly used in AI/ML applications and associated technologies such as InfiniBand and the NVIDIA Collective Communications Library (NCCL)
  • Experience running large-scale Artificial Intelligence/Machine Learning (AI/ML) training and inference workloads on technologies such as Slurm and Kubernetes
  • Understanding of the integration points between cloud solutions and enterprise IT

Nice to have

  • Code contributions to open-source inference frameworks
  • Experience with scripting and automation related to Kubernetes clusters and workloads
  • Experience with building solutions across multi-cloud environments
  • Client- or customer-facing publications or talks on latency, optimization, or advanced model-server architectures

What the JD emphasized

  • focusing on Kubernetes solutions within high-performance compute (HPC) environments
  • building distributed systems or HPC/cloud services, with an expertise focused on scalable Kubernetes solutions
  • running large-scale Artificial Intelligence/Machine Learning (AI/ML) training and inference workloads