Solution Specialist, AI Runtime Services

Weights & Biases · Data AI · Bellevue, WA +4 · Global Field Organization

This role focuses on bringing new AI runtime services, such as model serving and sandboxes, to market. It involves driving initial customer adoption, gathering feedback for the product roadmap, and enabling sales teams to position these services. The role requires deep expertise in AI runtime infrastructure, including serving frameworks, inference optimization, and execution isolation.

What you'd actually do

  1. Own the commercial and technical strategy for net-new customer wins in AI runtime infrastructure, where execution performance, deployment flexibility, and operational reliability are the primary buying triggers.
  2. Drive new business opportunities where inference latency, throughput bottlenecks, workload isolation requirements, or operational complexity are barriers to scaling AI on CoreWeave.
  3. Develop deep expertise across the AI runtime landscape (model serving architectures, execution scheduling, containerized AI workloads, and secure multi-tenant compute), using CoreWeave's Inference and Sandboxes products as flagship examples of what best-in-class runtime looks like.
  4. Translate customer requirements around serving frameworks (e.g., vLLM, TensorRT-LLM, TGI), batching strategies, and execution isolation into specific product feedback that shapes the AI Runtime Services roadmap.
  5. Develop deal structures, technical playbooks, and benchmark narratives that help sales and SA teams accelerate runtime-sensitive opportunities across the full spectrum of AI deployment patterns.

Skills

Required

  • 10+ years of experience in distributed systems, ML infrastructure, or production AI engineering
  • 5+ years working with AI runtime systems (model serving, inference optimization, containerized workload execution, or real-time ML pipelines) in a customer-facing or deal-shaping capacity
  • Deep working knowledge of how AI workloads execute at runtime: serving frameworks, batching strategies, GPU memory management, and the performance levers that determine throughput and latency at scale
  • Experience with sandboxed and isolated execution environments (microVM architectures, container runtimes, secure multi-tenant scheduling)
  • Familiarity with Kubernetes-native runtime orchestration (autoscaling, scheduling policies, GPU operators)
  • Ability to benchmark, explain, and commercially position runtime performance differences

Nice to have

  • Experience driving new business or shaping product strategy in industries with high-throughput AI runtime demands, such as generative AI applications, autonomous systems, financial modeling, or developer platforms.
  • Prior background in technical sales, solution consulting, or product management supporting large-scale inference infrastructure or AI platform decisions.
  • Deep understanding of cost-per-token economics, inference fleet optimization, and the commercial tradeoffs between on-demand, reserved, and spot GPU capacity for runtime workloads.
  • Advanced degree in Computer Science, Machine Learning, or Engineering, or equivalent experience with a demonstrated ability to operate at the intersection of technical architecture and commercial strategy.

What the JD emphasized

  • AI runtime infrastructure
  • execution performance
  • deployment flexibility
  • operational reliability
  • inference latency
  • throughput bottlenecks
  • workload isolation requirements
  • operational complexity
  • model serving architectures
  • execution scheduling
  • containerized AI workloads
  • secure multi-tenant compute
  • serving frameworks
  • batching strategies
  • execution isolation
  • runtime performance tradeoffs
  • cost-per-token economics
  • architectural decisions
  • throughput modeling
  • GPU utilization commitments
  • SLA structures
  • serving efficiency
  • distributed systems
  • ML infrastructure
  • production AI engineering
  • customer outcomes
  • revenue
  • AI runtime systems
  • model serving
  • inference optimization
  • containerized workload execution
  • real-time ML pipelines
  • GPU memory management
  • throughput
  • latency at scale
  • sandboxed and isolated execution environments
  • microVM architectures
  • container runtimes
  • secure multi-tenant scheduling
  • execution isolation requirements
  • GPU memory hierarchies
  • model parallelism strategies
  • runtime architecture decisions
  • cost
  • latency
  • scalability outcomes
  • Kubernetes-native runtime orchestration
  • autoscaling
  • scheduling policies
  • GPU operators
  • workload portability
  • platform stickiness
  • benchmark
  • commercially position runtime performance differences
  • deployment patterns
  • instance types
  • serving configurations
  • high-throughput AI runtime demands
  • generative AI applications
  • autonomous systems
  • financial modeling
  • developer platforms
  • technical sales
  • solution consulting
  • product management
  • large-scale inference infrastructure
  • AI platform decisions
  • inference fleet optimization
  • GPU capacity
  • runtime workloads

Other signals

  • AI runtime services
  • model serving
  • inference platform
  • customer adoption
  • product roadmap