Data AI · ML experiment tracking
Weights & Biases currently has 32 active AI-related job listings. The majority of these roles, 59%, are focused on serving infrastructure, with an additional 31% dedicated to agents. Engineering is the most frequent function, with 29 positions. The company is actively hiring for roles involving inference infrastructure, model serving, and agent orchestration. Over the last 30 days, Weights & Biases has added 8 new AI roles, representing a 300% increase compared to the previous 30-day period.
Currently tracking 19 active AI roles, down 22% versus the prior 4 weeks. Primary focus: Serve · Engineering. Salary range $92k–$341k (avg $209k).
Weights & Biases currently has 35 active AI-related roles in our index. The most common open titles are: Account Solution Architect (3), Solutions Architect - HPC/AI/ML (2), Enterprise GTM Leader, Principal Engineer - Perf and Benchmarking, Principal Engineer, Cluster Orchestration. Most positions are in Engineering and Product.
Weights & Biases's active AI hiring is concentrated in: serving infrastructure (60%), agents (31%), application (3%). These categories follow a seven-stage AI lifecycle: data, pre-training, post-training, serving infrastructure, agents, evaluation, and application.
Weights & Biases is hiring AI talent in: United States (33 roles), United Kingdom (1 role), Singapore (1 role).
Job postings at Weights & Biases most frequently reference: model serving, inference infra, llm observability, agent orchestration, evals.
In the past 30 days, Weights & Biases has posted 5 new AI-related roles.
| Title | Stage | AI score |
|---|---|---|
| Staff AI Security Engineer Staff AI Security Engineer to define and operationalize security across CoreWeave's AI ecosystem, focusing on secure-by-default foundations for AI development, agentic workflows, and enterprise AI adoption. The role involves building secure infrastructure, developing AI security policies, implementing guardrails for agentic systems, leading secure adoption of AI tools, and conducting adversarial testing. | Agent, Serve | 8 |
| Principal Engineer - Perf and Benchmarking Principal Engineer role focused on leading the Benchmarking & Performance team at CoreWeave, a cloud provider for AI. The role involves defining strategy, leading end-to-end MLPerf submissions (Training & Inference), designing and implementing a Kubernetes-native benchmarking service for latency and throughput, and building CI/CD pipelines for scale. It requires deep expertise in distributed systems, GPU performance, model-serving stacks, and Kubernetes, with a focus on achieving industry-leading performance data and publications. | Serve, Eval Gate | 8 |
| Account Solution Architect Account Solutions Architect for financial services customers, focusing on scaling AI/ML workloads on CoreWeave's cloud platform. Responsibilities include technical partnership, platform adoption, identifying expansion opportunities, and advising on model development, deployment, and infrastructure efficiency for AI/ML-intensive organizations. | Agent, Serve | 7 |
| Account Solution Architect Account Solutions Architect for financial services customers, focusing on deepening platform adoption for AI/ML workloads, identifying expansion opportunities, and serving as a trusted advisor for production AI scaling. Requires hands-on experience with training, fine-tuning, evaluating, and deploying deep learning models and LLM-powered applications. | Agent, Serve | 7 |
| Staff Software Engineer, Inference Staff Software Engineer on the Inference Platform Team at CoreWeave, focusing on building and operating a Kubernetes-native inference platform for AI workloads. The role involves technical leadership in architecture, performance optimization (latency, throughput, GPU utilization), and system reliability for low-latency, high-throughput systems at massive scale, with deep work in distributed systems and Kubernetes infrastructure. | Serve | 7 |
| Staff Technical Program Manager - Cluster Orchestration & Applied Training Staff Technical Program Manager to lead cross-functional programs for AI/ML Platform Services, focusing on Cluster Orchestration (scheduling, launching, managing AI workloads) and Applied Training (enabling researchers to use infrastructure for pre-training, fine-tuning, RL, evaluations). The role involves partnering with engineering, product, and research teams to improve workload execution and user interaction with training platforms, driving delivery across various AI training workflows and ensuring successful launches and operational ownership. | Serve, Post-train | 7 |
| Principal Engineer, Cluster Orchestration CoreWeave is seeking a Principal Engineer to lead the design and evolution of their AI infrastructure's cluster orchestration systems, including Slurm, Kubernetes, and SUNK. This role involves defining long-term architecture, solving scaling problems, and ensuring the reliability and efficiency of GPU resource utilization for AI training and inference workloads. | Serve | 7 |
| Senior Software Engineer, Observability Insights Senior Software Engineer to lead development of agentic interfaces and product experiences for AI system observability, focusing on multi-tenant APIs, Grafana, and tool servers. Requires experience in backend systems, distributed APIs, reliability engineering, and agentic applications/LLM features. | Agent, Serve | 7 |
| Senior Software Engineer II, Applied Training Senior Software Engineer II, Applied Training at CoreWeave, focusing on building and scaling Kubernetes-native research cluster platforms and sandbox client infrastructure for agentic training and evaluation. The role aims to provide AI labs with advanced research infrastructure, enabling them to focus on model training rather than operations. Responsibilities include contributing to the roadmap, designing cluster experiences, owning SDKs for agent rollouts and benchmarks, writing documentation, and working closely with large AI labs. | Serve, Agent | 7 |
| Staff Software Engineer, Applied Training CoreWeave is seeking a Staff Software Engineer to join their Applied Training team. This role will focus on building and improving their Kubernetes-native research cluster platform and sandbox client for agentic training and evaluation. The goal is to provide AI researchers with the infrastructure needed to train models efficiently, abstracting away operational complexities. Responsibilities include contributing to the roadmap, designing and building cluster experiences, owning the Python SDK for agentic workflows, and documenting training frameworks. The ideal candidate has extensive experience in distributed systems, ML infrastructure, or developer platforms, with strong Kubernetes expertise and familiarity with AI training and agentic workflows. | Serve, Agent | 7 |
| Senior Software Engineer I, Inference CoreWeave is seeking a Senior Software Engineer to own and improve their Kubernetes-native inference platform, focusing on latency, throughput, and reliability. The role involves leading design, implementing optimizations, strengthening incident posture, and mentoring junior engineers. Requires experience with distributed systems, Kubernetes, and inference internals. | Serve | 7 |
| Sr. Software Engineer - Perf and Benchmarking Senior Software Engineer focused on performance and benchmarking of AI infrastructure, including Kubernetes-native services, MLPerf runs, and model-serving stacks. The role involves building and improving services to measure latency, throughput, and cost, and ensuring reproducible benchmarking processes. | Serve, Eval Gate | 7 |
| Software Engineer, Inference AI/ML Software Engineer focused on improving the latency, reliability, and cost of model serving on a GPU platform, working with services like Triton, vLLM, and TensorRT-LLM. | Serve | 7 |
| Senior Software Engineer II, Inference Senior Software Engineer II focused on owning and optimizing CoreWeave's Kubernetes-native inference platform to meet strict P99 SLAs at scale. Responsibilities include leading design reviews, implementing advanced optimizations for latency and throughput, strengthening incident posture, and mentoring junior engineers. Requires strong experience in distributed systems, Python/Go, networked systems performance, Kubernetes, and ML inference internals. | Serve | 7 |
| Solutions Architect - HPC/AI/ML Solutions Architect role focused on AI/ML inference workloads on high-performance compute (HPC) infrastructure, primarily using Kubernetes and NVIDIA GPUs. The role involves customer technical contact, solution design, proof of concept, workload optimization, and providing feedback to product teams. | Serve | 7 |
| Senior Systems Engineer, OS Automation Senior Systems Engineer focused on automating and scaling Linux OS and Kernel build pipelines, with a strong emphasis on integrating AI/ML technologies like LLMs, RAG, and predictive modeling to create AI-native infrastructure, smart CI/CD, auto-remediation, and predictive regression detection. | Serve, Agent | 7 |