Senior Software Engineer, Observability Insights

Weights & Biases · Data AI · New York, NY +1 · Technology

Senior Software Engineer to lead development of agentic interfaces and product experiences for AI system observability, focusing on multi-tenant APIs, managed Grafana experiences, and MCP-based tool servers. Requires experience with production backend systems, distributed APIs, reliability engineering, and agentic applications or LLM-based features.

What you'd actually do

  1. lead the development of agentic interfaces and product experiences that sit atop CoreWeave’s telemetry layer
  2. design multi-tenant APIs, managed Grafana experiences, and MCP-based tool servers
  3. shape the end-to-end observability experience and influence how people engage with cutting-edge AI infrastructure

Skills

Required

  • 6+ years of experience in software or infrastructure engineering
  • building production-grade backend systems and distributed APIs
  • developer-facing infrastructure
  • customer-obsessed approach to SDKs, CLIs, and APIs
  • reliability engineering
  • fault-tolerant design
  • SLOs
  • error budgets
  • multi-tenant system resilience
  • agentic applications or LLM-based features
  • grounding
  • tool calling
  • operational safety
  • Go
  • Python
  • agile teams
  • end-to-end telemetry-to-insights pipelines

Nice to have

  • operating Kubernetes clusters at scale, especially for AI workloads
  • logging, tracing, and metrics platforms in production
  • cardinality, indexing, and query optimization
  • running distributed systems or API services at cloud scale
  • event streaming
  • data pipeline management
  • LLM frameworks
  • MCP
  • agentic tooling (e.g., LangChain, AgentCore)

What the JD emphasized

  • agentic interfaces
  • LLM-based features
  • tool calling
  • operational safety
  • agentic applications

Other signals

  • observability for AI systems