ABOUT xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

ABOUT THE ROLE:

The Observability team builds and operates the core infrastructure that enables engineers to monitor, debug, and optimize the performance and reliability of their systems. We handle telemetry at massive scale — billions of time series and petabytes of logs — with strict performance and availability requirements.

You will be part of the small, high-impact team responsible for building and maintaining X’s observability platform. You’ll own critical systems that power metrics, logs, tracing, and alerting, enabling engineering teams to operate services at scale, identify issues before they impact users, and drive systemic reliability improvements.

RESPONSIBILITIES:

Design and implement scalable observability infrastructure for metrics, logging, and tracing.
Build high-performance telemetry pipelines that handle massive ingestion volumes.
Develop APIs, query engines, and UIs that allow engineers to get real-time insights into their services.
Define and enforce best practices for instrumentation, alerting, and reliability across the company.
Partner with infrastructure and product teams to deeply integrate observability into our internal platforms.
Own the reliability, scalability, and performance of the observability stack end-to-end.

BASIC QUALIFICATIONS:

Production-level proficiency in Go, Rust, Scala, or a similar language.
Deep understanding of distributed systems and telemetry architecture.
Experience building and operating infrastructure at scale.
Familiarity with observability stacks such as Prometheus, Grafana, OpenTelemetry, VictoriaMetrics, or ClickHouse.
Experience with Kafka, Redis, or large-scale time series databases.
Experience operating observability pipelines in Kubernetes or similar orchestration environments.

COMPENSATION AND BENEFITS:

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Member of Technical Staff - Observability
