Member of Technical Staff - Data Platform

xAI · AI Frontier · Palo Alto, CA · Product

Software engineer on the Data Platform team, responsible for building and operating infrastructure for large-scale data transport and processing (Kafka, HDFS, Spark, Flink, and Trino) that supports real-time ML pipelines, feed ranking, experimentation, analytics, and observability at petabyte scale. The role involves designing, building, and operating distributed systems for data movement and compute, with a focus on scalability, performance, and reliability.

What you'd actually do

  1. Design and implement high-throughput, low-latency data ingestion and transport systems.
  2. Scale and optimize multi-tenant Kafka infrastructure supporting real-time workloads.
  3. Extend and tune Spark, Flink, and Trino for demanding production pipelines.
  4. Build interfaces, APIs, and pipelines enabling teams to query, process, and move data at petabyte scale.
  5. Debug and optimize distributed systems, with a focus on reliability and performance under load.

Skills

Required

  • distributed systems
  • stream processing
  • large-scale data platforms
  • Rust
  • Go
  • Scala
  • Kafka
  • Flink
  • Spark
  • Trino
  • Hadoop
  • debugging
  • profiling
  • performance optimization

What the JD emphasized

  • shipping and maintaining critical infrastructure
  • minimal guardrails