Data Engineer

xAI · AI Frontier · Palo Alto, CA · Engineering

xAI is seeking a Data Engineer to build and maintain production-grade data pipelines, tooling, and software systems that handle data acquisition, preparation, quality evaluation, and delivery for model training. The role involves analyzing data performance, investigating anomalies, and researching methods to improve data quality, all with the goal of making model training effective and reliable.

What you'd actually do

  1. Analyze the performance and impact of data used throughout the model training lifecycle
  2. Investigate anomalous model behavior and rigorously identify the data issues that drive poor downstream performance
  3. Design, build, and improve the data cleaning, transformation, and quality-control steps required to produce high-quality training data
  4. Research, evaluate, and develop frontier methods for improving data quality and effectiveness in AI model development
  5. Apply statistical techniques and empirical analysis to make informed, data-driven decisions about dataset quality and model outcomes
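Duties 3 and 5 above can be illustrated with a quality-control step of the following kind. This is a minimal sketch, not xAI's tooling. The function name `quality_filter` and its threshold are hypothetical. The outlier test is the common median/MAD "modified z-score" convention (0.6745 scaling constant, 3.5 cutoff), chosen here only as an example of a robust statistic.

```python
import hashlib
import statistics


def quality_filter(docs, z_threshold=3.5):
    """Hypothetical QC pass (illustrative only): drop exact duplicates,
    then drop documents whose length is a robust statistical outlier."""
    # Step 1: exact deduplication by SHA-256 content hash, keeping the first copy.
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)

    # Too few documents to estimate a meaningful spread; keep everything.
    if len(unique) < 3:
        return unique

    # Step 2: modified z-score on document length, using median and
    # median absolute deviation (MAD) so a few extreme docs don't skew it.
    lengths = [len(d) for d in unique]
    med = statistics.median(lengths)
    mad = statistics.median(abs(n - med) for n in lengths)
    if mad == 0:
        # Most lengths are identical; this sketch skips the outlier test.
        return unique

    kept = []
    for doc, n in zip(unique, lengths):
        z = 0.6745 * (n - med) / mad
        if abs(z) <= z_threshold:
            kept.append(doc)
    return kept


if __name__ == "__main__":
    sample = ["a" * 10, "b" * 12, "c" * 11, "a" * 10, "d" * 9, "e" * 500]
    print(len(quality_filter(sample)))  # duplicate and 500-char outlier removed
```

The median/MAD choice is deliberate. A mean/standard-deviation z-score would itself be inflated by the very outliers it is meant to catch. A production pipeline would use near-duplicate detection and richer signals than length.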

Skills

Required

  • Python
  • data pipelines
  • data quality
  • model training
  • statistics
  • neural networks

Nice to have

  • analytics
  • data science
  • machine learning
  • large-scale machine learning workloads
  • Parquet
  • Kubernetes
  • distributed production environments
  • predictive models
  • machine learning pipelines
  • clustering
  • forecasting
  • anomaly detection
  • terabyte- to petabyte-scale data systems
  • scaling ladder design studies

What the JD emphasized

  • production code
  • production pipelines
  • production-grade data pipelines
  • production environments
