Software Engineer, Data

xAI · AI Frontier · Palo Alto, CA · Engineering

Software Engineer on the Data team at xAI, responsible for developing applications that power data acquisition, preparation, training, quality evaluation, and delivery for model training. Focuses on building a reliable, scalable enterprise data platform to ensure models are trained on high-quality data.

What you'd actually do

  1. Develop a highly reliable and scalable enterprise data platform to orchestrate data acquisition, preparation, training, quality evaluation, and delivery for model training
  2. Create new features such as data lineage, visibility, and monitoring for end-to-end training that improve the quality of the data and model performance
  3. Collaborate with peers on architecture, design, and code reviews
  4. Build prototypes to prove out key design concepts and quantify technical constraints
  5. Own all aspects of software engineering and product development

Skills

Required

  • Bachelor's degree in computer science, data science, engineering, math, physics, or another scientific discipline; OR 2+ years of professional experience building software in lieu of a degree
  • 1+ years of experience in application development, software engineering, data engineering, or data science

Nice to have

  • Programming experience in Python, Rust, Java, C#, Scala, Go, or similar languages
  • Frontend experience in Angular, React, or similar JavaScript frameworks
  • Hands-on experience with Kubernetes and containerized deployments
  • Experience with Ray, AI training, and orchestration
  • Experience with relational and non-relational databases and data lakes (e.g., PostgreSQL, Iceberg, ClickHouse, or similar)
  • Experience with data exploration tools like Grafana, Superset, or similar
  • Good understanding of version control, testing, continuous integration, build, deployment, and monitoring
  • Good understanding of statistics and of machine learning algorithms and frameworks

What the JD emphasized

  • high-quality data is fundamental
  • high-quality training data at scale
  • high-quality training data

Other signals

  • building production pipelines and systems that transform raw inputs into high-quality training data at scale
  • develop applications that power data acquisition, preparation, training, quality evaluation, and delivery for model training
  • provide the ability to run training in a reliable, scalable, and repeatable manner
  • provide visibility on training status and data lineage
  • work closely with acquisition teams, ML engineers, and data engineers to build a reliable data pipeline to run training at scale