Member of Technical Staff - Data Platform

Microsoft · Big Tech · Mountain View, CA +2 · Data Engineering

This role focuses on building and architecting distributed data platforms and pipelines that process massive datasets for AI models, including training, inference, and evaluation. It involves designing event-driven architectures, handling unstructured data, engineering feedback loops for AI, and optimizing compute for cost and performance.

What you'd actually do

  1. Core Platform Engineering: Design and build the underlying frameworks (based on Spark/Databricks) that let internal teams process massive datasets efficiently, abstracting the complexity of ETL into self-service infrastructure (first sketch after this list).
  2. Distributed Systems Architecture: Modernize our data stack by moving from batch-heavy patterns to event-driven, streaming architectures that reduce latency for AI inference (second sketch below).
  3. Unstructured AI Data Pipelines: Architect high-throughput pipelines that process complex, non-tabular data (documents, code repositories, chat logs) into LLM pre-training, fine-tuning, and evaluation datasets (third sketch below).
  4. AI Feedback Loops: Engineer the high-throughput telemetry systems that capture user interactions with Copilot, creating the critical data loops required for Reinforcement Learning and model evaluation (fourth sketch below).
  5. Infrastructure as Code: Treat the data platform as software. Define and deploy all storage, compute, and networking resources using IaC (Bicep/Terraform) rather than manual configuration.
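To make item 1 concrete, here is a minimal sketch of what "ETL as self-service infrastructure" can look like on Spark: teams supply only a source, a sink, and a SQL transform, and the platform owns the plumbing. `PipelineSpec`, `run_pipeline`, and all paths are hypothetical illustrations, not Microsoft's actual framework.

```python
# A minimal sketch of a "self-service" pipeline abstraction on Spark.
# PipelineSpec, run_pipeline, and the paths are hypothetical; a real
# platform would add schema validation, lineage, retries, and access control.
from dataclasses import dataclass
from pyspark.sql import SparkSession, DataFrame

@dataclass
class PipelineSpec:
    source_path: str     # input location (e.g. a lake path)
    sink_path: str       # curated output location
    transform_sql: str   # the only thing a product team writes

def run_pipeline(spark: SparkSession, spec: PipelineSpec) -> None:
    """Load -> transform -> write, so teams never touch Spark plumbing."""
    df: DataFrame = spark.read.parquet(spec.source_path)
    df.createOrReplaceTempView("source")
    result = spark.sql(spec.transform_sql)
    result.write.mode("overwrite").parquet(spec.sink_path)

if __name__ == "__main__":
    spark = SparkSession.builder.appName("self-service-etl").getOrCreate()
    run_pipeline(spark, PipelineSpec(
        source_path="/data/raw/events",
        sink_path="/data/curated/daily_counts",
        transform_sql="SELECT user_id, COUNT(*) AS n FROM source GROUP BY user_id",
    ))
```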
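For item 2, a minimal sketch of the batch-to-streaming move using Spark Structured Streaming against a Kafka-compatible broker (Azure Event Hubs exposes a Kafka endpoint). The broker address, topic name, and paths are placeholders.

```python
# A minimal streaming sketch: read an unbounded event stream instead of a
# nightly batch extract, and land it where inference can pick it up quickly.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("inference-events").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder address
    .option("subscribe", "copilot-interactions")       # hypothetical topic
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
)

# Continuously append to a low-latency sink; checkpointing gives exactly-once
# file output across restarts.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/stream/interactions")
    .option("checkpointLocation", "/checkpoints/interactions")
    .start()
)
query.awaitTermination()
```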
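For item 3, a minimal sketch of a non-tabular pipeline: whole documents in, chunked training-ready records out. The chunk size, the cleaning step, and the paths are illustrative assumptions; real LLM data prep adds deduplication, quality filtering, and tokenizer-aware splitting.

```python
# A minimal unstructured-data sketch: raw document files -> cleaned,
# fixed-size chunks suitable as LLM training records. All specifics
# (paths, CHUNK_CHARS, the trivial cleaner) are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, input_file_name
from pyspark.sql.types import ArrayType, StringType

CHUNK_CHARS = 4000  # illustrative context-window-sized chunks

def chunk(text: str):
    text = " ".join(text.split())  # trivial stand-in for real cleaning
    return [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]

chunk_udf = udf(chunk, ArrayType(StringType()))

spark = SparkSession.builder.appName("doc-pipeline").getOrCreate()

docs = (
    spark.read.option("wholetext", True).text("/data/raw/docs")  # one row per file
    .withColumn("source", input_file_name())
)

records = docs.select("source", chunk_udf("value").alias("chunks"))
records.write.mode("overwrite").json("/data/pretraining/chunks")
```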
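For item 4, a minimal sketch of the capture side of a feedback loop: one Copilot-style interaction event, serialized for a telemetry topic. The schema and the `emit()` helper are assumptions; a production system would batch, sample, and scrub PII before anything reaches a training loop.

```python
# A minimal feedback-loop sketch: define one interaction event and serialize
# it for the wire. The schema and emit() are hypothetical; a real emitter
# would hand the bytes to a Kafka/Event Hubs producer.
import json
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass
class InteractionEvent:
    session_id: str
    prompt_id: str
    response_id: str
    feedback: str      # e.g. "accepted", "rejected", "edited"
    latency_ms: int
    ts: float

def emit(event: InteractionEvent) -> bytes:
    """Serialize one event; downstream, these become RL and eval datasets."""
    return json.dumps(asdict(event)).encode("utf-8")

payload = emit(InteractionEvent(
    session_id=str(uuid.uuid4()),
    prompt_id="p-123", response_id="r-456",   # hypothetical IDs
    feedback="accepted", latency_ms=340, ts=time.time(),
))
```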

Skills

Required

  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 3+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR
  • Equivalent experience.

Nice to have

  • Bachelor's or Master's Degree in Computer Science, Software Engineering, or related technical field.
  • 4+ years of experience in Software Engineering or Data Infrastructure.
  • Proficiency in Python, Scala, Java, or Go. You write production-grade application code with unit tests, CI/CD, and modular design.
  • Deep Distributed Systems Knowledge: Demonstrated technical understanding of massive-scale compute engines (e.g., Apache Spark, Flink, Ray, Trino, or Snowflake). You should understand internals like query planning, memory management, and distributed consistency.
  • Experience architecting Lakehouse environments at scale (using Delta Lake, Iceberg, or Hudi).
  • Experience building internal developer platforms or "Data-as-a-Service" APIs.
  • Strong background in streaming technologies (Kafka, Azure Event Hubs, Pulsar) and stateful stream processing.
  • Experience with container orchestration (Kubernetes) for deploying data applications.
  • Experience enabling AI/ML workloads (Feature Stores, Vector Databases).

What the JD emphasized

  • Systems Builders
  • architect the backbone of Microsoft Copilot
  • build the "Paved Road" for AI
  • processing petabytes of data for the world's most advanced AI models
  • architect high-throughput pipelines capable of processing complex, non-tabular data (documents, code repositories, chat logs) for LLM pre-training, fine-tuning, and evaluation datasets
  • Engineer the high-throughput telemetry systems that capture user interactions with Copilot, creating the critical data loops required for Reinforcement Learning and model evaluation

Other signals

  • transforms raw, massive-scale signals into the fuel that powers training, inference, and evaluation for millions of users