Member of Technical Staff, Infrastructure Data & Analytics - MAI Superintelligence Team

Microsoft · Big Tech · United States · Software Engineering

This role focuses on building and owning end-to-end technical vision and execution for infrastructure analytics within Microsoft AI. The primary responsibility is to turn raw telemetry into trusted insights on utilization, capacity, readiness, and efficiency, supporting leadership in making informed investment and planning decisions at scale. The role involves designing and building scalable data pipelines, defining core metrics, architecting self-service dashboards and APIs, and ensuring data quality and governance.

What you'd actually do

  1. Act as the technical lead and owner for infrastructure analytics across compute, storage, and networking.
  2. Design and build durable, scalable data pipelines that ingest telemetry from clusters, schedulers, health systems, and capacity trackers into a data warehouse.
  3. Define and standardize core metrics and semantics (e.g., utilization, occupancy, MFU, goodput, capacity readiness, delivery-to-production).
  4. Architect and maintain self-service dashboards and APIs for fleet, cluster, and squad-level visibility.
  5. Partner closely with stakeholders across Supercomputing Infra, Researchers, Strategy, and Executives to ensure metrics reflect operational and business reality.

Skills

Required

  • distributed data processing frameworks
  • large-scale data systems
  • data engineering
  • analytics
  • data science
  • technical ownership

Nice to have

  • ETL orchestration frameworks such as Airflow, Dagster, or similar
  • communication skills
  • ability to explain complex systems clearly to senior leaders

What the JD emphasized

  • end-to-end technical vision and execution
  • trusted, decision-quality insights
  • scale
  • startup environment
  • large-scale data systems
  • technical leadership
  • large-scale telemetry systems