Principal Software Engineer

Microsoft · Big Tech · Bengaluru, KA, IN · Software Engineering

This Principal Software Engineer role focuses on building and operating large-scale, real-time data pipelines and feature/embedding materialization systems for Microsoft Ads. The work spans designing and implementing streaming ETL, operating messaging systems such as Kafka, enforcing data contracts, and integrating with ML inference serving, with success measured on freshness, correctness, latency, reliability, and cost. It calls for strong programming skills, distributed-systems experience, and observability expertise.

What you'd actually do

  1. Design and implement real-time streaming ETL / feature pipelines (e.g., Flink or Spark Structured Streaming) that meet strict freshness and correctness constraints (see the first sketch after this list).
  2. Build and operate reliable messaging and ingestion with Kafka/Pulsar (partitioning strategy, retries, ordering guarantees, DLQs, backpressure handling); the second sketch below illustrates the retry/DLQ pattern.
  3. Own data contracts between producers, pipelines, and consumers: schema evolution, versioning, compatibility, validation, and safe rollout (third sketch below).
  4. Implement production-grade backfill/replay workflows.
  5. Define and meet SLOs using OpenTelemetry/Prometheus/Grafana for metrics, tracing, dashboards, alerting, and incident response readiness.
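
To ground item 1, here is a minimal PySpark Structured Streaming sketch. The topic name `ad_events`, broker address, schema, sink/checkpoint paths, and window/watermark durations are all illustrative assumptions, not details from the JD:

```python
# Minimal streaming-ETL sketch: Kafka -> windowed features -> parquet sink.
# All names (topic, broker, paths) are assumptions for illustration.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

spark = SparkSession.builder.appName("feature-etl").getOrCreate()

schema = StructType([
    StructField("ad_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("bid", DoubleType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
    .option("subscribe", "ad_events")                   # assumed topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# The watermark bounds how late events may arrive before a window is
# finalized; this is where the freshness/correctness tradeoff lives.
features = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"), "ad_id")
    .agg(F.avg("bid").alias("avg_bid_1m"), F.count("*").alias("events_1m"))
)

query = (
    features.writeStream
    .outputMode("append")                               # emit only finalized windows
    .format("parquet")
    .option("path", "/data/features/ad_bid")            # assumed sink path
    .option("checkpointLocation", "/chk/ad_bid")        # enables restart/replay
    .trigger(processingTime="30 seconds")
    .start()
)
query.awaitTermination()
```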
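
For item 2, a sketch of the bounded-retry plus dead-letter pattern with confluent-kafka; the topic names, group id, and three-attempt policy are assumptions:

```python
# At-least-once consumption with bounded retries and a dead-letter topic.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",   # assumed broker
    "group.id": "feature-ingest",         # assumed consumer group
    "enable.auto.commit": False,          # commit only after success or DLQ
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "broker:9092"})
consumer.subscribe(["ad_events"])

MAX_ATTEMPTS = 3

def process(payload: dict) -> None:
    ...  # hypothetical per-record business logic; raises on failure

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue  # a real consumer would log/alert on broker errors here
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            process(json.loads(msg.value()))
            break
        except Exception:
            if attempt == MAX_ATTEMPTS:
                # Keep the original key so DLQ records retain partition affinity.
                producer.produce("ad_events.dlq", key=msg.key(), value=msg.value())
                producer.flush()
    # Commit only after the record is handled (or dead-lettered): at-least-once.
    consumer.commit(message=msg)
```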
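
For item 3, production systems typically lean on a schema registry; as a sketch of the underlying rule (a new reader must still decode old producers' records), here is a hand-rolled backward-compatibility check. The dict-based schema shape is purely illustrative:

```python
# Hand-rolled backward-compatibility check for record schemas.
# Schema shape {field: {"type": ..., "default": ...}} is illustrative only.

def backward_violations(old_fields: dict, new_fields: dict) -> list[str]:
    """Return violations that would break a new reader on old data."""
    violations = []
    for name, spec in new_fields.items():
        if name not in old_fields and "default" not in spec:
            violations.append(f"added field '{name}' has no default")
        elif name in old_fields and old_fields[name]["type"] != spec["type"]:
            violations.append(f"field '{name}' changed type")
    return violations

v1 = {"ad_id": {"type": "string"}, "bid": {"type": "double"}}
v2 = {
    "ad_id": {"type": "string"},
    "bid": {"type": "double"},
    "region": {"type": "string", "default": "unknown"},  # safe: has a default
}

assert backward_violations(v1, v2) == []                    # compatible evolution
assert backward_violations(v2, {"ad_id": {"type": "int"}})  # type change flagged
```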

Skills

Required

  • Bachelor’s or Master’s degree in Computer Science, Electrical/Computer Engineering, or a related field, with 8+ years of related experience.
  • Strong programming skills in C++, C#, or Python (at least one required).
  • Building and operating streaming data pipelines in production (Flink or Spark Structured Streaming).
  • Distributed systems engineering with strong reliability and operational rigor.
  • Messaging systems such as Kafka/Pulsar.
  • Operating services with Kubernetes/containers and production readiness practices (deployments, scaling, rollbacks).
  • Observability stacks such as OpenTelemetry, Prometheus, Grafana (a minimal instrumentation sketch follows this list).
  • Ability to debug complex production issues using logs/metrics/traces and performance profiling.
  • Strong communication and collaboration skills.
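
On the observability bullet: a minimal instrumentation sketch with prometheus_client, exposing the counter/histogram/gauge that error-rate, latency, and freshness SLOs would be built on. The metric names, port, and the event_ts field are assumptions:

```python
# Minimal SLO instrumentation for one pipeline stage with prometheus_client.
import time
from prometheus_client import Counter, Gauge, Histogram, start_http_server

RECORDS = Counter("pipeline_records_total", "Records processed", ["outcome"])
LATENCY = Histogram("pipeline_process_seconds", "Per-record processing time")
FRESHNESS = Gauge("pipeline_freshness_seconds", "Now minus last record's event time")

def handle(record: dict) -> None:
    start = time.time()
    try:
        ...  # hypothetical transform/write
        RECORDS.labels(outcome="ok").inc()
    except Exception:
        RECORDS.labels(outcome="error").inc()
        raise
    finally:
        LATENCY.observe(time.time() - start)
        FRESHNESS.set(time.time() - record["event_ts"])  # assumed field

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrape endpoint (assumed port)
```

Alert rules and dashboards (e.g., burn-rate alerts on the error-labeled counter) would then live in Prometheus/Grafana.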

Nice to have

  • Experience with feature stores, embedding pipelines, and online/offline consistency (freshness guarantees, correctness validation).
  • Experience with data lakehouse/table formats and their optimizations, e.g., partitioning, compaction, and incremental processing.
  • Experience with GPU inference serving (Triton, ONNX Runtime/TensorRT) and performance techniques such as batching, request shaping, and tail-latency reduction (first sketch after this list).
  • Understanding of pipeline correctness patterns: idempotency, deduplication, watermarking, late-data handling, and exactly-once vs. at-least-once tradeoffs (second sketch after this list).
  • Background in cost/performance modeling, capacity planning, and reliability improvements for high-scale data platforms.
  • Experience in Ads/search/recommendations or other high-scale systems where freshness, latency, and cost are jointly optimized.
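
On the GPU inference bullet, micro-batching is the standard throughput/tail-latency lever. A sketch with ONNX Runtime, where the model path, single-output assumption, and input name `input` are all hypothetical:

```python
# Micro-batched ONNX Runtime inference: one GPU round trip per batch.
import numpy as np
import onnxruntime as ort

# Prefer CUDA, fall back to CPU if no GPU is available.
sess = ort.InferenceSession(
    "model.onnx",                                    # hypothetical model file
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

def infer_batch(requests: list[np.ndarray]) -> list[np.ndarray]:
    """Batching amortizes launch/transfer overhead: a small queueing
    delay is traded for much higher throughput per GPU."""
    batch = np.stack(requests).astype(np.float32)    # [B, ...] single tensor
    (out,) = sess.run(None, {"input": batch})        # assumes one named input/output
    return [row for row in out]                      # split results per request

# Usage: size the batch so the queueing delay fits the latency budget.
outputs = infer_batch([np.random.rand(128) for _ in range(32)])
```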
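
And for the correctness-patterns bullet: under at-least-once delivery, retries and replays redeliver records, so sinks must be idempotent. A sketch using a deterministic event ID and an atomic set-if-absent; the Redis store, key scheme, and 24h TTL are assumptions:

```python
# Idempotent at-least-once processing: duplicates are dropped at the sink.
import hashlib
import json
import redis

r = redis.Redis(host="localhost", port=6379)   # assumed dedup store

def event_id(event: dict) -> str:
    # Deterministic ID: the same payload always hashes to the same key,
    # so redelivery after a crash or replay becomes a no-op.
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()

def write_to_feature_store(event: dict) -> None:
    ...  # hypothetical downstream write

def process_once(event: dict) -> bool:
    """Return True if processed, False if the event was a duplicate."""
    key = f"seen:{event_id(event)}"
    # SET NX is atomic: only the first delivery wins; the TTL bounds memory.
    if not r.set(key, 1, nx=True, ex=86_400):
        return False
    write_to_feature_store(event)
    return True
```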

What the JD emphasized

  • real-time data
  • low-latency serving
  • ML models
  • massive scale
  • strict freshness, cost, and reliability requirements
  • real-time data pipelines
  • feature/embedding materialization systems
  • ML inference serving
  • robust streaming + ETL systems
  • owning SLOs
  • strong observability
  • operational maturity
  • optimizing end-to-end performance and cost
  • freshness, correctness, latency, reliability, and cost in production
  • strict freshness and correctness constraints
  • reliable messaging and ingestion
  • production-grade backfill/replay workflows
  • define and meet SLOs
  • production readiness practices
  • debug complex production issues
  • feature stores, embedding pipelines, and online/offline consistency

Other signals

  • ML models
  • feature stores
  • embedding pipelines
  • LLM API calls
  • inference serving