Software Engineering II

Microsoft · Big Tech · Bengaluru, KA, IN · Software Engineering

The Azure Data engineering team is building the data platform for the age of AI, powering a new class of data-first applications. The Cosmos Analytics Platform team is hiring a Senior Software Engineer to drive the evolution, reliability, and performance of Cosmos, Microsoft's hyperscale big data platform. This team builds and operates foundational infrastructure that powers mission-critical data analytics workloads across Microsoft. The team works on core platform components, complex distributed systems problems, compute platform modernization, live-site excellence, and developer experience improvements. The role will evolve core platform capabilities for performance-sensitive and ML/AI-heavy workloads, including large-scale shuffle data management, ARM-based compute, GPU-accelerated execution paths, and secure containerization.

What you'd actually do

  1. Design and evolve core execution, scheduling, and resource management systems that power Cosmos Analytics at hyperscale, ensuring high performance, predictability, and operational excellence.
  2. Evolve core platform capabilities for performance-sensitive and ML/AI-heavy workloads. This includes large-scale shuffle data management, ARM-based compute, GPU-accelerated execution paths, and secure containerization.
  3. Collaborate across Azure services (Fabric, Storage, ACI, and Capacity teams) to land cross-service features, remove architectural bottlenecks, and ensure platform readiness for large-scale customer scenarios.
  4. Deliver reliability gains by improving system robustness, refining failover flows, and lowering incident frequency and mitigation times.
  5. Contribute to engineering quality through diagnostics tooling and automated checks.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements.

Nice to have

  • Experience designing scalable, reliable, secure services and debugging complex, multi-component production issues.
  • Familiarity with cloud environments (e.g., Azure) and service deployment/operations.
  • Hands-on experience with big data execution engines (Spark, SCOPE) and cluster orchestration.
  • Experience with shuffle systems and data movement pipelines (concepts like partitioning, spill/merge, locality).
  • Practical exposure to containerization (Docker/OCI), orchestration (Kubernetes/Service Fabric), and image/build pipelines.
  • Background in ARM compute and/or GPU acceleration; performance tuning on heterogeneous hardware.
  • Proven cross-team collaboration, ability to drive clarity in ambiguous spaces, and excellent technical communication.

What the JD emphasized

  • ML/AI-heavy workloads
  • hyperscale
  • performance-sensitive