Senior Software Engineer - Storage (C/C++)

Microsoft · Big Tech · Hyderabad, TS, IN · Software Engineering

Senior Software Engineer role focused on kernel engineering and enterprise customer reliability within the Windows Storage & File Systems team. Responsibilities include investigating and remediating security vulnerabilities and high-severity reliability issues across NTFS, ReFS, Storage Spaces Direct (S2D), Windows Server Failover Clustering (WSFC), Cluster Shared Volumes (CSV), and the Windows storage driver stack. The role involves deep ownership of source code, cluster state machines, and file system on-disk structures to resolve complex customer escalations.

What you'd actually do

  1. Own end-to-end resolution of critical ICMs escalated from top enterprise customers — analyze memory dumps, ETW traces, Storage Spaces logs, and cluster event logs to root-cause failures in S2D, WSFC, CSV, NTFS, and ReFS that cannot be resolved by field support.
  2. Investigate and fix security vulnerabilities in the Windows storage stack: privilege escalation through NTFS reparse points and junctions, information disclosure via uninitialized kernel pool in file system drivers, and denial-of-service through crafted on-disk structures in ReFS or NTFS.
  3. Design and implement reliability and correctness fixes in kernel-mode storage drivers and transports (StorPort miniports, NVMe, iSCSI, SMB Direct/RDMA) and in file system filter drivers, owning the full fix lifecycle from root cause through regression test to servicing release.
  4. Work directly with Storage Spaces Direct (S2D): diagnose and fix rebuild, rebalance, and fault-domain logic errors; investigate cache tier promotion/demotion bugs; resolve pool fragmentation and storage bus layer (SBL) issues in hyper-converged deployments.
  5. Maintain and harden Windows Server Failover Clustering (WSFC) and Cluster Shared Volumes (CSV): resolve quorum edge cases, CSV ownership transfer failures, cluster validation regressions, and inter-node storage arbitration deadlocks.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 8+ years of software engineering with deep expertise in C and C++ for Windows kernel-mode development.
  • Hands-on experience with Windows storage driver stack: StorPort miniport drivers, storage filter drivers, or file system minifilter drivers — understanding of IRP flow, completion routines, and cancel-safe queue management.
  • Solid grounding in Windows kernel fundamentals.
  • Demonstrated ability to perform crash dump analysis and live kernel debugging using WinDbg.
  • Working knowledge of NTFS on-disk structures: MFT record layout, attribute types, USN journal, and the NTFS log file for crash recovery.
  • Familiarity with ReFS (Resilient File System): B+ tree metadata structure, integrity streams, block cloning, and the differences in crash recovery model versus NTFS.
  • Experience debugging file system corruption scenarios: cross-linked clusters, orphaned MFT records, directory entry inconsistencies, and reparse point cycles.
  • Understanding of Windows file system minifilter architecture: altitude registration, pre/post operation callbacks.
  • Hands-on experience with Windows Server Failover Clustering (WSFC): quorum configurations (Node Majority, Disk Witness, Cloud Witness), cluster network configuration, and the cluster API.
  • Deep understanding of Cluster Shared Volumes (CSV): CSV file system (CSVFS) redirected vs. direct I/O modes, CSV ownership arbitration, and coordination with the Storage Bus Layer.
  • Experience with Storage Spaces Direct (S2D): storage pool creation, virtual disk provisioning, cache tier architecture (NVMe + SSD + HDD), fault domain awareness, and rebuild/rebalance behavior under node and drive failure.
  • Familiarity with storage connectivity protocols in clustered environments: SMB Direct (RDMA), iSCSI multipath (MPIO/DSM), NVMe-oF, and Fibre Channel HBA integration with StorPort.
  • Proven ability to work high-urgency customer escalations (ICMs / CritSits): triage under time pressure, communicate root cause to non-technical stakeholders, and deliver targeted fixes.

Nice to have

  • equivalent experience

What the JD emphasized

  • deep expertise in C and C++ for Windows kernel-mode development
  • deep ownership of source code, cluster state machines, and file system on-disk structures
  • deep understanding of Cluster Shared Volumes (CSV)
  • deep understanding of Storage Spaces Direct (S2D)
  • proven ability to work high-urgency customer escalations (ICMs / CritSits)