Senior Software Engineer - DPU Integrations

Microsoft · Big Tech · Santa Clara, CA +1 · Software Engineering

A Senior Software Engineer role focused on designing, building, testing, and deploying automation tools and AI agents for DPU (Data Processing Unit) control and data plane software. The role involves technical leadership, debugging field issues, maintaining dashboards, and collaborating cross-functionally to improve product quality and reduce the time to mitigate production issues. Experience with AI agents for live-site automation and analysis is preferred.

What you'd actually do

  1. Design, build, test, and deploy innovative integration and diagnostics tools and AI agents that help release quality products and reduce the time to mitigate production issues.
  2. Provide technical leadership to teams in scoping testing and creating a quality plan for DPU-based compute products. Create and manage project schedules in partnership with key stakeholders.
  3. Lead the team by providing technical expertise and oversight, and monitor test plan execution and quality to ensure that testing is efficient and carried out according to plan.
  4. Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook: work on call to monitor the system/product/service for degradation, downtime, or interruptions; alert stakeholders about status; and initiate actions to restore the system/product/service for both simple and complex problems when appropriate.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field
  • 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

Nice to have

  • 6+ years of experience developing, testing, diagnosing, and troubleshooting networking, storage, or compute cloud platforms as a lead engineer owning releases and mentoring or guiding a team of engineers.
  • Experience with Azure or similar large-scale cloud computing infrastructure, including control plane, telemetry, monitoring, diagnostics, and reporting.
  • Experience developing and/or testing embedded software for NICs and/or DPUs/IPUs.
  • Understanding of, and hands-on experience with, networking (TCP/IP, RoCEv2, routing/switching), Software-Defined Networking, and server platform firmware (BMC, BIOS, etc.) testing.
  • Experience with complex debugging and troubleshooting in both lab and live-site situations.
  • Experience with large-scale data analysis to identify themes and root causes of issues.
  • Experience using AI agents for live-site tool automation and analysis.

What the JD emphasized

  • AI agents

Other signals

  • designing, building, testing, and deploying the automation tools and agents needed for control and data plane software that runs on the DPU's custom-built silicon
  • AI agents for live-site tool automation and analysis