Senior System Debug Engineer

Intel · Semiconductors · Bangalore, India

This role focuses on debugging and resolving complex system-level issues across Intel's AI GPU product roadmap, spanning hardware, software, firmware, and silicon. The engineer will lead root-cause analysis, manage the bug lifecycle, and work with cross-functional teams to resolve platform issues and customer escalations efficiently. Familiarity with ML frameworks and AI/ML deployment debugging is a plus, but the core responsibility is system-level engineering and debugging.

What you'd actually do

  1. Drive the resolution of multidisciplinary platform issues by coordinating closely with all ingredient partners.
  2. Lead thorough root-cause analysis, issue isolation, and disposition activities for all platform issues, ensuring timely and accurate closure.
  3. Establish SLAs for the bug lifecycle, obtain alignment from all ingredient owners, and proactively monitor and address any deviations.
  4. Take complete ownership of the end-to-end bug lifecycle, ensuring full transparency, accountability, and effective communication.
  5. Improve overall program execution efficiency by identifying automation opportunities and implementing targeted automation solutions.

Skills

Required

  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • 8+ years of industry experience with a Bachelor's degree, 7+ years with a Master's degree, or 6+ years with a PhD
  • Linux Kernel Debugging Expertise
  • Linux internals
  • Linux system and user space debugging
  • System-level debug skills
  • Hardware-software interaction issues
  • RAS (Reliability, Availability, and Serviceability)
  • Power Management (PM)
  • PCIe
  • Performance
  • Security
  • Ethernet
  • HBM (High Bandwidth Memory)
  • GPU subsystems
  • Logs
  • Traces
  • Instrumentation
  • Debug tools
  • Python
  • C
  • C++
  • Self-Learning
  • Communication Skills
  • GPU architecture
  • GPU memory hierarchy
  • GPU performance bottlenecks
  • GPU debug methodologies

Nice to have

  • Machine Learning Framework Experience
  • PyTorch
  • TensorFlow
  • AI/ML Deployment and Debugging
  • Platform Architecture Understanding
  • Intel platform architectures
  • ARM platform architectures
  • Related debug tools and frameworks

What the JD emphasized

  • Linux Kernel Debugging Expertise
  • Strong System-Level Debug Skills
  • Deep Technical Knowledge Across Key Domains
  • GPU Architecture and Debug Expertise