Senior System Debug Engineer

Intel · Semiconductors · Bangalore, India

Senior System Debug Engineer responsible for the design and development of integrated AI solutions for deep learning and machine learning systems, spanning hardware, software, firmware, board, and silicon components. The role covers AI systems architecture, defining product specifications, and shaping the AI product roadmap. It requires developing new methods across several AI/ML domains, leading the design and implementation of component-level choices that balance performance and cost, defining the system integration approach, and delivering end-to-end technical solutions. The role also includes debugging AI infrastructure and ensuring its reliability, collaborating on next-generation requirements, and influencing the AI roadmap with customer insight.

What you'd actually do

  1. Responsible for the overall design and development of integrated Artificial Intelligence (AI) solutions for deep learning and machine learning systems that integrate hardware, software, firmware, board, and silicon components, with a specific focus on customer requirements and implementation constraints throughout the system lifecycle.
  2. May also be responsible for AI systems architecture and definition, including translating the business opportunity into use cases and developing product specifications for required hardware and software needed to deliver system requirements.
  3. Influences the AI product roadmap and development based on a deep understanding of AI and deep learning algorithms, deep learning customer requirements, and deep learning software frameworks.
  4. Develops new methods in the areas of reinforcement learning, policy learning, computer vision, machine learning, simulation, sim2real, autonomous driving, and robotics.
  5. Leads the design, analysis, and implementation of component-level choices across the integrated AI systems, weighing performance, features, and cost, including risk analysis and an emphasis on ease of use, reliability, security, availability, maintainability, sustainability, and quality.

Skills

Required

  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
  • 10+ years of industry experience with a Bachelor's degree, 9+ years with a Master's degree, or 8+ years with a PhD.
  • Linux Kernel Debugging Expertise
  • System-Level Debug Skills
  • Deep Technical Knowledge Across Key Domains: RAS (Reliability, Availability, and Serviceability), Power Management (PM), PCIe, Performance, Security, Ethernet, HBM, and GPU subsystems
  • Programming Skills: Python, C, and C++
  • Excellent Self-Learning and Communication Skills
  • GPU Architecture and Debug Expertise

Nice to have

  • Machine Learning Framework Experience: Familiarity with frameworks such as PyTorch and TensorFlow
  • AI/ML Deployment and Debugging
  • Platform Architecture Understanding: Knowledge of Intel and ARM platform architectures

What the JD emphasized

  • Linux Kernel Debugging Expertise
  • Deep Technical Knowledge Across Key Domains
  • AI/ML Deployment and Debugging

Other signals

  • Leads the design, analysis, and implementation of component-level choices across the integrated AI systems, weighing performance, features, and cost.
  • Defines the system implementation and integration approach and plans to ensure optimal performance and reliability across the hardware and software that make up the system.
  • Delivers end-to-end technical solutions to customer problems: deploying solutions, executing benchmark tests, and preparing documentation.
  • Conducts analysis and makes sound engineering recommendations to ensure the reliability and resiliency of the AI infrastructure.