Principal Open Source AI/ML Solutions Engineer

AMD · Semiconductors · Bangalore, India · Engineering

This Principal Open Source AI/ML Solutions Engineer role focuses on optimizing AI/ML workloads on AMD's GPU hardware. The engineer will be responsible for the architectural design, development, and implementation of GPU software components, including drivers and APIs, to enhance performance and efficiency for AI applications. The role involves working with AI frameworks, performance analysis, and contributing to open-source software solutions, aiming to unlock Generative AI efficiency.

What you'd actually do

  1. Own architectural design and development of GPU software components, ensuring alignment with industry standards and best practices.
  2. Serve as a subject matter expert in GPU technologies, guiding and mentoring junior engineers on the team through complex technical challenges.
  3. Design, write, and deliver high-quality open software solutions that enhance GPU performance and capabilities. This includes developing drivers, APIs, and other critical software components.
  4. Conduct research to explore new technologies and methodologies that can improve GPU performance and efficiency. Propose innovative solutions to meet evolving market demands.
  5. Work collaboratively with cross-functional teams, including hardware engineers, system architects, and product managers, to ensure successful integration of GPU technologies into broader systems.

Skills

Required

  • Strong C++ and Python programming skills.
  • Performance analysis skills for both CPU and GPU.
  • Good knowledge of AI/ML Frameworks and Architecture.
  • Basic GPU kernel programming knowledge.
  • Experience with software engineering methodologies such as Agile, Scrum, and Kanban.
  • Experience in all phases of software development, from requirements gathering, analysis, design, development, and testing to final release.
  • Experience developing software in an end-customer product delivery environment.
  • Experience with open-source software development including collaboration with community maintainers and submitting contributions.
  • Excellent analytical and problem-solving skills.
  • Strong communication skills to effectively convey complex technical concepts to both technical and non-technical stakeholders.
  • Ability to work independently and as part of a team.
  • Willingness to learn new skills, tools, and methods to advance the quality, consistency, and timeliness of AMD software products.

Nice to have

  • Experience with GPU kernel programming using CUDA, HIP or OpenCL.
  • Experience implementing and optimizing parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI).
  • Experience with PyTorch, TensorFlow, or JAX.
  • Experience with Singularity, Docker, and/or Kubernetes.

What the JD emphasized

  • open software solutions
  • GPU software components
  • performance analysis
  • AI/ML Frameworks and Architecture
  • GPU kernel programming
  • parallel methods on GPU accelerators

Other signals

  • AI optimization
  • fine-tuning large language models
  • Generative AI efficiency
  • GPU optimization
  • open software solutions