Middleware Development Engineer

Intel · Semiconductors · Hillsboro, Oregon, United States

This role focuses on optimizing Intel's communication libraries (oneCCL, SHMEM, MPI) for High-Performance Computing (HPC) and Artificial Intelligence (AI) workloads. The engineer will identify and resolve performance bottlenecks in these libraries, particularly for AI applications running at scale on Intel GPUs and CPUs. The goal is to maximize the utility and performance of these systems for scientific discovery and machine-learning innovation.

What you'd actually do

  1. Identify performance bottlenecks and additional features necessary to run Argonne AI COE workloads.
  2. Optimize runtime software for distributed computing systems, minimizing latency and maximizing bandwidth.
  3. Collaborate with cross-functional teams to define technical specifications and software requirements.
  4. Troubleshoot and resolve complex issues across multiple hardware and software stack layers.
  5. Contribute to software innovations that enhance HPC and AI capabilities at unprecedented scale.

Skills

Required

  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, or a STEM-related field with 5+ years of experience in software development OR Master's degree with 3+ years of experience OR Ph.D. with 3+ months of experience.
  • 3+ years of experience in distributed computing systems, HPC communication libraries (MPI, SHMEM, or oneCCL/NCCL), GPU software development, or network communication stack development.

Nice to have

  • Advanced degree (Master's or PhD)
  • Proficiency in C and C++ programming
  • Experience developing in Linux environments
  • Background in multithreaded programming
  • Experience in runtime performance optimization, improving communication latency or throughput
  • Background in developing software for GPUs and collective communication libraries
  • Strong analytical skills and ability to solve complex software challenges
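For candidates less familiar with the "collective communication libraries" this posting emphasizes (oneCCL, NCCL, MPI collectives), the core operation they optimize is allreduce. Below is a minimal, single-process simulation of the classic ring allreduce algorithm; the function name and structure are illustrative only, not any library's API, and real implementations add pipelining, topology awareness, and overlap with compute.

```python
# Toy single-process simulation of a ring allreduce, the collective
# pattern libraries like oneCCL and NCCL implement across GPUs/nodes.
# Illustrative sketch only; not any library's actual API.

def ring_allreduce(inputs):
    """Sum-allreduce over len(inputs) simulated ranks.

    inputs: one equal-length vector per rank; vector length must be
    divisible by the rank count. Returns the final per-rank buffers,
    which should all equal the elementwise global sum.
    """
    n = len(inputs)
    chunk = len(inputs[0]) // n
    # Split each rank's buffer into n chunks (one "owned" slot per rank).
    bufs = [[list(v[i * chunk:(i + 1) * chunk]) for i in range(n)]
            for v in inputs]

    # Phase 1: reduce-scatter. In step t, rank r sends chunk (r - t) % n
    # to its right neighbour and accumulates the chunk arriving from its
    # left neighbour. After n-1 steps, rank r holds the full sum of
    # chunk (r + 1) % n.
    for t in range(n - 1):
        sends = [list(bufs[r][(r - t) % n]) for r in range(n)]
        for r in range(n):
            src = (r - 1) % n
            idx = (src - t) % n
            for k in range(chunk):
                bufs[r][idx][k] += sends[src][k]

    # Phase 2: allgather. Each rank circulates its fully reduced chunk
    # around the ring until every rank holds every reduced chunk.
    for t in range(n - 1):
        sends = [list(bufs[r][(r + 1 - t) % n]) for r in range(n)]
        for r in range(n):
            src = (r - 1) % n
            bufs[r][(src + 1 - t) % n] = sends[src]

    # Flatten each rank's chunks back into a single buffer.
    return [[x for ch in rank for x in ch] for rank in bufs]
```

Each step moves only 1/n of a rank's data, which is why the ring algorithm achieves near-optimal bandwidth at scale; improving exactly this kind of latency/throughput trade-off is what the "runtime performance optimization" bullets above describe.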

What the JD emphasized

  • HPC communication libraries
  • GPU software development
  • runtime performance optimization
  • collective communication libraries

Other signals

  • optimizing communication libraries for AI workloads
  • performance bottlenecks in AI systems
  • distributed computing for AI