Senior Cloud Software Development Engineer

Intel · Semiconductors · Hillsboro, Oregon, United States

Intel is seeking a Senior Cloud Software Development Engineer to develop cutting-edge software features and optimizations for its communication libraries, including Intel SHMEM, Intel MPI, MPICH, and Intel oneCCL. The role focuses on oneCCL development, with opportunities to contribute to the other libraries. You will collaborate with scientists and engineers working on the Aurora supercomputer at Argonne National Laboratory and contribute to scientific computing and machine learning capabilities.

What you'd actually do

  1. Design, develop, and maintain advanced features and performance optimizations for oneCCL, with the potential to contribute to the Intel SHMEM, Intel MPI, and MPICH libraries
  2. Optimize software to achieve performance requirements including low latency, high bandwidth, and high reliability
  3. Implement and enhance communication protocols across multiple layers of the communications stack
  4. Collaborate with cross-functional teams to define software requirements and technical specifications
  5. Work directly with scientists and engineers on high-performance computing applications and supercomputer implementations

Skills

Required

  • Master's degree in Computer Science, Computer Engineering, or a STEM-related field of study
  • 3+ years of software development experience
  • 3+ years of Linux environment development experience
  • 3+ years of C and C++ programming experience
  • Experience with multithreaded programming and parallel computing concepts
  • Distributed computing systems and architectures
  • HPC (High-Performance Computing) communications libraries
  • Collective communications libraries (MPI, oneCCL/NCCL, or SHMEM)
  • GPU software development and optimization
  • Network communications stack development (one or more layers)

Nice to have

  • Ph.D. in Computer Science, Computer Engineering, or a STEM-related field of study
  • Experience developing performance optimizations that measurably improve communications latency or throughput
  • Experience debugging complex problems across different layers of the hardware and software stack
  • Deep understanding of high-performance computing architectures and optimization techniques
  • Experience with Intel GPU and CPU architectures and their optimization characteristics
  • Knowledge of supercomputing environments and large-scale distributed systems
  • Familiarity with scientific computing and machine learning communication patterns

What the JD emphasized

  • performance requirements
  • low latency
  • high bandwidth
  • high reliability
  • performance optimizations
  • communications latency
  • throughput
  • complex problems
  • hardware and software stack
  • HPC (High-Performance Computing) communications libraries
  • Collective communications libraries (MPI, oneCCL/NCCL, or SHMEM)