Principal Engineer – Distributed AI Systems Architecture (Heterogeneous Compute)

Intel · Semiconductors · Santa Clara, California, United States

Seeking a Principal Engineer to architect next-generation distributed AI systems across heterogeneous compute platforms (CPUs, GPUs, and accelerators). The role focuses on dynamically executing large-scale AI computation graphs while managing state, locality, and performance. Responsibilities include defining runtime models, stateful scheduling, graph introspection, integration of specialized accelerators, MoE-aware execution, and adaptive runtime optimization. Requires deep expertise in systems architecture, HPC, distributed systems, and heterogeneous compute environments; experience with AI/ML systems and inference infrastructure is preferred.

What you'd actually do

  1. Define a runtime model for executing AI workloads as distributed computation graphs across heterogeneous resources
  2. Architect systems where state (e.g., KV cache) is a first-class concern in scheduling and execution
  3. Develop mechanisms to analyze AI computation graphs and classify stages by: compute intensity, memory bandwidth requirements, communication cost, latency sensitivity
  4. Architect frameworks that treat specialized accelerators (e.g., dataflow engines) as first-class execution targets
  5. Design runtime strategies for Mixture-of-Experts (MoE) models, including: expert placement, routing locality, load balancing vs data movement trade-offs
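To make item 3 concrete, here is a minimal sketch of roofline-style stage classification: comparing a stage's arithmetic intensity (FLOPs per byte) against the machine balance point (peak FLOPs / peak memory bandwidth) to label it compute-bound, memory-bandwidth-bound, communication-bound, or latency-sensitive. All names, thresholds, and hardware numbers below are illustrative assumptions, not anything specified in the JD.

```python
from dataclasses import dataclass

# Hypothetical per-stage profile; field names are illustrative.
@dataclass
class StageProfile:
    name: str
    flops: float           # total floating-point operations
    bytes_moved: float     # bytes read/written to device memory
    comm_bytes: float      # bytes exchanged with other devices
    on_critical_path: bool # whether the stage gates end-to-end latency

def classify(stage: StageProfile, peak_flops: float, peak_bw: float) -> str:
    """Roofline-style heuristic: a stage whose arithmetic intensity exceeds
    the machine balance point (peak FLOPs / peak bandwidth) is compute-bound;
    otherwise it is bound by memory bandwidth, communication, or latency."""
    if stage.comm_bytes > stage.bytes_moved:
        return "communication-bound"
    intensity = stage.flops / max(stage.bytes_moved, 1.0)
    balance = peak_flops / peak_bw
    if intensity >= balance:
        return "compute-bound"
    if stage.on_critical_path:
        return "latency-sensitive"
    return "memory-bandwidth-bound"

# Example: a GEMM-heavy projection vs. a KV-cache read, on a device with an
# assumed 100 TFLOP/s peak compute and 2 TB/s memory bandwidth.
gemm = StageProfile("attn_proj", flops=2e12, bytes_moved=1e9,
                    comm_bytes=0, on_critical_path=True)
kv_read = StageProfile("kv_fetch", flops=1e7, bytes_moved=4e9,
                       comm_bytes=0, on_critical_path=False)
print(classify(gemm, peak_flops=1e14, peak_bw=2e12))     # compute-bound
print(classify(kv_read, peak_flops=1e14, peak_bw=2e12))  # memory-bandwidth-bound
```

In a real runtime these labels would feed the scheduler: compute-bound stages go to the strongest accelerator, bandwidth-bound stages favor devices with fast memory, and communication-bound stages motivate co-locating producers and consumers.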

Skills

Required

  • defining and implementing software architectures for AI frameworks, protocols, and algorithms
  • systems architecture
  • high-performance computing
  • distributed systems
  • parallel or data-parallel computation models
  • heterogeneous compute environments (CPU, GPU, DSP, or accelerators)
  • designing end-to-end systems from abstraction through implementation
  • performance trade-offs across compute, memory, and interconnect

Nice to have

  • AI/ML systems
  • inference infrastructure
  • large-scale model serving
  • stream processing
  • dataflow models
  • graph execution systems
  • modern AI frameworks or runtimes
  • developer-facing SDKs or programming models
  • performance optimization and benchmarking

What the JD emphasized

  • hardest problems in modern computing
  • dynamically execute and optimize large-scale AI computation graphs across diverse hardware
  • heterogeneous compute platforms
  • distributed AI systems
  • AI infrastructure
  • distributed inference solutions
  • MoE-aware execution
  • adaptive execution
