Principal Software Engineer (AI Inference / Distributed Systems)

AMD · Semiconductors · Santa Clara, CA · Engineering

Principal Software Engineer focused on optimizing AI inference performance and distributed systems at AMD, working with the latest hardware and software technologies. Responsibilities include developing techniques for scale-up/scale-out inference, creating methods and tooling for dynamic resource utilization in inference, and supporting the ROCm ecosystem.

What you'd actually do

  1. Develop techniques for optimizing scale-up and scale-out inference.
  2. Develop methods and tooling to utilize dynamic resources in service of inference.
  3. Support the proliferation of the ROCm ecosystem.

Skills

Required

  • Expertise in the K8s ecosystem, especially as it pertains to large-scale inference.
  • Operational experience with at least one of SGLang or vLLM, and with a serving framework such as KServe or llm-d. Experience running inference as a service can be substituted in lieu of experience with frameworks such as KServe or llm-d.
  • Expertise with techniques used to optimize inference, such as distributed KV cache, disaggregation, and request scheduling.
  • Ability to write high-quality code with a keen attention to detail.
  • Experience with modern concurrent programming.
  • Effective communicator with keen attention to detail.

Nice to have

  • Preferred languages are Go and Python.
  • Prior experience roadmapping deeply technical areas is highly valuable.

What the JD emphasized

  • improving the performance of key applications and benchmarks
  • optimizing scale-up and scale-out inference
  • utilize dynamic resources in service of inference
  • large scale inference
  • running inference as a service
  • optimize inference

Other signals

  • optimize scale-up and scale-out inference
  • utilize dynamic resources in service of inference
  • support proliferation of the ROCm ecosystem