Generative AI Inference Engineer

Stability AI · AI Frontier · Remote · Technical

Stability AI is seeking a Generative AI Inference Engineer to join its Inference team. The role focuses on developing and running inference for multi-modal generative AI models, with an emphasis on optimization techniques and deployment. The engineer will work with researchers and engineers, using high-performance computing resources and partnering with cloud providers to deliver hosted inference solutions.

What you'd actually do

  1. Lead the design and development of customer-facing multi-modal ML inference systems.
  2. Work with the Platform and Inference teams to build inference systems for the next generation of models, covering areas such as optimization, model tuning, and deployment.
  3. Partner with leading cloud providers to deliver hosted Stability AI inference solutions.
  4. Be a strategic thought partner for leaders across the organization on driving business impact through machine learning.
  5. Be part of the team that brings new Stability models and pipelines into existence.

Skills

Required

  • productionizing machine learning systems
  • inference pipeline development
  • writing and running Python services at scale
  • Python scientific stack
  • PyTorch
  • high-performance inference frameworks (e.g. Triton, TensorRT)
  • Diffusion Architecture
  • profiling and optimizing deep neural networks on NVIDIA GPUs
  • NVIDIA Nsight
  • Python-based image manipulation/encoding/decoding frameworks
  • OpenCV
  • deploying to cloud orchestration systems
  • Kubernetes
  • AWS
  • GCP
  • Azure
  • Docker
  • rapid prototyping of solutions
  • working to tight product deadlines
  • open-source ML ecosystem (Hugging Face, W&B, etc.)

Nice to have

  • ComfyUI
  • workflow tools

What the JD emphasized

  • productionizing machine learning systems
  • inference pipeline development
  • writing and running Python services at scale
  • high-performance inference frameworks
  • Diffusion Architecture
  • profiling and optimizing deep neural networks on NVIDIA GPUs
  • cloud orchestration systems

Other signals

  • customer-facing ML inference systems
  • optimization
  • model tuning
  • deployment
  • productionize inference platform improvements