Senior/Staff Software Engineer - Machine Learning Platform (Inference)

Snowflake · Data AI · CA-Menlo Park, United States · Engineering

Senior/Staff Software Engineer on the ML Platform team at Snowflake, focusing on building the next-generation platform for enterprise AI and LLM workloads. The role involves defining roadmaps, designing and executing on ML serving infrastructure, and ensuring operational excellence for customer-facing ML services, with a strong emphasis on LLM inference and serving fine-tuned models.

What you'd actually do

  1. Help define and own the roadmap, working collaboratively and proactively with senior architects, PMs, and team leadership. The initiatives include platforms and tools that enable customers to do state-of-the-art machine learning on Snowflake natively.
  2. Collaboratively build and execute a vision for incorporating new advances in machine learning in ways that best achieve the team’s business objectives.
  3. Ensure operational excellence of the services and meet the commitments to our customers regarding reliability, availability, and performance.
  4. Collaborate with other ML partner teams to continuously improve ML development velocity and capabilities at Snowflake.
  5. Support team members in delivering a high level of technical quality.

Skills

Required

  • 7+ years of industry experience designing, building, and supporting Internet serving infrastructure, machine learning platforms, machine learning services, and frameworks
  • Strong track record of working with machine learning systems and/or platforms
  • Experience serving LLMs with inference engines such as vLLM, TensorRT-LLM, TEI, and SGLang, and an understanding of the tradeoffs between them
  • Experience serving fine-tuned LLMs (PEFT, DPO, RL)
  • BS/MS/PhD in Computer Science or related majors, or equivalent experience

Nice to have

  • Experience with several of the following frameworks: SKLearn, XGBoost, PyTorch, TensorFlow, MLflow
  • Previous experience building batch and real-time ML serving systems
  • Experience building a roadmap and vision for machine learning teams, and leading technical decision-making with the help of architects, PMs, and the team

What the JD emphasized

  • serving LLMs using inference engines like vLLM, TensorRT-LLM, TEI, SGLang, and knowing tradeoffs between them
  • Experience serving fine-tuned LLMs (PEFT, DPO, RL)

Other signals

  • ML Platform
  • LLM Workloads
  • Inference Engines
  • Serving Fine-tuned LLMs