Senior Machine Learning Engineer, Agentic Systems - Moveworks

ServiceNow · Enterprise · Mountain View, CA +1 · Engineering

This role focuses on building and optimizing scalable ML infrastructure for training, evaluation, and deployment of LLMs, including distributed training and inference pipelines, LLM latency optimization, and model evaluation/monitoring frameworks. The goal is to support the company's Agentic AI platform and its production models.

What you'd actually do

  1. Design, build and optimize scalable machine learning infrastructure to support training, evaluation, and deployment of large language models.
  2. Build abstractions to automate various steps in different ML workflows
  3. Collaborate with cross-functional teams of engineers, data analytics, machine learning experts, and product to build new features
  4. Leverage your experience to drive best practices in ML and data engineering

Skills

Required

  • 5+ years of industry experience in Machine Learning, Infrastructure or related fields
  • Experience with a deep learning framework such as PyTorch or Hugging Face, or an LLM serving framework such as vLLM or TensorRT-LLM.
  • Experience with building and scaling end-to-end machine learning systems
  • Experience building scalable microservices and ETL pipelines
  • Expertise in Python and experience with a performant language such as C++ or Go
  • Bachelor's in Computer Science, Computer Engineering, Mathematics, or equivalent field.

Nice to have

  • A love of research publications in the machine learning and software engineering communities
  • Effective communicator with experience collaborating cross-functionally with other teams

What the JD emphasized

  • critical in building, optimizing and scaling end-to-end machine learning systems
  • critical to the long-term scalability of our core AI product
  • absolutely critical

Other signals

  • building and productionizing ML infrastructure that runs state-of-the-art models
  • building, optimizing and scaling end-to-end machine learning systems
  • LLM latency optimization
  • distributed training and inference pipelines for large language models (LLMs)