Senior Technical Program Manager, Machine Learning Infrastructure

Cohere · AI Frontier · Canada · Product Management & Program Management

The Senior Technical Program Manager for Machine Learning Infrastructure at Cohere manages programs covering inference, efficiency, serving, and endpoints so that the infrastructure scales with a growing user base. The role involves end-to-end coordination, identifying pain points, establishing processes, and collaborating with stakeholders to ensure successful program execution.

What you'd actually do

  1. Manage the program portfolio covering inference, efficiency, serving, and endpoints to ensure that Cohere’s infrastructure continues to scale for a rapidly expanding base of internal and external users
  2. Lead the end-to-end coordination and execution across the program, with a particular focus on cross-functional collaboration with Modeling and customer-facing teams
  3. Identify pain points and establish processes so that the team can focus on development while meeting the needs of internal and external users of the models and improving engineering best practices
  4. Build a strong culture of continuous improvement, for example by liaising with incident management leads to ensure that problems are accurately root-caused and that required fixes are delivered in a timely manner
  5. Manage various overlapping projects and programs, ruthlessly prioritizing asks, to ensure that the company’s top priorities are met

Skills

Required

  • Technical Program Management
  • Machine Learning Infrastructure
  • model inference
  • model serving
  • model efficiency
  • endpoint design & implementation
  • ML infrastructure design and implementation
  • engineering best practices
  • cross-functional collaboration
  • incident management
  • stakeholder management
  • project and program management

Nice to have

  • hands-on technical experience
  • working with ML teams
  • understanding of how machine learning models are built

What the JD emphasized

  • 5+ years of Technical Program Management experience focusing on Machine Learning Infrastructure
  • In-depth technical knowledge around ML infrastructure design and implementation
  • Experience working in a chaotic, fast-paced, low-structure environment

Other signals

  • managing ML infrastructure programs
  • inference, efficiency, serving, and endpoints
  • scaling for a rapidly expanding user base
  • cross-functional collaboration
  • improving engineering best practices