Engineering Manager, Machine Learning Operations

PitchBook · Fintech · Seattle, WA · Technology Operations

Engineering Manager for an MLOps team responsible for enabling ML teams by optimizing the ML Development Life Cycle (MLDLC). The role supports projects in GenAI, LLMs, NLP, classification, and regression, and is critical for driving AI innovation.

What you'd actually do

  1. Lead the MLOps team direction and execution (operations, processes, practices, and standards), working closely with engineering leadership and product management to craft roadmaps, define KPIs, and achieve success criteria
  2. Ensure effective communication and coordination across geographically dispersed teams. Oversee the enablement of scalable solutions that meet high standards of reliability and efficiency
  3. Champion the adoption and integration of ML best practices at PitchBook, fostering a culture of innovation and experimentation to drive the development of high-quality AI products
  4. Serve as a force multiplier by removing roadblocks, implementing process improvements, providing frequent and actionable feedback to team members, and building practices for ideation and innovation
  5. Bridge the gap between business/product needs and execution, including building and delivering on group-level objectives and key results, identifying resource needs, and building execution plans for initiatives

Skills

Required

  • 3+ years of experience in an engineering leadership role
  • 6+ years of experience in hands-on development of Machine Learning algorithms
  • 6+ years of experience in hands-on deployment of Machine Learning services
  • 6+ years of experience supporting the entire MLDLC, including post-deployment operations such as monitoring and maintenance
  • 6+ years of experience with Amazon Web Services (AWS) and/or Google Cloud Platform (GCP)
  • Experience with at least 70% of the following: PyTorch, TensorFlow, LangChain, scikit-learn, Redis, Elasticsearch, Amazon SageMaker, Google Vertex AI, Weights & Biases, FastAPI, Prometheus, Grafana, Apache Kafka, Apache Airflow, MLflow, and Kubeflow
  • Ability to break large, complex problems into well-defined steps, ensuring iterative development and continuous improvement
  • Experience in cloud-native delivery with a deep practical understanding of containerization and orchestration technologies such as Docker and Kubernetes, and the ability to manage these across different regions
  • Proficiency in GitOps and creation/management of CI/CD pipelines
  • Demonstrated experience building and using SQL/NoSQL databases
  • Demonstrated experience with Python (Java is a plus) and other relevant programming languages and tools
  • Excellent problem-solving skills with a focus on innovation, efficiency, and scalability in a global context
  • Strong communication and collaboration skills, with the ability to engage effectively with internal customers across various cultures and regions

Nice to have

  • Java

What the JD emphasized

  • managing globally distributed teams
  • hands-on development of Machine Learning algorithms
  • hands-on deployment of Machine Learning services
  • supporting the entire MLDLC
  • Experience with at least 70% of the following: PyTorch, TensorFlow, LangChain, scikit-learn, Redis, Elasticsearch, Amazon SageMaker, Google Vertex AI, Weights & Biases, FastAPI, Prometheus, Grafana, Apache Kafka, Apache Airflow, MLflow, and Kubeflow

Other signals

  • MLOps team
  • ML Development Life Cycle (MLDLC)
  • Generative AI (GenAI)
  • Large Language Models (LLMs)
  • Natural Language Processing (NLP)