Engineering Manager - Forward Deployed Engineering (LLM)

Baseten · Data AI · San Francisco, CA · EPD

Engineering Manager for the Forward Deployed Engineering team, focused on building, scaling, and optimizing LLM inference workloads for Baseten customers. The role combines hands-on technical ownership, team leadership, and collaboration with product and infrastructure teams to ensure best-in-class performance, reliability, and cost efficiency of AI applications on Baseten's platform. As a player-coach, the role holder contributes to the core codebase and drives the feature roadmap.

What you'd actually do

  1. Lead, mentor, and grow a team of Forward Deployed Engineers, providing guidance on technical direction, project execution, and professional development.
  2. Set clear goals and ensure timely, high-quality delivery across multiple customer-facing projects involving LLM deployment and inference optimization.
  3. Collaborate with leadership to align team priorities with company and customer goals, balancing short-term delivery, widely varying customer priorities, and long-term technical initiatives.
  4. Player-coach – While much of this role is leading the team, you will also be expected to be a key driver on strategic product initiatives and customer engagements. The best managers derive credibility from staying hands-on when needed.
  5. Develop and maintain software systems and product features using one or more general-purpose programming languages in a production environment, with a preference for Python given its prevalence in ML work.

Skills

Required

  • Python
  • LLMs
  • inference optimization
  • serving frameworks (e.g., vLLM, TensorRT, Triton, Hugging Face, Ray Serve)
  • observability
  • profiling
  • cost/performance tradeoffs in production ML systems
  • leadership
  • mentorship

Nice to have

  • leading customer-facing engineering teams
  • working directly with enterprise partners
  • GPU infrastructure
  • distributed inference
  • model compression techniques

What the JD emphasized

  • customer-facing projects
  • LLM deployment
  • inference optimization
  • production-level environment
  • building or optimizing ML inference systems
  • LLMs
  • serving frameworks
  • production ML systems

Other signals

  • LLM inference
  • customer engagements
  • platform engineering
  • optimization