Forward Deployed Engineer - AI/ML Platforms

Anyscale · Data AI · San Francisco, CA · Customer Solutions Group

This role focuses on deploying and operating AI/ML platforms and infrastructure for enterprise customers using Anyscale's Ray-based platform. The engineer partners with customer teams to build scalable AI platforms, modernize ML infrastructure, and operationalize distributed AI applications. Key responsibilities include designing and implementing production-grade architectures on Kubernetes and public clouds, troubleshooting distributed systems, and developing automation. The role requires strong cloud infrastructure, Kubernetes, and software engineering skills, along with experience in customer-facing roles.

What you'd actually do

  1. Design and implement production-grade AI platform architectures on Kubernetes and public cloud infrastructure (AWS, Azure, and GCP).
  2. Partner directly with customer platform, infrastructure, and ML engineering teams to deploy, operate, and optimize distributed AI workloads.
  3. Lead implementation engagements that include platform installation, networking, security, observability, scaling, upgrades, and operational readiness.
  4. Troubleshoot complex distributed systems issues spanning infrastructure, Kubernetes, networking, storage, and AI applications.
  5. Develop automation, tooling, reference implementations, and infrastructure-as-code that accelerate customer success and improve repeatability.

Skills

Required

  • 5+ years of experience in cloud infrastructure, platform engineering, DevOps, Site Reliability Engineering, or software engineering.
  • Experience building, deploying, or operating ML/AI platforms that support model training, inference, or large-scale data processing workloads.
  • Strong expertise with Kubernetes and containerized production environments.
  • Experience operating cloud infrastructure on AWS, Azure, or GCP, including networking, security, IAM, storage, and infrastructure automation.
  • Experience with Infrastructure as Code and modern DevOps tooling such as Terraform, Helm, GitOps, CI/CD pipelines, or similar technologies.
  • Strong software engineering skills in Python, Go, Java, or a comparable language, with experience building automation or production services.
  • Experience working directly with enterprise customers in consulting, professional services, field engineering, solutions architecture, or another customer-facing engineering role.
  • Excellent communication skills and the ability to work effectively with both executive and deeply technical stakeholders.

Nice to have

  • Familiarity with distributed computing frameworks such as Ray, Spark, Dask, or Kubernetes-native distributed systems is a strong plus.
  • A passion for solving difficult customer problems and building reusable technical solutions.
  • Willingness to travel as needed to work alongside strategic customers.

What the JD emphasized

  • production AI workloads
  • distributed AI applications
  • production-grade AI platform architectures
  • Kubernetes
  • cloud infrastructure
  • distributed systems issues
  • customer teams

Other signals

  • customer-facing
  • infrastructure