Senior Machine Learning Infrastructure Engineer

Unity · Enterprise · Mountain View, CA · AI & Machine Learning

Senior Machine Learning Infrastructure Engineer on Unity's Vector Ads team, building and operating the real-time, high-scale, low-latency infrastructure that serves ML models in production for Unity's global advertising platform. The role involves designing, building, and maintaining serving pipelines, partnering with ML engineers, and improving infrastructure performance and cost-efficiency.

What you'd actually do

  1. Design, build, and maintain the infrastructure that serves ML models in real-time across Unity's ads ecosystem
  2. Build and operate scalable model serving pipelines — owning latency, throughput, and reliability in a high-QPS production environment
  3. Partner with ML engineers to productionize models, manage model deployments, and improve iteration speed
  4. Improve observability, performance, and cost-efficiency of ML serving infrastructure
  5. Contribute to architectural decisions around feature serving, model versioning, and inference optimization

Skills

Required

  • Experience building and operating ML infrastructure or model serving systems in production
  • Proficiency in Golang or Python, with strong systems engineering fundamentals
  • Hands-on experience with Kubernetes and container orchestration at scale
  • Familiarity with ML serving frameworks such as Ray Serve, Triton, TorchServe, or similar
  • Understanding of distributed systems, API design, and system reliability
  • Strong collaboration and communication skills in a remote-first environment

Nice to have

  • Experience with feature stores, feature pipelines, or online/offline feature serving
  • Background in ad tech, real-time bidding, or programmatic advertising systems
  • Familiarity with infrastructure-as-code such as Terraform
  • Experience with observability tooling — Prometheus, Grafana, OpenTelemetry
  • Background in real-time data pipelines, caching layers, or low-latency serving systems

What the JD emphasized

  • ML infrastructure
  • model serving systems
  • high-scale
  • low-latency
  • real-time

Other signals

  • billions of requests daily
  • real-time systems
  • ML models in production