Senior Machine Learning Infrastructure Engineer

Unity · Enterprise · Mountain View, CA · AI & Machine Learning

This role is a Senior Machine Learning Infrastructure Engineer on Unity's Vector Ads team, focused on building and operating real-time, high-scale, low-latency ML serving infrastructure for a global advertising platform. Responsibilities include designing, building, and maintaining serving pipelines; partnering with ML engineers to productionize models; and improving infrastructure efficiency.

What you'd actually do

  1. Design, build, and maintain the infrastructure that serves ML models in real-time across Unity's ads ecosystem
  2. Build and operate scalable model serving pipelines — owning latency, throughput, and reliability in a high-QPS production environment
  3. Partner with ML engineers to productionize models, manage model deployments, and improve iteration speed
  4. Improve observability, performance, and cost-efficiency of ML serving infrastructure
  5. Contribute to architectural decisions around feature serving, model versioning, and inference optimization

Skills

Required

  • Experience building and operating ML infrastructure or model serving systems in production
  • Proficiency in Golang or Python, with strong systems engineering fundamentals
  • Hands-on experience with Kubernetes and container orchestration at scale
  • Familiarity with ML serving frameworks such as Ray Serve, Triton, TorchServe, or similar
  • Understanding of distributed systems, API design, and system reliability
  • Strong collaboration and communication skills

Nice to have

  • Experience with feature stores, feature pipelines, or online/offline feature serving
  • Background in ad tech, real-time bidding, or programmatic advertising systems
  • Familiarity with infrastructure-as-code such as Terraform
  • Experience with observability tooling — Prometheus, Grafana, OpenTelemetry
  • Background in real-time data pipelines, caching layers, or low-latency serving systems

What the JD emphasized

  • production
  • high-scale
  • low-latency
  • real-time
  • scalable
  • productionize
  • inference optimization

Other signals

  • production ML infrastructure
  • high-scale, low-latency
  • serving ML models in real-time
  • Kubernetes at scale
  • ML serving frameworks