Senior Machine Learning Engineer, ML Infrastructure - Online

Unity · Enterprise · Shanghai, China · AI & Machine Learning

Unity is hiring a Senior/Staff ML Engineer to design and evolve Unity Vector’s online model inference platform. The role focuses on building reliable infrastructure for serving ML models in production, optimizing inference performance, and enabling safe, efficient experimentation across high-traffic online systems. It requires strong systems thinking, deep experience with production ML infrastructure, and the ability to drive architectural improvements.

What you'd actually do

Design and operate large-scale online inference infrastructure that serves production ML models with low latency and high reliability.
Build and improve model serving systems using technologies such as PyTorch, Triton Inference Server, Kubernetes, GKE, Ray, or similar distributed serving frameworks.
Optimize inference performance through batching, model compilation, GPU/CPU utilization improvements, request scheduling, and runtime-level tuning.
Develop infrastructure for model deployment, canary testing, A/B experimentation, traffic splitting, rollback, and production validation.
Improve observability of online ML systems through latency, throughput, error-rate, cost, saturation, and model-health monitoring.

Skills

Required

Python
PyTorch
NVIDIA Triton Inference Server
Kubernetes
GKE
Ray
distributed systems
autoscaling
service reliability
production observability
model serving frameworks
inference optimization
model deployment
canary testing
A/B experimentation
rollback
production validation
systems thinking

Nice to have

TorchServe
TensorFlow Serving
model compilation
quantization
GPU acceleration
GPU kernel optimization
caching
runtime tuning

What the JD emphasized

strong technical ownership
design and evolve Unity Vector’s online model inference platform
building reliable infrastructure for serving machine learning models in production
optimizing inference performance
enabling safe, efficient experimentation across high-traffic online systems
ensure models can be deployed, scaled, monitored, and iterated on efficiently
shaping how models are packaged, served, validated, monitored, and optimized in production environments
strong systems thinking
deep experience with production ML infrastructure
ability to drive architectural improvements across teams

Other signals

online ML systems
production models at scale
low-latency inference
large-scale experimentation
model deployment and optimization
feature processing
business-critical decisioning
inference platform
reliable, scalable, observable, and cost-efficient
online model inference platform
serving machine learning models in production
optimizing inference performance
safe, efficient experimentation
high-traffic online systems
deploy, scale, monitor, and iterate efficiently
packaged, served, validated, monitored, and optimized
systems thinking
production ML infrastructure
architectural improvements

The opportunity

Unity Vector builds ML infrastructure that powers real-time prediction, experimentation, attribution, and AI-driven decision-making across the company.

Our online ML systems serve production models at scale, supporting low-latency inference, large-scale experimentation, model deployment and optimization, feature processing, and business-critical decisioning. As model complexity, traffic volume, and experimentation velocity continue to grow, our inference platform must remain reliable, scalable, observable, and cost-efficient.

To support this growth, we need strong technical ownership to evolve the online ML infrastructure that enables ML teams to safely deploy, validate, and operate production models at scale.

The Role

We are seeking a senior/staff ML engineer to design and evolve Unity Vector’s online model inference platform. This role focuses on building reliable infrastructure for serving machine learning models in production, optimizing inference performance, and enabling safe, efficient experimentation across high-traffic online systems.

You will work closely with ML engineers, platform teams, and product stakeholders to ensure models can be deployed, scaled, monitored, and iterated on efficiently. You will play a key role in shaping how models are packaged, served, validated, monitored, and optimized in production environments.

This role requires strong systems thinking, deep experience with production ML infrastructure, and the ability to drive architectural improvements across teams.

What you'll be doing

Design and operate large-scale online inference infrastructure that serves production ML models with low latency and high reliability.
Build and improve model serving systems using technologies such as PyTorch, Triton Inference Server, Kubernetes, GKE, Ray, or similar distributed serving frameworks.
Optimize inference performance through batching, model compilation, GPU/CPU utilization improvements, request scheduling, and runtime-level tuning.
Develop infrastructure for model deployment, canary testing, A/B experimentation, traffic splitting, rollback, and production validation.
Improve observability of online ML systems through latency, throughput, error-rate, cost, saturation, and model-health monitoring.
Build self-healing and autoscaling capabilities to support dynamic experiment traffic, changing model complexity, and production reliability requirements.
Partner closely with ML engineers to support faster model iteration while maintaining production safety, scalability, and cost efficiency.
Improve the reliability and reproducibility of model serving workflows, including model packaging, artifact validation, compatibility testing, and deployment automation.
Lead architectural improvements that make the online ML platform more robust, user-friendly, scalable, and cost-efficient.

What we're looking for

Strong experience building and operating production-grade online ML inference systems.
Experience with model serving frameworks such as NVIDIA Triton Inference Server, TorchServe, Ray Serve, TensorFlow Serving, or similar systems.
Experience optimizing inference workloads using techniques such as dynamic batching, model compilation, quantization, GPU acceleration, GPU kernel optimization, caching, or runtime tuning.
Strong experience with distributed systems, Kubernetes, autoscaling, service reliability, and production observability.
Strong programming skills in Python, with practical experience working on production ML systems and high-scale services.
Experience with PyTorch and modern model deployment workflows, including model packaging, validation, and serving lifecycle management.
Experience designing infrastructure for safe model rollout, canary testing, A/B experimentation, and automated rollback.
Strong systems thinking, with the ability to reason about latency, throughput, reliability, scalability, and cost tradeoffs in online systems.
Proven ability to lead technical direction and influence architectural decisions across teams without formal authority.

Additional information

Relocation support is not available for this position
Work visa/immigration sponsorship is not available for this position

Benefits

At Unity, we want our team members to thrive. We offer a wide range of benefits designed to support well-being and work-life balance.

Please note: Benefits eligibility, specific offerings, and coverage vary based on the country and employment status.

While specific benefits vary, here are some of the ways we strive to take care of our eligible team members globally:

Comprehensive health, life, and disability insurance
Commute subsidy
Employee stock ownership
Competitive retirement/pension plans
Generous vacation and personal days
Support for new parents through leave and family-care programs
Office food and snacks
Mental Health and Wellbeing programs and support
Employee Resource Groups
Global Employee Assistance Program
Training and development programs
Volunteering and donation matching program

Life at Unity

Unity [NYSE: U] is the world’s leading game engine, powering play for more than 3 billion consumers each month. The top mobile games in the world, the most played PC indie titles, the most innovative console games, and virtually all of the top XR and Web Games are developed, deployed, and grown in Unity. Unity also enables teams across industries like automotive, manufacturing, and healthcare to design, simulate, and collaborate in 3D, closing the gap between ideas and reality. For more information, please visit www.unity.com.

Unity is an equal opportunity employer committed to fostering an inclusive, innovative environment with the best employees. Therefore, we provide employment opportunities without regard to age, race, color, ancestry, national origin, disability, gender, or any other protected status in accordance with applicable law. If you have a disability and there are preparations or accommodations we can make to help ensure you have a comfortable and positive interview experience, please fill out this form to let us know.

This position requires the incumbent to have a sufficient knowledge of English to have professional verbal and written exchanges in this language since the performance of the duties related to this position requires frequent and regular communication with colleagues and partners located worldwide and whose common language is English.

Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. Unity does not accept unsolicited headhunter and agency resumes. Unity will not pay fees to any third-party agency or company that does not have a signed agreement with Unity.

Your privacy is important to us. Please take a moment to review our Prospect and Applicant Privacy Policies. Should you have any concerns about your privacy, please contact us at DPO@unity.com.
