The Opportunity

Unity Vector builds a data platform that powers insight, experimentation, attribution, and AI-driven decision-making across the company.

Our systems operate at scale across batch and streaming data, supporting analytics, product intelligence, machine learning pipelines, and business operations. As data volume and complexity grow, our platform also supports large-scale model training, feature generation, and experimentation workflows that power production ML systems.

To support this growth, we need strong technical ownership to ensure our ML pipelines remain reliable, scalable, and architecturally sound.

The Role

We are seeking a senior data infrastructure engineer to design and evolve our large-scale offline data platform. This role focuses on building reliable infrastructure for generating training datasets and orchestrating data workflows. You will work closely with ML engineers and platform teams to ensure our pipelines can efficiently handle growing data volumes and increasingly complex training workloads.

You will play a key role in shaping how model datasets are prepared to ensure the reliability, scalability, and performance of our data platform.

What You’ll Do

Develop infrastructure that supports both batch and streaming big data processing using technologies such as Flink, Spark, and Ray
Design and operate large-scale data pipelines that generate datasets for machine learning training and experimentation
Integrate data pipelines with workflow orchestration systems (e.g., Flyte, Airflow, or similar) to enable reliable multi-stage training workflows
Improve reproducibility and observability of data pipelines through dataset validation, monitoring, and automated testing
Optimize performance and resource utilization across distributed compute systems used for data processing
Partner closely with ML engineers to enable efficient large-scale experimentation and model iteration
Lead architectural improvements to ensure our offline data pipelines remain scalable, reliable, and cost-efficient

What We’re Looking For

Experience working with distributed data processing frameworks such as Flink, Spark, or Ray
Experience building infrastructure for training data generation, dataset preparation, or ML feature pipelines
Experience optimizing big data pipelines and infrastructure for cost efficiency
Strong programming skills in Python and experience working with large-scale distributed workloads
Experience with modern data infrastructure (data lakes, warehouses, orchestration systems, streaming platforms)
Strong systems thinking, with the ability to reason about performance, scalability, reliability, and cost tradeoffs in distributed systems
Proven ability to lead technical direction and influence architectural decisions across teams without formal authority

Benefits

At Unity, we want our team members to thrive. We offer a wide range of benefits designed to support well-being and work-life balance.

Please note: Benefits eligibility, specific offerings, and coverage vary based on the country and employment status.

While specific benefits vary, here are some of the ways we strive to take care of our eligible team members globally: Comprehensive health, life, and disability insurance | Commute subsidy | Employee stock ownership | Competitive retirement/pension plans | Generous vacation and personal days | Support for new parents through leave and family-care programs | Office food snacks | Mental Health and Wellbeing programs and support | Employee Resource Groups | Global Employee Assistance Program | Training and development programs | Volunteering and donation matching program

Life at Unity

Unity [NYSE: U] is the world’s leading game engine, powering play for more than 3 billion consumers each month. The top mobile games in the world, the most played PC indie titles, the most innovative console games, and virtually all of the top XR and Web Games are developed, deployed, and grown in Unity. Unity also enables teams across industries like automotive, manufacturing, and healthcare to design, simulate, and collaborate in 3D, closing the gap between ideas and reality. For more information, please visit www.unity.com.

Unity is an equal opportunity employer committed to fostering an inclusive, innovative environment with the best employees. Therefore, we provide employment opportunities without regard to age, race, color, ancestry, national origin, disability, gender, or any other protected status in accordance with applicable law. If you have a disability that means there are preparations or accommodations we can make to help ensure you have a comfortable and positive interview experience, please fill out this form to let us know.

This position requires the incumbent to have a sufficient knowledge of English to have professional verbal and written exchanges in this language since the performance of the duties related to this position requires frequent and regular communication with colleagues and partners located worldwide and whose common language is English.

Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. Unity does not accept unsolicited headhunter and agency resumes. Unity will not pay fees to any third-party agency or company that does not have a signed agreement with Unity.

Your privacy is important to us. Please take a moment to review our Prospect and Applicant Privacy Policies. Should you have any concerns about your privacy, please contact us at DPO@unity.com.


Senior/Staff Machine Learning Engineer, Data Infrastructure
