Machine Learning Engineer 5 - Globalization

Netflix · Big Tech · United States · Remote · Data & Insights

Machine Learning Engineer at Netflix focused on optimizing training and inference efficiency for LLMs and Multimodal LLMs within the Globalization team. The role involves designing and building scalable systems and optimizing data pipelines, distributed training, mixed precision, KV caching, batching, and quantization to improve the performance, latency, and reliability of ML models for Netflix's global catalog.

What you'd actually do

  1. Design and build scalable training and inference systems for LLMs, Multimodal LLMs, and other media ML models.
  2. Optimize end-to-end training: data pipelines (streaming, sharding, bucketing), distributed training (parallelism strategies), and mixed precision; a minimal PyTorch sketch of this setup follows the list.
  3. Optimize inference and serving: KV caching, batching, quantization, and long-context handling.
  4. Scale model training and inference into robust, performant systems integrated into Netflix workflows.
  5. Act as a technical thought leader for training and inference efficiency, driving initiatives that significantly improve scalability, latency, and reliability.
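
As a rough illustration of the training-side responsibilities in item 2, the sketch below wires distributed data-parallel training, sharded data loading, and mixed precision into one PyTorch loop. This is a minimal sketch, not Netflix's actual stack: the model, dataset, and hyperparameters are placeholders, and it assumes a torchrun launch.

```python
# Minimal sketch: mixed-precision data-parallel training in PyTorch.
# Assumes launch via `torchrun --nproc_per_node=<num_gpus> train.py`;
# the model, data, and hyperparameters are illustrative placeholders.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group("nccl")
device = torch.device(f"cuda:{dist.get_rank() % torch.cuda.device_count()}")
torch.cuda.set_device(device)

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1)
).to(device)
model = DDP(model, device_ids=[device.index])       # gradient sync across ranks

dataset = TensorDataset(torch.randn(4096, 512), torch.randn(4096, 1))
sampler = DistributedSampler(dataset)               # shards the data per rank
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()                # loss scaling for fp16

for x, y in loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                 # mixed-precision forward pass
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()                   # scaled backward pass
    scaler.step(optimizer)
    scaler.update()

dist.destroy_process_group()
```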

Skills

Required

  • ML engineering for large, production-grade systems
  • LLMs
  • Multimodal LLMs
  • media ML models
  • training optimization
  • high-throughput data loading
  • distributed training
  • GPU/accelerator optimization
  • inference optimization
  • KV cache design and optimization
  • batching and scheduling for high-throughput, low-latency serving
  • quantization
  • model compression
  • PyTorch
  • software engineering fundamentals
  • testing
  • observability
  • performance profiling
  • leading ML initiatives
  • stakeholder partnership
  • communication and collaboration skills
  • ambiguity tolerance
  • high ownership

Nice to have

  • technical thought leadership
  • mentoring engineers and scientists

What the JD emphasized

  • Extensive experience in ML engineering for large, production-grade systems using LLMs, Multimodal LLMs, and other media ML models.
  • Deep hands-on expertise in training optimization: high-throughput data loading (streaming, sharding, bucketing); distributed training (parallelism strategies); GPU/accelerator optimization.
  • Strong experience in inference optimization: KV cache design and optimization; batching and scheduling for high-throughput, low-latency serving; quantization and/or model compression (a toy KV-cache sketch follows this list).
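
As a toy illustration of KV-cache reuse (not drawn from the JD's stack), the sketch below decodes one token at a time with a single attention head: keys and values for past tokens are cached, so each step only projects the new token instead of re-encoding the whole prefix. All names, weights, and shapes are illustrative placeholders.

```python
# Minimal sketch: KV caching for incremental decoding (one toy attention head).
# Weights and shapes are illustrative placeholders, not a production model.
import torch

torch.manual_seed(0)
d_model = 64
w_q = torch.randn(d_model, d_model)   # stand-in query/key/value projections
w_k = torch.randn(d_model, d_model)
w_v = torch.randn(d_model, d_model)

def decode_step(x_new, k_cache, v_cache):
    """Attend one new token against all previously cached keys/values."""
    q = x_new @ w_q                                      # (1, d_model)
    k_cache = torch.cat([k_cache, x_new @ w_k], dim=0)   # grow cache by one row
    v_cache = torch.cat([v_cache, x_new @ w_v], dim=0)
    attn = torch.softmax(q @ k_cache.T / d_model ** 0.5, dim=-1)
    return attn @ v_cache, k_cache, v_cache              # output is (1, d_model)

# Decode a few tokens, reusing the cache instead of re-encoding the prefix.
k_cache = torch.empty(0, d_model)
v_cache = torch.empty(0, d_model)
for _ in range(4):
    x_new = torch.randn(1, d_model)    # stand-in for the next token's embedding
    out, k_cache, v_cache = decode_step(x_new, k_cache, v_cache)
print(out.shape, k_cache.shape)        # torch.Size([1, 64]) torch.Size([4, 64])
```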

Other signals

  • LLM training and inference efficiency
  • production-ready ML solutions
  • scalable training and inference systems
  • optimize end-to-end training
  • optimize inference and serving