Research Engineer/Scientist (All Levels), Efficient Models

ByteDance · Big Tech · San Jose, CA · R&D

Research Engineer/Scientist focused on developing efficient algorithms and architectures for large-scale generative and multimodal models, with an emphasis on model distillation, compression, and hardware-efficient inference for applications like image generation, video generation, and VLMs.

What you'd actually do

  1. Develop efficient algorithms and architectures for large-scale generative and multimodal models (e.g., image generation, video generation, VLMs), using techniques such as step distillation, CFG distillation, and quantization to improve model efficiency.
  2. Advance scalable generative modeling approaches, including diffusion and autoregressive models, with a focus on acceleration and efficiency.
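As a concrete illustration of one technique named above, here is a minimal, framework-free sketch of symmetric per-tensor int8 weight quantization. All names are illustrative, not ByteDance tooling; production work would use per-channel scales and a real framework's quantization stack.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor.

    Illustrative sketch only; real pipelines use per-channel scales,
    calibration data, and framework kernels.
    """
    # Scale maps the largest-magnitude weight to the int8 extreme 127.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


# Round-trip example: error per weight is bounded by half the scale.
w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The round-trip error is at most `scale / 2` per weight, which is why activation ranges (and hence scales) dominate quantization quality in practice.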

Skills

Required

  • B.S. in Computer Science or a related field, or equivalent experience
  • Expertise in efficient models, with a deep understanding of computational bottlenecks and acceleration methods.
  • Proficiency in training generative AI models or LLMs using widely adopted frameworks and tools such as PyTorch and JAX.
  • Strong communication and collaboration skills in fast-paced environments.

Nice to have

  • Ph.D. in GenAI or MLSys, or equivalent experience
  • Extensive research experience across GenAI, MLSys, and LLM areas.
  • Proven experience in at least one of the following areas: image/video generation and editing; model compression (e.g., quantization, step/CFG distillation); efficient architectures (e.g., MoE, window attention); efficient model design; or reinforcement-learning training methods (e.g., RLHF, DPO, GRPO).
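To make the CFG-distillation item concrete: classifier-free guidance normally requires two model evaluations per denoising step (conditional and unconditional), and CFG distillation trains a student to emit the guided prediction in a single pass. A minimal sketch of the guidance combination being distilled away; function and variable names are illustrative, not a specific model's API.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: blend unconditional and conditional
    noise predictions, extrapolating toward the conditional one.

    Costs two forward passes per step; CFG distillation trains a
    student to predict this combined output directly in one pass.
    """
    return [u + guidance_scale * (c - u)
            for u, c in zip(eps_uncond, eps_cond)]
```

With `guidance_scale = 1.0` this reduces to the plain conditional prediction; typical scales above 1 trade diversity for prompt adherence, and distillation bakes that trade-off into the student's weights.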

What the JD emphasized

  • distillation
  • compression
  • model efficiency
  • scalable training
  • optimization
  • deployment
  • model acceleration
  • hardware-efficient inference
  • image generation
  • video generation
  • VLM
  • diffusion
  • autoregressive models
  • quantization
  • step distillation
  • CFG distillation
  • RLHF
  • DPO
  • GRPO

Other signals

  • Developing methods and infrastructure for transferring capabilities from foundation models into smaller, more efficient models
  • scalable training, optimization, and deployment
  • distillation frameworks, model acceleration, hardware-efficient inference
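The first signal — transferring capabilities from a foundation model into a smaller, more efficient one — is classic knowledge distillation. A minimal sketch of the temperature-softened cross-entropy loss commonly used for it, in pure Python; names and the temperature value are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits, softened by a temperature > 1."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against
    the teacher's softened distribution.

    Minimized when the student's distribution matches the teacher's;
    the temperature exposes the teacher's 'dark knowledge' in the
    relative weights of non-argmax classes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

In a training loop this term is typically mixed with the ordinary hard-label loss; the mixing weight and temperature are tuned per task.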