Kernel Optimization Software Engineer, AI Hardware

Tesla · Auto · Palo Alto, CA · Tesla AI

This role focuses on optimizing AI research models to run efficiently on Tesla's custom AI ASICs for applications like Autopilot and Optimus. It spans kernel optimization, compiler development, and collaboration with hardware teams to improve inference and training performance, with an emphasis on real-time latency for robotics and self-driving systems.

What you'd actually do

  1. Implement, optimize, and profile high-performance kernels for inference and training on Tesla's AI and Dojo ASICs (a minimal sketch of this kind of work appears after this list)
  2. Optimize bottlenecks in the inference flow, make precision/performance tradeoff decisions, and develop novel techniques to improve hardware utilization and throughput
  3. Work on a variety of edge and datacenter workloads, from small encoders/decoders to distributed LLM inference
  4. Work with hardware teams to shape the next generation of Tesla hardware, evaluating architectural tradeoffs and balancing performance with versatility
  5. Research and implement state-of-the-art machine learning techniques to achieve high performance on our hardware
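
To make item 1 concrete, here is a minimal, hypothetical sketch of one classic kernel optimization: cache blocking plus multithreading applied to a matrix multiply, in plain C. The tile size, the OpenMP pragma, and the GEMM workload are all illustrative assumptions; real kernels for Tesla's ASICs would target the chip's own vector ISA and memory hierarchy.

```c
/* Hypothetical sketch -- not Tesla's kernels. Cache-blocked,
 * multithreaded GEMM in plain C; compile with -fopenmp.
 * C[M][N] += A[M][K] * B[K][N], all row-major. */
#include <stddef.h>

#define TILE 64  /* assumed tile size; tuned per cache level in practice */

void gemm_tiled(size_t M, size_t N, size_t K,
                const float *A, const float *B, float *C)
{
    /* Each thread owns disjoint (i0, j0) tiles of C, so no write races. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t i0 = 0; i0 < M; i0 += TILE)
        for (size_t j0 = 0; j0 < N; j0 += TILE)
            for (size_t k0 = 0; k0 < K; k0 += TILE) {
                size_t imax = i0 + TILE < M ? i0 + TILE : M;
                size_t jmax = j0 + TILE < N ? j0 + TILE : N;
                size_t kmax = k0 + TILE < K ? k0 + TILE : K;
                for (size_t i = i0; i < imax; ++i)
                    for (size_t k = k0; k < kmax; ++k) {
                        float a = A[i * K + k];  /* reused across the j loop */
                        for (size_t j = j0; j < jmax; ++j)
                            /* unit-stride access: friendly to SIMD autovectorization */
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

The i-k-j loop order keeps the innermost accesses unit-stride, the same locality reasoning that drives kernel work on custom accelerators.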

Skills

Required

  • kernel optimization
  • distributed systems
  • inference runtimes
  • serving frameworks
  • LLMs
  • transformers
  • state space models
  • diffusion models
  • CNNs
  • performance modeling
  • roofline analysis (see the worked sketch after this list)
  • computer and GPU architecture
  • SIMD
  • multithreading
  • accelerators with vectorized instructions
  • analytical and debugging skills
  • ability to work across team boundaries
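
Since roofline analysis appears in the required skills, here is a minimal sketch of the model itself: attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity. The peak-FLOP/s and bandwidth figures below are invented placeholders, not specs for any Tesla part.

```c
/* Minimal roofline-model sketch. Hardware numbers are invented
 * placeholders, not specs for any real accelerator. */
#include <stdio.h>

int main(void)
{
    const double peak_flops = 100e12;  /* assumed peak compute: 100 TFLOP/s */
    const double mem_bw     = 1e12;    /* assumed memory bandwidth: 1 TB/s  */

    /* Example kernel: one FMA (2 FLOPs) per 4-byte load gives an
     * arithmetic intensity of 0.5 FLOP/byte. */
    const double flops = 2.0;
    const double bytes = 4.0;
    const double ai    = flops / bytes;

    /* Roofline: attainable = min(peak, bandwidth * intensity). */
    double attainable = mem_bw * ai;
    if (attainable > peak_flops)
        attainable = peak_flops;

    printf("arithmetic intensity: %.2f FLOP/byte\n", ai);
    printf("attainable: %.1f GFLOP/s (%s-bound)\n",
           attainable / 1e9,
           attainable < peak_flops ? "memory" : "compute");
    return 0;
}
```

A kernel whose intensity falls left of the ridge point is memory-bound, so optimization effort goes into data movement rather than math.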

Nice to have

  • MLIR-based compiler stack
  • Contributions to ML serving frameworks, compilers, or related frameworks (e.g. SGLang, LLVM, PyTorch, MLIR)

What the JD emphasized

  • real-time latency
  • performance
  • optimization
  • kernels
  • inference
  • training

Other signals

  • optimizing inference and training kernels
  • real-time latency for self-driving and humanoid robots
  • shaping next-generation AI chips