Snap Inc. is a technology company. We believe the camera presents the greatest opportunity to improve the way people live and communicate. Snap contributes to human progress by empowering people to express themselves, live in the moment, learn about the world, and have fun together.

The Company operates Snapchat, a visual messaging app that enhances your relationships with friends, family, and the world, and Specs Inc., a wholly-owned subsidiary dedicated to making computing more human, in addition to Bitmoji, Saturn, and other digital services.

Snap Engineering teams build fun and technically sophisticated products that reach hundreds of millions of Snapchatters around the world, every day. We're deeply committed to the well-being of everyone in our global community, which is why our values are at the root of everything we do. We move fast, with precision, and always execute with privacy at the forefront.

We're looking for a Manager, Software Engineering, ML Inference to join Snap Inc.!

What you'll do:

Lead and mentor a team of ML Infrastructure engineers responsible for building and scaling the systems that power Snap's model training, inference, and data pipelines
Set the strategy, build a roadmap, create measurable goals, and lead your team to deliver high-impact ML infrastructure initiatives
Evaluate the technical tradeoffs of key decisions and serve as a strong technical mentor across the team
Perform design and code reviews to continuously raise the technical excellence bar
Collaborate with ML engineers, product teams, and cross-functional stakeholders to understand requirements, evaluate tradeoffs, and deliver solutions at scale
Hire, grow, and retain high-performing engineers by creating growth opportunities, giving regular feedback, and managing performance
Advocate for and apply best practices when it comes to availability, scalability, operational excellence, and cost management
Use AI tools and high-velocity engineering workflows to design and ship scalable services while upholding rigorous standards for code correctness, security, and production-ready quality

Knowledge, Skills & Abilities:

Strong understanding of ML infrastructure systems including model training platforms, inference serving, feature stores, and data pipelines
Background building high-availability, mission-critical systems at significant scale
Experience setting technical direction for teams whose work directly enables ML engineers and production ML systems
Strong management and mentorship skills, with the ability to lead and grow senior engineers
Excellent verbal and written communication skills, with high attention to detail
Ability to collaborate with internal and external stakeholders at all levels
Skilled at managing ambiguous problems and driving clarity across complex, multi-team initiatives
Proficiency in, or a strong aptitude for, leveraging AI tools to streamline development, paired with the critical judgment to audit generated output for architectural integrity, performance bottlenecks, and security risks
Adaptability in learning and applying evolving AI systems and tools to remain at the forefront of engineering trends and modern development practices

Minimum Qualifications:

Bachelor's degree in a technical field such as computer science or equivalent years of experience
9+ years of post-Bachelor's software engineering experience; or a Master's degree in a technical field + 8+ years of post-grad experience; or a PhD in a related technical field + 5+ years of post-grad experience
1+ year(s) of experience managing an engineering team
Experience with distributed systems and large-scale ML infrastructure

Preferred Qualifications:

Advanced degree in a related technical field
Experience working with ML training platforms, inference infrastructure, or feature serving systems
Familiarity with ML frameworks such as TensorFlow, PyTorch, Caffe2, Spark ML, or related frameworks
Experience with Spark, Flink, Ray, or other big data processing technologies
Experience with key infrastructure technologies including Kubernetes, NoSQL, Memcache/Redis, Kafka, Google Cloud, or AWS services
Track record of delivery in rapidly changing, highly collaborative, multi-stakeholder environments
Experience with MLOps and managing the production machine learning lifecycle

If you have a disability or special need that requires accommodation, please don’t be shy about providing us with some information.

"Default Together" Policy at Snap: At Snap Inc. we believe that being together in person helps us build our culture faster, reinforce our values, and serve our community, customers and partners better through dynamic collaboration. To reflect this, we practice a “default together” approach and expect our team members to work in an office 4+ days per week.

At Snap, we believe that having a team of diverse backgrounds and voices working together will enable us to create innovative products that improve the way people live and communicate. Snap is proud to be an equal opportunity employer, and committed to providing employment opportunities regardless of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, pregnancy, childbirth and breastfeeding, age, sexual orientation, military or veteran status, or any other protected classification, in accordance with applicable federal, state, and local laws. EOE, including disability/vets.

We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, where applicable).

Our Benefits: Snap Inc. is its own community, so we’ve got your back! We do our best to make sure you and your loved ones have everything you need to be happy and healthy, on your own terms. Our benefits are built around your needs and include paid parental leave, comprehensive medical coverage, emotional and mental health support programs, and compensation packages that let you share in Snap’s long-term success!

Compensation

In the United States, work locations are assigned a pay zone which determines the salary range for the position. The successful candidate’s starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. The starting pay may be negotiable within the salary range for the position. These pay zones may be modified in the future.

Zone A (CA, WA, NYC):

The base salary range for this position is $229,000-$343,000 annually.

Zone B:

The base salary range for this position is $218,000-$326,000 annually.

Zone C:

The base salary range for this position is $195,000-$292,000 annually.

This position is eligible for equity in the form of RSUs.