AI Research Scientist, SysML - FAIR

Meta · Big Tech · Menlo Park, CA +2

Research Engineer role focused on fundamental advances in systems and infrastructure for large-scale machine learning, with an emphasis on distributed training, resource efficiency, and hardware-software co-design. The role involves cutting-edge research, publishing results, and influencing Meta's product development.

What you'd actually do

Carry out cutting-edge research to advance the science and technology of machine learning systems
Perform research that enables learning the semantics of data (images, video, text, audio, and other modalities)
Contribute research that leads to innovations in: scalable machine learning systems, resource-efficient AI data and algorithm scaling and neural network architectures, memory and energy-efficient AI systems, environmentally-sustainable AI system and hardware designs
Devise better data-driven models of AI system design and optimization
Collaborate with researchers and cross-functional partners including communicating research plans, progress, and results

Skills

Required

Python
C++
C
Rust
PyTorch
systems
computer architectures
compiler and programming languages
machine learning
artificial intelligence
developing and optimizing systems for at-scale machine learning execution
devising data-driven models and real-system experiments and design implementation for AI system optimization
scalable machine learning systems
resource-efficient AI data and algorithm scaling
neural network architectures
solving complex problems
comparing alternative solutions, tradeoffs, and different perspectives to determine a path forward
working and communicating cross functionally in a team environment

Nice to have

PhD degree
cuBLAS
cuDNN
FlashAttention

What the JD emphasized

unprecedented scale
human-level intelligence
open science innovations
usability, efficiency, and sustainability as design principles
enabling distributed training at an unprecedented scale
advancements and development in training library and authoring components
training performance acceleration through hardware-software co-design
learning the semantics of data (images, video, text, audio, and other modalities)
scalable machine learning systems
resource-efficient AI data and algorithm scaling
neural network architectures
memory and energy-efficient AI systems
environmentally-sustainable AI system and hardware designs
data-driven models of AI system design and optimization
Publish research results
impacts Meta product development
equivalent practical experience
systems, computer architectures, compiler and programming languages, machine learning, and artificial intelligence
developing and optimizing systems for at-scale machine learning execution
devising data-driven models and real-system experiments and design implementation for AI system optimization
solving complex problems
comparing alternative solutions, tradeoffs, and different perspectives to determine a path forward
working and communicating cross functionally in a team environment
Proven track record of achieving significant results and publications
publications at leading workshops, journals or conferences such as MLSys, ISCA, ASPLOS, HPCA, PLDI, CGO, NeurIPS, ICML, ICLR, or similar
Demonstrated research and software engineering experience via work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)

Other signals

advancing the field of artificial intelligence
making fundamental advances in technologies to help interact with and understand our world
advancing the state of AI through open science innovations
explore, design, and build ML systems and infrastructures at scale

Full job description

Meta is seeking Research Engineers to join Fundamental AI Research (FAIR). We are committed to advancing the field of artificial intelligence by making fundamental advances in technologies to help interact with and understand our world. We are seeking individuals who are experienced in solving systems challenges to sustainably accelerate our reach to human-level intelligence. Candidates will have an opportunity to make fundamental advances in systems and apply their ideas at an unprecedented scale. The mission of Meta FAIR's SysML research is to advance the state of AI through open science innovations. We explore, design, and build ML systems and infrastructures at scale with usability, efficiency, and sustainability as design principles. Some aspects of this role include enabling distributed training at an unprecedented scale through advancements and development in training library and authoring components, such as cuBLAS, cuDNN, and FlashAttention, and training performance acceleration through hardware-software co-design.

Responsibilities

Carry out cutting-edge research to advance the science and technology of machine learning systems
Perform research that enables learning the semantics of data (images, video, text, audio, and other modalities)
Contribute research that leads to innovations in: scalable machine learning systems, resource-efficient AI data and algorithm scaling and neural network architectures, memory and energy-efficient AI systems, environmentally-sustainable AI system and hardware designs
Devise better data-driven models of AI system design and optimization
Collaborate with researchers and cross-functional partners including communicating research plans, progress, and results
Publish research results and contribute to research that impacts Meta product development

Qualifications

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
PhD degree in Computer Science, Computer Engineering, a relevant technical field, & 2+ years of equivalent domain-specific industry experience
Development experience in systems, computer architectures, compiler and programming languages, machine learning, and artificial intelligence
Experience with Python, C++, C, Rust or other related languages and with PyTorch framework
Experience developing and optimizing systems for at-scale machine learning execution
Experience devising data-driven models and real-system experiments and design implementation for AI system optimization
Experience with scalable machine learning systems, resource-efficient AI data and algorithm scaling, or neural network architectures
Experience solving complex problems and comparing alternative solutions, tradeoffs, and different perspectives to determine a path forward
Experience working and communicating cross functionally in a team environment
Proven track record of achieving significant results and publications as demonstrated by grants, fellowships, patents, as well as publications at leading workshops, journals or conferences such as MLSys, ISCA, ASPLOS, HPCA, PLDI, CGO, NeurIPS, ICML, ICLR, or similar
Demonstrated research and software engineering experience via work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)