What you'd actually do

Explore, co-design and optimize parallelisms, compute efficiency, distributed training/inference paradigms and algorithms to improve the scalability, efficiency, and reliability of GenAI systems

Innovate and co-design novel model deployment techniques for sustained scaling and hardware efficiency during GenAI serving

Benchmark, analyze, model, and project the performance of AI workloads against a wide range of what-if scenarios and provide early input to the design of future hardware, models, and runtime, giving crucial feedback to the architecture, compiler, kernel, modeling, and runtime teams

Explore, prototype and productionize highly optimized ML kernels to unlock the full potential of current and future accelerators for Meta’s AI workloads

Influence the hardware roadmap of Meta’s custom AI accelerators

Skills

Required

PhD in Computer Science, Electrical Engineering, Applied Mathematics, or a related technical field, OR a Master's degree with 3+ years of relevant industry experience
Proven research experience in hardware-aware model enablement, performance modeling of AI systems or prevailing accelerators/silicon architectures
Hands-on proficiency with end-to-end AI hardware architecture or on-device mapping algorithm development
Theoretical background and practical experience with AI models (e.g., CNNs, Transformers, LLMs, Diffusion models)
Experience in system-level performance analysis, profiling, and benchmarking of AI workloads
In-depth experience with Python
Experience with at least one major AI framework
Track record of publishing research papers at peer-reviewed conferences or journals
Experience communicating technical results to cross-functional stakeholders
Experience with deploying AI agents/prevailing techniques for increased efficiency
Experience or knowledge of training/inference of large-scale deep learning models
Familiarity with low-level programming for specialized hardware (e.g., CUDA, HIP, Triton) or hardware description languages (HDL)
Experience or knowledge of distributed ML systems and algorithm development
Experience or knowledge of either Generative AI models such as LLMs/LDMs or Ranking & Recommendation models such as DLRM or equivalent

Nice to have

Experience with AI & Systems Co-design

Our team’s mission is to explore, develop and help productionize high-performance software and hardware technologies for AI at datacenter scale. We achieve this via concurrent design and optimization of many aspects of the system, from models and runtime all the way to the AI hardware, optimizing across compute, network and storage. The team invests significantly in optimizing models on existing accelerator systems and in guiding the future of models and AI HW at Meta. This drives improved performance, enables new model architectures, and reduces cost of ownership for all key AI services at Meta: Recommendations and Generative AI. This is an exciting space that spans exploration and productionization, coupled with close collaboration with industry, academia, and Meta’s Infrastructure and Product groups. The team works closely with product teams, going from ideation and rapid prototyping all the way to assisting productization of high-leverage ideas, and partners with many teams to bring learnings from prototype into production.

In addition to the real-world impact on billions of users of Meta products, our team members have won Best Paper Awards at prestigious conferences such as ISCA, ASPLOS, SOSP, and OSDI, with multiple papers selected for IEEE Micro Top Picks. We regularly publish in ICML, NeurIPS, SC, HPCA, NSDI, VLDB, MLSys, and more. Overall, our work largely corresponds to the research communities of systems in general and especially systems for ML (MLSys, SOSP, OSDI, SIGCOMM, NSDI), hardware architecture (ISCA, ASPLOS), ML (NeurIPS, ICML, ICLR) and supercomputing (SC, ICS).

Responsibilities

Explore, co-design and optimize parallelisms, compute efficiency, distributed training/inference paradigms and algorithms to improve the scalability, efficiency, and reliability of GenAI systems
Innovate and co-design novel model deployment techniques for sustained scaling and hardware efficiency during GenAI serving
Benchmark, analyze, model, and project the performance of AI workloads against a wide range of what-if scenarios and provide early input to the design of future hardware, models, and runtime, giving crucial feedback to the architecture, compiler, kernel, modeling, and runtime teams
Explore, prototype and productionize highly optimized ML kernels to unlock the full potential of current and future accelerators for Meta’s AI workloads
Influence the hardware roadmap of Meta’s custom AI accelerators
Lead cross-functional initiatives spanning multiple engineering organizations to drive high-impact technical milestones
Guide Meta’s AI HW requirements and design, focusing on performance at System and Silicon levels
Co-design and optimize our AI HW and related software stack for Meta’s future workloads, with technology pathfinding and evaluation of cutting-edge AI systems

Qualifications

Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
PhD in Computer Science, Electrical Engineering, Applied Mathematics, or a related technical field, OR a Master's degree with 3+ years of relevant industry experience
Proven research experience in one or more of the following areas: hardware-aware model enablement, performance modeling of AI systems or prevailing accelerators/silicon architectures
Hands-on proficiency with end-to-end AI hardware architecture or on-device mapping algorithm development, encompassing logic, architecture, and optimizations for power, performance, and area (PPA)
Theoretical background and practical experience with AI models (e.g., CNNs, Transformers, LLMs, Diffusion models)
Experience in system-level performance analysis, profiling, and benchmarking of AI workloads
In-depth experience with Python and experience with at least one major AI framework
Track record of publishing research papers at peer-reviewed conferences or journals, and experience communicating technical results to cross-functional stakeholders
Experience with deploying AI agents/prevailing techniques for increased efficiency
Experience or knowledge of training/inference of large-scale deep learning models
Familiarity with low-level programming for specialized hardware (e.g., CUDA, HIP, Triton) or hardware description languages (HDL)
Experience or knowledge of distributed ML systems and algorithm development
Experience or knowledge of either Generative AI models such as LLMs/LDMs or Ranking & Recommendation models such as DLRM or equivalent

Research Scientist, AI & Systems Co-design (PhD)
