Technical Program Manager, Core Infrastructure

Meta · Big Tech · Menlo Park, CA

Meta is seeking a Technical Program Manager to lead large-scale projects focused on advancing language model scaling infrastructure. The role involves collaborating across engineering, hardware, data center, research, and product teams to design, build, and scale the foundational hardware and software systems that support AI innovation. Responsibilities include driving the end-to-end integration of AI hardware and the core infrastructure stack, developing frameworks for onboarding, managing cross-functional dependencies, and streamlining workflows. The role requires extensive experience in technical program management, an understanding of AI hardware and software development, and knowledge of LLMs and distributed systems.

What you'd actually do

  1. Establish and lead effective program teams to ensure alignment and achieve common objectives
  2. Work closely with engineering, data center, hardware and business stakeholders to define program requirements, prioritize initiatives, and establish scope, including shaping the roadmap and long-term strategy for partner teams
  3. Create and implement communication strategies to proactively share program status, challenges, and risks with stakeholders
  4. Drive successful outcomes by actively managing cross-functional dependencies, mitigating risks, and adjusting scope, timeline, and resources as needed
  5. Collaborate with cross-functional teams to lead the end-to-end lifecycle of programs, including technical analysis, design, development, testing, implementation, and post-launch support

Skills

Required

  • 12+ years of experience in software engineering, hardware engineering, systems engineering, or technical product/program management
  • Knowledge of software and hardware development for large-scale hardware readiness, including end-to-end product development processes
  • Experience delivering complex technology programs and products from inception through to successful delivery
  • Experience defining and optimizing engineering processes at scale
  • Experience building work relationships across multi-disciplinary teams and with partners in different time zones
  • Knowledge of large language models and machine learning, and of scaling distributed systems
  • Proven commitment to scaling infrastructure for large-scale distributed AI compute systems

Nice to have

  • Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
  • Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
  • Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)

What the JD emphasized

  • lead complex, large-scale projects focused on advancing language model scaling
  • driving the end-to-end integration of new AI hardware and core infra stack
  • scale foundational hardware, software systems, and tools that support Meta’s AI innovation
  • scale infrastructure for large scale AI distributed compute systems