What you'd actually do

Architect and own large-scale distributed systems and platform infrastructure components, driving end-to-end technical design from requirements through production

Lead the design and implementation of high-performance, low-latency systems with well-defined service level objectives, dashboards, and incident response runbooks

Identify systemic reliability risks, reduce failure surface across service dependencies, and drive resiliency testing including overload and outage scenarios

Establish and enforce systems engineering best practices across the team, including safe rollout strategies, feature flagging, staged releases, and automated deployment pipelines

Use instrumentation and profiling to identify performance bottlenecks, establish visibility into key system metrics, and drive measurable improvements in throughput and latency

Skills

Required

designing and implementing large-scale distributed systems or platform infrastructure software
owning system reliability end-to-end
leading technical design of complex systems
performance profiling, capacity planning, and optimization of high-throughput or low-latency systems
systems programming in C, C++, Rust, or similar low-level languages
integrating AI tools to optimize/redesign workflows
applying AI-assisted development workflows to systems engineering problems
building and improving developer tooling, automation frameworks, or internal platform abstractions

Nice to have

prompt/context engineering
agent orchestration
responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
contributing to or defining organization-wide systems engineering standards, coding guidelines, or reliability frameworks

What the JD emphasized

8+ years of experience designing and implementing large-scale distributed systems or platform infrastructure software

Experience owning system reliability end-to-end

Experience leading technical design of complex systems

Experience with performance profiling, capacity planning, and optimization of high-throughput or low-latency systems

Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies

Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)

Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)

Experience with systems programming in C, C++, Rust, or similar low-level languages in a production environment at scale

Background in contributing to or defining organization-wide systems engineering standards, coding guidelines, or reliability frameworks

Experience applying AI-assisted development workflows to systems engineering problems, including code generation, anomaly detection, or automated root cause analysis

Demonstrated ability to build and improve developer tooling, automation frameworks, or internal platform abstractions that measurably improve engineering efficiency across teams

Meta is seeking a Staff Systems Engineer to design and build the foundational software infrastructure that powers products used by billions of people worldwide. In this role, you will architect and own large-scale distributed systems, low-level platform software, and critical infrastructure components that underpin Meta's family of applications and services. You will drive technical strategy across system reliability, performance, and scalability, partnering closely with product engineering, infrastructure, and operations teams to deliver systems that are resilient, efficient, and built to evolve. This is a high-impact opportunity to shape the systems engineering culture and technical direction at one of the world's most complex software organizations.

Responsibilities

Architect and own large-scale distributed systems and platform infrastructure components, driving end-to-end technical design from requirements through production

Lead the design and implementation of high-performance, low-latency systems with well-defined service level objectives, dashboards, and incident response runbooks

Identify systemic reliability risks, reduce failure surface across service dependencies, and drive resiliency testing including overload and outage scenarios

Establish and enforce systems engineering best practices across the team, including safe rollout strategies, feature flagging, staged releases, and automated deployment pipelines

Use instrumentation and profiling to identify performance bottlenecks, establish visibility into key system metrics, and drive measurable improvements in throughput and latency

Collaborate with cross-functional partners across product engineering, data infrastructure, and operations to align on technical requirements and deliver scalable platform solutions

Proactively incorporate privacy, security, and integrity principles into system design at early engineering stages, partnering with relevant teams to apply appropriate safeguards

Mentor other engineers on systems design patterns, debugging methodologies, and AI-accelerated development workflows, and contribute to onboarding and engineering programs

Drive roadmapping and technical strategy for one or more platform areas, communicating trade-offs and architectural decisions clearly to both engineering and non-engineering stakeholders

Leverage AI tools and automation to accelerate development velocity, reduce toil, and improve the reliability and observability of owned systems

Qualifications

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

8+ years of experience designing and implementing large-scale distributed systems or platform infrastructure software

Experience owning system reliability end-to-end, including defining service level objectives, building observability tooling, and leading incident response and retrospectives

Experience leading technical design of complex systems, including evaluating architectural trade-offs and driving cross-team alignment on implementation decisions

Experience with performance profiling, capacity planning, and optimization of high-throughput or low-latency systems

Track record of successfully delivering major infrastructure initiatives, including coordinating rollouts, migrations, and dependency management across multiple teams

Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies

Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)

Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)

Experience with systems programming in C, C++, Rust, or similar low-level languages in a production environment at scale

Background in contributing to or defining organization-wide systems engineering standards, coding guidelines, or reliability frameworks

Experience applying AI-assisted development workflows to systems engineering problems, including code generation, anomaly detection, or automated root cause analysis

Demonstrated ability to build and improve developer tooling, automation frameworks, or internal platform abstractions that measurably improve engineering efficiency across teams