Software Engineer, Systems

Meta · Big Tech · Menlo Park, CA +1

Staff Systems Engineer role: design and build foundational software infrastructure for large-scale distributed systems, low-level platform software, and critical infrastructure components. The focus is on reliability, performance, and scalability, working in partnership with product engineering, infrastructure, and operations teams. Drive technical strategy, establish best practices, and mentor engineers. Use AI tools to accelerate development and improve system reliability and observability.

What you'd actually do

  1. Architect and own large-scale distributed systems and platform infrastructure components, driving end-to-end technical design from requirements through production
  2. Lead the design and implementation of high-performance, low-latency systems with well-defined service-level objectives, dashboards, and incident-response runbooks
  3. Identify systemic reliability risks, reduce failure surface across service dependencies, and drive resiliency testing including overload and outage scenarios
  4. Establish and enforce systems engineering best practices across the team, including safe rollout strategies, feature flagging, staged releases, and automated deployment pipelines
  5. Use instrumentation and profiling to identify performance bottlenecks, establish visibility into key system metrics, and drive measurable improvements in throughput and latency

Skills

Required

  • designing and implementing large-scale distributed systems or platform infrastructure software
  • owning system reliability end-to-end
  • leading technical design of complex systems
  • performance profiling, capacity planning, and optimization of high-throughput or low-latency systems
  • systems programming in C, C++, Rust, or similar low-level languages
  • integrating AI tools to optimize/redesign workflows
  • applying AI-assisted development workflows to systems engineering problems
  • building and improving developer tooling, automation frameworks, or internal platform abstractions

Nice to have

  • prompt/context engineering
  • agent orchestration
  • responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
  • contributing to or defining organization-wide systems engineering standards, coding guidelines, or reliability frameworks

What the JD emphasized

  • 8+ years of experience designing and implementing large-scale distributed systems or platform infrastructure software
  • Experience owning system reliability end-to-end
  • Experience leading technical design of complex systems
  • Experience with performance profiling, capacity planning, and optimization of high-throughput or low-latency systems
  • Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
  • Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
  • Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
  • Experience with systems programming in C, C++, Rust, or similar low-level languages in a production environment at scale
  • Background in contributing to or defining organization-wide systems engineering standards, coding guidelines, or reliability frameworks
  • Experience applying AI-assisted development workflows to systems engineering problems, including code generation, anomaly detection, or automated root cause analysis
  • Demonstrated ability to build and improve developer tooling, automation frameworks, or internal platform abstractions that measurably improve engineering efficiency across teams