Staff Software Engineer, Capacity Engineering

Pinterest · Consumer · San Francisco, CA · Infrastructure and SRE

A Staff Software Engineer role in Capacity Engineering, focused on improving the efficiency and performance of large-scale cloud-native infrastructure, including Kubernetes and distributed systems. The role involves building optimization capabilities and collaborating with Infrastructure Engineering and SRE teams. A key aspect is using AI to accelerate performance investigations, build self-serve tooling, and iterate on optimization approaches, while critically evaluating AI-assisted work.

What you'd actually do

  1. Improve the efficiency of large-scale shared environments like Kubernetes
  2. Improve the performance and efficiency of large-scale distributed systems that drive Pinterest systems
  3. Build, develop, and mature profiling and optimization capabilities for Pinterest scale
  4. Collaborate with Infrastructure Engineering and SRE teams in their mission to deliver highly available, resilient, secure and efficient foundations for Pinterest’s tech stack
  5. Leverage AI to scale the impact of yourself and the team, including:
     - Accelerate performance investigations (e.g. quickly distill logs/metrics/traces and prior learnings) while verifying findings through measurement and testing
     - Build tooling and agents that allow users to self-serve efficiency insights and recommendations
     - Iterate faster on optimization approaches and rollout plans, then validate impact with experiments and production guardrails

Skills

Required

  • Bachelor’s degree in computer science, a related field or equivalent experience
  • Deep understanding of infrastructure capacity and performance
  • Experience leading efficiency initiatives at scale on Kubernetes or other large-scale shared infrastructure
  • Strong technical and performance engineering skills to collaborate with stakeholders on complex and ambiguous technical challenges
  • Experience building and managing highly available distributed applications at scale
  • Proficiency in software development languages such as Java, Python and C++
  • Excellent skills in communicating complex technical issues
  • Experience with AWS or similar cloud environments
  • Demonstrated ability to use AI to improve speed and quality in your day-to-day workflow for relevant outputs
  • Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
  • High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables

Nice to have

  • Hands-on experience with large, cloud-native multi-tenant platforms at Internet scale

What the JD emphasized

  • Strong background in implementing performance and efficiency projects on large-scale distributed systems
  • Demonstrated ability to use AI to improve speed and quality in your day-to-day workflow for relevant outputs
  • Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
  • High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables