Staff Software Engineer, Growth Notifications, Level 6

Snap · Consumer · Bellevue, WA +2

Snap is hiring a Staff Software Engineer to join its Growth Notifications team. The role focuses on designing, implementing, and operating highly available backend services for a notification platform serving hundreds of millions of users. Responsibilities include owning the end-to-end architecture, leading technical direction, defining multi-quarter roadmaps, making architectural decisions, and collaborating across teams. The role emphasizes best practices in distributed systems, reliability, observability, cost efficiency, and incident management. Experience with large-scale backend services, mission-critical systems, system design, and observability is required. Preferred qualifications include experience with Java, Go, C++, or Python; NoSQL data stores and caches; cloud services; pub/sub and task-queue systems; and notification, messaging, growth, or experimentation platforms.

What you'd actually do

  1. Design, implement, and operate highly available backend services that power Snapchat’s growth notification platform across multiple communication channels, serving hundreds of millions of users.
  2. Own the end-to-end architecture for Snap’s Growth Notification systems, with a focus on reliability, observability, cost efficiency, and sustainable DAU/MAU and retention impact.
  3. Lead technical direction for the Growth Notifications team: define multi‑quarter roadmaps, make high‑quality architectural decisions, and drive large, ambiguous projects from concept through launch and iteration.
  4. Collaborate across teams to integrate upstream signals and downstream use cases into a coherent, scalable growth notifications platform.
  5. Advocate for and apply best practices in distributed systems, including SLIs/SLOs, incident management, cost management, and safe, iterative delivery in a high‑leverage, DAU‑critical system.

Skills

Required

  • Experience designing, building, and operating backend services or distributed systems at significant scale.
  • Proven track record of owning highly-available, mission‑critical systems, including on‑call participation, incident response, and driving systemic fixes.
  • Ability to set technical vision and lead complex, cross‑functional initiatives over multiple quarters, balancing architectural quality, reliability, and product velocity.
  • Strong foundation in system design (APIs, data models, storage, pub/sub, queues, and workflow orchestration) and performance/latency optimization.
  • Deep experience with observability (metrics, logging, tracing, dashboards) and using data to debug, harden, and evolve large-scale systems.
  • Excellent collaboration and communication skills; able to work effectively with Product, DS, ML, Design, and other engineering teams to align on requirements and trade‑offs.
  • Ability to mentor and uplevel engineers, provide clear technical guidance, and create structures that make the team more effective over time.
  • 9+ years of post-Bachelor’s software development experience; or a Master’s degree in a technical field + 8+ years of post-grad software development experience; or a PhD in a related technical field + 5+ years of post-grad software development experience.
  • Demonstrated track record of building and operating reliable, scalable services in cloud technologies with a strong focus on observability, cost efficiency, and incident response.

Nice to have

  • Experience with one or more of Java, Go, C++, or Python.
  • Experience with NoSQL data stores, caches (e.g., Memcache/Redis), and cloud services (Google Cloud, AWS, or similar).
  • Experience with pub/sub and task‑queue systems (e.g., Kafka, Google Pub/Sub, Cloud Tasks, or internal equivalents) in high‑throughput environments.
  • Experience with notification, messaging, growth, or experimentation platforms, including ranking, targeting, and A/B testing at scale.
  • Demonstrated ability to lead technical strategy for a team or domain, influencing architecture, reliability, and long‑term roadmaps across multiple teams.

What the JD emphasized

  • highly available backend services
  • hundreds of millions of users
  • end-to-end architecture
  • reliability
  • observability
  • cost efficiency
  • DAU/MAU and retention impact
  • technical direction
  • multi-quarter roadmaps
  • high-quality architectural decisions
  • large, ambiguous projects
  • integrate upstream signals and downstream use cases
  • scalable growth notifications platform
  • best practices in distributed systems
  • SLIs/SLOs
  • incident management
  • cost management
  • safe, iterative delivery
  • DAU-critical system
  • building and operating backend services or distributed systems at significant scale
  • highly-available, mission-critical systems
  • on-call participation
  • incident response
  • driving systemic fixes
  • set technical vision
  • lead complex, cross-functional initiatives
  • balancing architectural quality, reliability, and product velocity
  • system design
  • performance/latency optimization
  • observability (metrics, logging, tracing, dashboards)
  • using data to debug, harden, and evolve large-scale systems
  • mentor and uplevel engineers
  • clear technical guidance
  • structures that make the team more effective
  • building and operating reliable, scalable services in cloud technologies
  • strong focus on observability, cost efficiency, and incident response
  • notification, messaging, growth, or experimentation platforms
  • ranking, targeting, and A/B testing at scale
  • lead technical strategy for a team or domain
  • influencing architecture, reliability, and long-term roadmaps