(usa) Staff, Software Engineer

Walmart · Retail · Sunnyvale, CA

Staff Software Engineer role focused on designing and delivering scalable platform capabilities, with an emphasis on cloud-native architecture and AI/ML integration. The role involves leading the full software development lifecycle, providing technical leadership, and mentoring engineering teams. The team improves system reliability through controlled experimentation and fault injection, and it develops and operates frameworks for safety-scoped tests tied to reliability outcomes.

What you'd actually do

  1. Deliver Chaos platform capabilities including infrastructure, tooling, and ML frameworks aligned with roadmap milestones and domain priorities.
  2. Apply cloud-native architecture principles to ensure scalable, resilient, and agile systems.
  3. Lead full software development lifecycle activities including coding, testing, deployment, monitoring, and maintenance.
  4. Conduct system design reviews to validate feasibility and ensure quality and alignment with architectural standards.
  5. Build AI-driven applications and integrate ML components to support automation and intelligent features.

Skills

Required

  • Cloud-native expertise: microservices, containers, and automation to build scalable, resilient systems.
  • Strong SRE foundation: SLOs/SLIs, error budgets, reliability KPIs, and disciplined incident/postmortem practices.
  • Observability depth: metrics/logs/traces, alert quality, dashboards, and operational readiness to reduce risk and toil.
  • Proven SDLC ownership: design, coding, testing, CI/CD, deployment, monitoring, and maintenance.
  • Platform engineering skills: API design, platform capabilities, and prototyping to validate solutions quickly.
  • Strong debugging and problem-solving: root cause analysis and durable corrective actions.
  • Chaos engineering strength: hypothesis-driven experiments, fault injection, and safety controls (blast radius, rollback, kill switches).
  • Distributed systems expertise: dependency mapping, failure-mode analysis, performance tuning, resilience patterns.
  • Leadership: mentor engineers and influence architecture toward sustainable reliability.
  • Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 4 years' experience in software engineering or related area, OR 6 years' experience in software engineering or related area.

Nice to have

  • Master's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 2 years' experience in software engineering or related area.

What the JD emphasized

  • ML frameworks
  • AI-driven applications
  • integrate ML components

Other signals

  • AI/ML integration