(USA) Staff, Software Engineer

Walmart · Retail · Bentonville, AR

The role focuses on building an operational intelligence platform (Watchtower) using AI and event-driven systems to transform incident response and error correction at scale. The engineer will design and develop AI agents that learn from incidents to prevent future failures, generate summaries, and suggest actions, as well as create event-driven systems for real-time orchestration of incident response.

What you'd actually do

  1. Design and build the next generation of incident management, from automated workflow orchestration to AI-powered post-incident analysis
  2. Create event-driven systems that orchestrate incident response in real time, including live chat integration, intelligent stakeholder routing, and dependency health monitoring
  3. Build the eventing backbone that connects disparate systems, enabling rapid response when critical failures occur
  4. Develop AI agents that learn from every incident to prevent future failures, generate instant incident summaries, and suggest corrective actions based on historical patterns
  5. Design real-time streaming systems that process operational data with low latency

Skills

Required

  • Deep experience across the full stack
  • Demonstrated expertise in distributed systems
  • Strong background building real-time, event-driven architectures at scale
  • Track record of building high-reliability platforms with streaming technologies
  • Hands-on experience with modern TypeScript/React applications
  • Expertise in data systems (PostgreSQL, Redis)
  • Real-time processing patterns
  • Experience integrating AI/ML systems into production environments
  • Leading technical initiatives

Nice to have

  • Master’s degree in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, or related area
  • 2 years' experience in software engineering or related area
  • Creating inclusive digital experiences
  • Implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards
  • Assistive technologies
  • Integrating digital accessibility seamlessly
  • Accessibility best practices

What the JD emphasized

  • AI-powered post-incident analysis
  • AI agents
  • integrating AI/ML systems into production environments

Other signals

  • AI agents
  • event-driven systems
  • operational intelligence platform
  • incident response
  • error correction