Staff, Software Engineer

Walmart · Retail · Bangalore, KA, India

Staff Software Engineer on the Logging Platform team within Observability at Walmart, focused on architecting and enhancing observability solutions for log analysis, anomaly detection, and predictive monitoring at scale. The role involves designing backend services, setting technical direction, championing observability tools, and mentoring engineers.

What you'd actually do

  1. Lead the design and architecture of high-performance, cloud-native backend services and APIs (Java, Golang, or Rust) in a microservices environment, ensuring scalability, reliability, and security at enterprise scale.
  2. Set technical direction and best practices for end-to-end service ownership, including architecture, implementation, testing, deployment, and operational excellence (SLOs, reliability, scalability).
  3. Champion observability and performance engineering by advancing instrumentation and monitoring using OpenTelemetry, Prometheus, and Grafana, driving incident reduction and system resilience.
  4. Collaborate with engineering, data science, and business leaders to align technical solutions with organizational goals, and communicate complex technical concepts to diverse audiences.
  5. Mentor and guide senior engineers, fostering a culture of technical excellence, innovation, and continuous learning.

Skills

Required

  • Java, Golang, or Rust
  • cloud-native development
  • microservices architecture
  • backend service design
  • API design
  • distributed systems
  • observability tools (OpenTelemetry, Prometheus, Grafana)
  • system monitoring
  • performance tuning
  • service ownership
  • architecture
  • implementation
  • testing
  • deployment
  • operational excellence (SLOs, reliability, scalability)
  • problem-solving
  • decision-making
  • analytical skills
  • communication skills
  • collaboration skills
  • mentoring senior engineers

Nice to have

  • Rust

What the JD emphasized

  • large-scale, distributed systems
  • platform architecture
  • high-performance backend services and APIs
  • cloud-native, microservices environments
  • end-to-end service ownership
  • observability tools
  • system monitoring and performance tuning
  • diagnosing complex issues
  • driving effective solutions
  • communication and collaboration skills
  • influencing and aligning stakeholders