Staff, Software Engineer

Walmart · Retail · Sunnyvale, CA

Staff Software Engineer with deep expertise in Java, Spring Boot, Apache Kafka, and Apache Spark, responsible for designing and delivering large-scale backend platforms and data processing systems. Focuses on building highly scalable microservices and real-time event-driven systems, and on optimizing data pipelines for high throughput, low latency, and extreme reliability.

What you'd actually do

  1. Design and build highly scalable backend microservices using Java and Spring Boot.
  2. Architect and implement real-time event-driven systems using Apache Kafka.
  3. Develop and optimize large-scale batch and streaming data pipelines using Apache Spark.
  4. Drive architecture decisions around scalability, resiliency, observability, and cost efficiency.
  5. Lead system design reviews and define engineering best practices for distributed systems.

Skills

Required

  • Java
  • Spring Boot
  • Apache Kafka
  • Apache Spark
  • distributed systems
  • microservices
  • event-driven architectures
  • cloud-native deployments (Kubernetes, Docker, AWS/GCP/Azure)
  • NoSQL / analytical data stores (Cassandra, BigQuery, HBase, or similar)
  • production debugging
  • performance tuning

Nice to have

  • Temporal.io
  • Cadence
  • Apache Airflow
  • Argo Workflows
  • AWS Step Functions
  • Event Sourcing
  • CQRS
  • Saga Pattern
  • Idempotency
  • Advanced Retry Policies
  • Rate Limiting
  • Quotas
  • DSL Design
  • SDK Development
  • Pulsar
  • RabbitMQ
  • Priority Queuing
  • retail
  • supply chain
  • pricing
  • ads
  • e-commerce platforms
  • real-time analytics
  • recommendation engines
  • fraud detection systems
  • CI/CD pipelines
  • observability (metrics/logging/tracing)
  • infrastructure as code
  • internal frameworks
  • platform engineering

What the JD emphasized

  • hands-on technical leader and system architect
  • high throughput, low latency, and extreme reliability
  • highly scalable backend microservices
  • real-time event-driven systems
  • large-scale batch and streaming data pipelines
  • scalability, resiliency, observability, and cost efficiency
  • engineering best practices for distributed systems
  • partitioning strategies, caching, async processing, and concurrency tuning
  • technical multiplier across multiple teams
  • long-term platform reliability improvements
  • Temporal.io, Cadence, Apache Airflow, or Argo Workflows
  • Distributed State Management & Durable Execution
  • Event Sourcing & CQRS
  • Saga Pattern
  • Fault Tolerance & High Availability
  • Idempotency Mastery
  • Advanced Retry Policies
  • Rate Limiting & Quotas
  • DSL Design
  • SDK Development
  • Kafka, Pulsar, or RabbitMQ
  • Priority Queuing
  • Temporal.io, Cadence, Apache Airflow, Argo Workflows, or AWS Step Functions
  • 12+ years of experience
  • Strong hands-on experience in Java and Spring Boot (must-have)
  • Deep expertise in Apache Kafka
  • Strong hands-on experience with Apache Spark
  • large scale (millions–billions of events / high TPS platforms)
  • event-driven microservices architectures
  • distributed systems fundamentals
  • cloud-native deployments (Kubernetes, Docker, AWS/GCP/Azure)
  • NoSQL / analytical data stores
  • production debugging and performance tuning skills