Staff, Software Engineer

Walmart · Retail · Sunnyvale, CA

Walmart is seeking a Staff Software Engineer to design and build enterprise-scale Marketplace platforms supporting Seller Risk. This role is backend-heavy, with strong expectations for Java-based distributed systems, while also providing technical leadership for React-based front-end applications. You will operate as a senior technical leader, driving architecture, system design, and engineering excellence across the full stack. Responsibilities include designing and building highly scalable backend microservices using Java and Spring Boot, architecting and implementing real-time event-driven systems using Apache Kafka, and developing and optimizing large-scale batch and streaming data pipelines using Apache Spark. The role requires strong ownership, deep system thinking, and the ability to design for high throughput, low latency, and extreme reliability.

What you'd actually do

Design and build highly scalable backend microservices using Java and Spring Boot.
Architect and implement real-time event-driven systems using Apache Kafka.
Develop and optimize large-scale batch and streaming data pipelines using Apache Spark.
Drive architecture decisions around scalability, resiliency, observability, and cost efficiency.
Lead system design reviews and define engineering best practices for distributed systems.

Skills

Required

12+ years of experience in backend and distributed systems engineering.
Strong hands-on experience in Java and Spring Boot for building production-grade microservices.
Deep expertise in Apache Kafka: Topic design and partitioning, Consumer group scaling and offset management, Delivery semantics (at-least-once / exactly-once), Stream processing patterns and performance tuning.
Strong hands-on experience with Apache Spark: Batch and Structured Streaming workloads, Job optimization (shuffle tuning, memory tuning, skew handling), Working with large-scale datasets.
Proven experience building systems operating at large scale (millions–billions of events / high TPS platforms).
Experience designing event-driven microservices architectures.
Strong understanding of distributed systems fundamentals: Fault tolerance, Back-pressure, Idempotency, Consistency trade-offs.
Experience with cloud-native deployments (Kubernetes, Docker, AWS/GCP/Azure).
Experience with NoSQL / analytical data stores such as Cassandra, BigQuery, HBase, or similar.
Strong production debugging and performance tuning skills.

Nice to have

Direct experience building or deeply customizing platforms like Temporal.io, Cadence, Apache Airflow, or Argo Workflows.
Distributed State Management & Durable Execution.
Deep State Knowledge: Experience managing the state of long-running processes that must survive infrastructure failures, network partitions, and deployments.
Event Sourcing & CQRS: Familiarity with using event-sourcing patterns to rebuild the state of a workflow by replaying history.
Transactions: Understanding of the Saga Pattern for managing distributed transactions and implementing compensations (rollbacks) across microservices.
Fault Tolerance & High Availability.
Idempotency Mastery: Expertise in designing systems where tasks can be retried indefinitely without side effects—a critical requirement for any orchestration engine.
Advanced Retry Policies: Knowledge of jitter, exponential backoff, and circuit breakers to prevent "thundering herd" problems when a downstream service fails.
Rate Limiting & Quotas: Experience building multi-tenant throttling mechanisms to ensure one massive workflow doesn't starve others of resources.
DSL Design: Experience designing Domain-Specific Languages (YAML, JSON, or Python-based) that allow users to define complex logic simply.
SDK Development: Ability to build client-side libraries that abstract away the complexity of the underlying orchestration engine for other developers.
Message Brokers: Professional experience with Kafka, Pulsar, or RabbitMQ specifically used as a task distribution layer.
Priority Queuing: Implementing logic to handle "hot" tasks vs. background tasks efficiently.
Hands-on experience with existing orchestrators such as Temporal.io, Cadence, Apache Airflow, Argo Workflows, or AWS Step Functions.
An understanding of why these tools succeed (or fail) in specific use cases.
Experience in retail, supply chain, pricing, ads, or e-commerce platforms.
Exposure to real-time analytics, recommendation engines, or fraud detection systems.

What the JD emphasized

Java and Spring Boot
Apache Kafka
Apache Spark
high throughput, low latency, and extreme reliability
large scale (millions–billions of events / high TPS platforms)
event-driven microservices architectures
distributed systems fundamentals
cloud-native deployments (Kubernetes, Docker, AWS/GCP/Azure)
NoSQL / analytical data stores

Position Summary...

We are seeking a Staff Software Engineer to design and build enterprise-scale Marketplace platforms supporting Seller Risk. This role is backend-heavy, with strong expectations for Java-based distributed systems, while also providing technical leadership for React-based front-end applications. You will operate as a senior technical leader, driving architecture, system design, and engineering excellence across the full stack. Candidates must be authorized to work in the U.S. No visa sponsorship available now or in the future.

What you'll do...

About the Team

At Walmart Global Tech, we build highly scalable and reliable backend platforms that power the online marketplace of the world’s largest retail ecosystem. Our systems process massive volumes of real-time and batch data across Walmart marketplace.

**This is an onsite role in Sunnyvale, CA, and candidates must have valid U.S. work authorization as visa sponsorship is not available.**

We are looking for a Staff Software Engineer with deep expertise in Java and Spring Boot, strong hands-on experience in Apache Kafka and Apache Spark, and a proven track record of building distributed systems at scale.

Role Overview

As a Staff Engineer, you will act as a hands-on technical leader and system architect, responsible for designing and delivering large-scale backend platforms and data processing systems. You will work cross-functionally to solve complex engineering challenges, influence platform architecture, and mentor senior engineers. This role requires strong ownership, deep system thinking, and the ability to design for high throughput, low latency, and extreme reliability.

Key Responsibilities

Design and build highly scalable backend microservices using Java and Spring Boot.
Architect and implement real-time event-driven systems using Apache Kafka.
Develop and optimize large-scale batch and streaming data pipelines using Apache Spark.
Drive architecture decisions around scalability, resiliency, observability, and cost efficiency.
Lead system design reviews and define engineering best practices for distributed systems.
Work closely with Product, Data Science, Platform, and Infrastructure teams to deliver business impact.
Optimize system performance through partitioning strategies, caching, async processing, and concurrency tuning.
Mentor engineers and act as a technical multiplier across multiple teams.
Participate in production incident reviews and drive long-term platform reliability improvements.

**Preferred Skills:**

Orchestration Ecosystem: Direct experience building or deeply customizing platforms like Temporal.io, Cadence, Apache Airflow, or Argo Workflows.

1. Distributed State Management & Durable Execution

Deep State Knowledge: Experience managing the state of long-running processes that must survive infrastructure failures, network partitions, and deployments.
Event Sourcing & CQRS: Familiarity with using event-sourcing patterns to rebuild the state of a workflow by replaying history.
Transactions: Understanding of the Saga Pattern for managing distributed transactions and implementing compensations (rollbacks) across microservices.

2. Fault Tolerance & High Availability

Idempotency Mastery: Expertise in designing systems where tasks can be retried indefinitely without side effects—a critical requirement for any orchestration engine.
Advanced Retry Policies: Knowledge of jitter, exponential backoff, and circuit breakers to prevent "thundering herd" problems when a downstream service fails.
Rate Limiting & Quotas: Experience building multi-tenant throttling mechanisms to ensure one massive workflow doesn't starve others of resources.

3. Developer Experience (DevX) & DSLs

DSL Design: Experience designing Domain-Specific Languages (YAML, JSON, or Python-based) that allow users to define complex logic simply.
SDK Development: Ability to build client-side libraries that abstract away the complexity of the underlying orchestration engine for other developers.

4. High-Throughput Messaging & Queuing

Message Brokers: Professional experience with Kafka, Pulsar, or RabbitMQ specifically used as a task distribution layer.
Priority Queuing: Implementing logic to handle "hot" tasks vs. background tasks efficiently.

5. Ecosystem Familiarity

Hands-on experience with existing orchestrators such as Temporal.io, Cadence, Apache Airflow, Argo Workflows, or AWS Step Functions.
An understanding of why these tools succeed (or fail) in specific use cases.

Required Qualifications

12+ years of experience in backend and distributed systems engineering.
Must-have strong hands-on experience in Java and Spring Boot for building production-grade microservices.
Deep expertise in Apache Kafka: Topic design and partitioning, Consumer group scaling and offset management, Delivery semantics (at-least-once / exactly-once), Stream processing patterns and performance tuning.
Strong hands-on experience with Apache Spark: Batch and Structured Streaming workloads, Job optimization (shuffle tuning, memory tuning, skew handling), Working with large-scale datasets.
Proven experience building systems operating at large scale (millions–billions of events / high TPS platforms).
Experience designing event-driven microservices architectures.
Strong understanding of distributed systems fundamentals: Fault tolerance, Back-pressure, Idempotency, Consistency trade-offs.
Experience with cloud-native deployments (Kubernetes, Docker, AWS/GCP/Azure).
Experience with NoSQL / analytical data stores such as Cassandra, BigQuery, HBase, or similar.
Strong production debugging and performance tuning skills.

Preferred Qualifications

Experience in retail, supply chain, pricing, ads, or e-commerce platforms.
Exposure to real-time analytics, recommendation engines, or fraud detection systems.
Experience driving cross-team technical initiatives and platform modernization efforts.
Familiarity with CI/CD pipelines, observability (metrics/logging/tracing), and infrastructure as code.
Experience contributing to internal frameworks or platform engineering efforts.

Leadership Expectations

Provide technical direction across teams and influence architectural decisions.
Raise the engineering bar through mentorship, design rigor, and operational excellence.
Balance hands-on coding with strategic technical leadership.
Drive initiatives that improve developer productivity and platform scalability.

Impact You’ll Make

You will build backend platforms that power real-time decisioning, large-scale data processing, and mission-critical retail workflows impacting millions of customers and associates globally.

At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.

You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable. For information about PTO, see https://one.walmart.com/notices.

Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms. For information about benefits and eligibility, see One.Walmart.

The annual salary range for this position is $143,000.00 - $286,000.00. Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include:

Stock

Minimum Qualifications...

__Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.__

Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 4 years’ experience in software engineering or related area.
Option 2: 6 years’ experience in software engineering or related area.

Preferred Qualifications...

Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.

Master’s degree in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, or related area and 2 years' experience in software engineering or related area.

We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart’s accessibility standards and guidelines for supporting an inclusive culture.

Primary Location...

1375 Crossman Ave, Sunnyvale, CA 94089-1114, United States of America

Walmart and its subsidiaries are committed to maintaining a drug-free workplace and have a no-tolerance policy regarding the use of illegal drugs and alcohol on the job. This policy applies to all employees and aims to create a safe and productive work environment.