Senior Software Engineer - AI Systems

Walmart · Retail · Bellevue, WA +1

Senior Software Engineer focused on building AI-first systems, specifically agentic AI services and high-performance data/compute frameworks. The role involves integrating LLMs/agents, orchestrating pipelines with Ray, accelerating workloads with RAPIDS, and delivering scalable APIs and services. It requires strong software engineering fundamentals and practical exposure to ML systems, with a focus on performance, reliability, and developer experience.

What you'd actually do

  1. Build agentic AI services (planning, tool use, retrieval, feedback loops) and integrate them with internal systems and APIs.
  2. Implement orchestration, memory, tooling, evaluation, and guardrails for agentic workflows.
  3. Develop GPU‑accelerated pipelines using RAPIDS (cuDF/cuML/cuGraph) and optimize end‑to‑end performance.
  4. Use Ray (or similar) for distributed compute, batch/stream processing, and scalable workflow orchestration.
  5. Design and maintain reliable microservices for training/inference, vector indexing, and real-time decisioning.

Skills

Required

  • Python
  • Ray
  • RAPIDS
  • FastAPI
  • Flask
  • Kubernetes
  • Docker
  • data structures
  • algorithms
  • concurrency
  • networking
  • systems design
  • backend services
  • platform services

Nice to have

  • agent frameworks
  • LangGraph
  • tool-use patterns
  • retrieval and memory components
  • vector databases
  • FAISS
  • Milvus
  • pgvector
  • Pinecone
  • feature stores
  • LLM services
  • embedding services
  • prompt patterns
  • tooling patterns
  • evaluation harnesses
  • Kubernetes autoscaling
  • HPA
  • KEDA
  • GPU scheduling
  • PyTorch profiler
  • Nsight
  • line-profiler
  • Ray dashboard
  • vLLM
  • Triton Inference Server
  • ONNX Runtime
  • TensorRT
  • Go
  • Java
  • C++

What the JD emphasized

  • agentic AI services
  • orchestration
  • tool use
  • evaluation
  • guardrails
  • GPU‑accelerated pipelines
  • RAPIDS
  • Ray
  • reliable microservices
  • training/inference
  • vector indexing
  • real-time decisioning
  • feature stores
  • vector databases
  • artifact registries
  • model catalogs
  • security, privacy, and compliance
  • Python
  • Ray, Spark, or Dask
  • RAPIDS (cuDF/cuML/cuGraph)
  • FastAPI/Flask (Python), K8s (Kubernetes)
  • agent frameworks
  • LLM and embedding services

Other signals

  • agentic AI services
  • productionize models
  • GPU-accelerated pipelines
  • scalable workflow orchestration
  • reliable microservices for training/inference