(USA) Principal, Software Engineer

Walmart · Retail · Bentonville, AR +2

Principal Software Engineer for the Colony Platform, an agentic AI framework at Walmart. This role focuses on designing, building, and operating core agent orchestration components and AI-powered services that let associates build AI-based solutions quickly and safely. It requires deep expertise in distributed systems, platform engineering, and AI-enabled architectures, with a strong emphasis on prototyping and productionizing GenAI capabilities, implementing security and compliance guardrails, and driving engineering excellence.

What you'd actually do

  1. Define and evolve reference architectures for distributed systems, AI pipelines, and platform services.
  2. Lead development of AI-powered services, agent workflows, and internal builder platforms.
  3. Design, build, and operate core agent orchestration components (UI → agent core logic → tool manager → local tools).
  4. Build robust tool-call validation and execution (schema enforcement, parameter validation, retries, error handling, idempotency, and safe defaults).
  5. Implement security and compliance guardrails for local execution (least privilege, secrets handling, auditing, allowlists/deny lists where appropriate).
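
As a rough illustration of items 4 and 5, here is a minimal Python sketch of a tool manager that checks an allowlist, validates parameters, applies safe defaults, retries failures, and caches results by idempotency key. It is not Walmart's or the Colony Platform's implementation. Every name in it (ToolManager, ToolSpec, ToolCallError, flaky_add, the "add" tool) is hypothetical, and a production system would more likely use a schema library such as JSON Schema or Pydantic, both of which the posting mentions.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class ToolCallError(Exception):
    """Raised when a tool call is rejected or fails after all retries."""


@dataclass
class ToolSpec:
    """Hypothetical contract for one tool: callable, parameter types, safe defaults."""
    func: Callable[..., Any]
    params: Dict[str, type]  # parameter name -> expected type
    defaults: Dict[str, Any] = field(default_factory=dict)


class ToolManager:
    def __init__(self, max_retries: int = 2, backoff_s: float = 0.0):
        self._tools: Dict[str, ToolSpec] = {}  # registry doubles as the allowlist
        self._results: Dict[str, Any] = {}     # idempotency key -> cached result
        self.max_retries = max_retries
        self.backoff_s = backoff_s

    def register(self, name: str, spec: ToolSpec) -> None:
        self._tools[name] = spec

    def call(self, name: str, args: Dict[str, Any], idempotency_key: str) -> Any:
        # Idempotency: a repeated key returns the stored result without re-running.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        # Allowlist: only registered tools may run.
        spec = self._tools.get(name)
        if spec is None:
            raise ToolCallError(f"tool not allowlisted: {name}")

        # Safe defaults first, then caller-supplied arguments override them.
        merged = {**spec.defaults, **args}

        # Schema enforcement: no unknown, missing, or wrongly typed parameters.
        unknown = set(merged) - set(spec.params)
        if unknown:
            raise ToolCallError(f"unknown parameters: {sorted(unknown)}")
        for pname, ptype in spec.params.items():
            if pname not in merged:
                raise ToolCallError(f"missing parameter: {pname}")
            if not isinstance(merged[pname], ptype):
                raise ToolCallError(f"{pname} must be {ptype.__name__}")

        # Retries with exponential backoff. This sketch retries on any exception;
        # a real system would retry only errors known to be transient.
        last_error = None
        for attempt in range(self.max_retries + 1):
            try:
                result = spec.func(**merged)
            except Exception as exc:
                last_error = exc
                if attempt < self.max_retries:
                    time.sleep(self.backoff_s * (2 ** attempt))
            else:
                self._results[idempotency_key] = result
                return result
        raise ToolCallError(
            f"{name} failed after {self.max_retries + 1} attempts"
        ) from last_error


# Usage: a hypothetical tool that fails once, then succeeds.
calls = {"n": 0}

def flaky_add(a: int, b: int) -> int:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("transient failure")
    return a + b

manager = ToolManager()
manager.register("add", ToolSpec(func=flaky_add, params={"a": int, "b": int}, defaults={"b": 1}))
result = manager.call("add", {"a": 2}, idempotency_key="req-1")
```

The first attempt fails and the retry succeeds. A second call with the same idempotency key returns the cached result without invoking the tool again, which matters when a tool has side effects.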

Skills

Required

  • 12+ years of experience building highly available, distributed systems.
  • Proven track record delivering complex, enterprise-scale software systems from inception to production.
  • Strong proficiency in Python (building libraries/services/tools), including packaging/dependencies, logging, and performance troubleshooting.
  • Working knowledge of OAuth2/OIDC authentication and scope/permission models.
  • Familiarity with schema/contract frameworks (JSON Schema, OpenAPI, Pydantic, protobuf) and backward-compatible tool evolution.
  • Experience with observability: structured logging, metrics, traces, and debugging distributed flows across client + gateway.
  • Experience working with AI/ML ecosystems in production environments.
  • Strong architectural judgment and ability to evaluate

Nice to have

  • Camunda Zeebe engine
  • Microsoft Graph API integration

What the JD emphasized

  • AI-enabled architectures
  • agentic AI framework
  • orchestrate complex, AI-driven workflows
  • prototype and productionize advanced AI-enabled capabilities
  • responsible AI patterns
  • guardrails
  • human-in-the-loop design
  • core agent orchestration components
  • tool-call validation and execution
  • security and compliance guardrails
  • AI/ML ecosystems in production environments

Other signals

  • AI-enabled capabilities
  • prototype and productionize GenAI-enabled capabilities