Principal, Software Engineer

Walmart · Retail · Hoboken, NJ

Walmart is seeking a Principal Software Engineer (ML) to lead the design and development of production-grade AI systems, with a focus on end-to-end (E2E) pipelines, deployable ML systems that learn continuously, and autonomous AI agents. The role involves building ML services, reinforcement learning loops, and computer vision solutions, and integrating models into scalable distributed systems.

What you'd actually do

  1. Design and deploy end-to-end ML pipelines in Python, covering data ingestion → training → evaluation → deployment → monitoring.
  2. Build production-ready, deployable code for ML services (APIs, batch and real-time inference systems).
  3. Develop and implement reinforcement learning / feedback loop systems (e.g., human-in-the-loop, reward modeling, online learning).
  4. Architect computer vision and image analysis solutions (classification, embeddings, multimodal systems).
  5. Integrate ML models into scalable distributed systems serving millions of users in real time.

Skills

Required

  • software architecture
  • distributed systems
  • scalable design patterns
  • machine learning lifecycle
  • training
  • evaluation
  • deployment
  • monitoring
  • reinforcement learning
  • feedback-driven systems
  • bandits
  • RLHF
  • online learning
  • coding standards
  • multiple programming languages
  • secure software development lifecycle practices
  • requirement analysis
  • risk assessment
  • solution scoping
  • test strategy development
  • automation tools
  • defect management processes
  • technical team leadership
  • mentoring
  • continuous improvement initiatives

Nice to have

  • Python
  • computer vision
  • image analysis
  • classification
  • embeddings
  • multimodal systems
  • LLMs

What the JD emphasized

  • production-grade AI systems
  • deployable ML systems
  • continuous learning and adaptation
  • autonomous AI agents
  • AI agents and multi-step reasoning systems
  • reinforcement learning / feedback loop systems

Other signals

  • E2E pipelines