Staff, Software Engineer

Walmart · Retail · Sunnyvale, CA

Staff Software Engineer at Walmart focused on the International Recommendations platform, driving innovation in Machine Learning for personalization. Responsibilities include designing and productionizing scalable, low-latency ML services, building and optimizing models and data pipelines, creating monitoring dashboards, and performing latency tuning. The role involves collaborating with data scientists and product teams, and contributing to the ML capabilities roadmap.

What you'd actually do

  1. Design scalable, low-latency services to host models; productionize prototypes on the cloud, including data pipelines, training and inference pipelines, and pre- and post-processing routines.
  2. Build, tune, and optimize machine learning models; collaborate with data scientists to refine data models, design improvements, conduct experiments, and iteratively improve results.
  3. Develop and optimize data pipelines that collect, consolidate, and normalize data to feed machine learning models for offline evaluation and real-time execution.
  4. Create monitoring dashboards; perform latency tuning of deep learning models, scaling solutions to enterprise level; investigate and resolve performance issues.
  5. Run experiments to compare models, features, and hyperparameters; utilize A/B testing and continuous monitoring to validate and adjust models.

Skills

Required

  • Bachelor's degree in Computer Science or a related field and 10 years of experience in software engineering, or Master's degree in Computer Science or a related field and 5 years of experience in software engineering
  • 4+ years of proven work experience in designing and architecting cloud-native, distributed, high-performance, and scalable microservices.
  • 4 years of experience with database languages (SQL, PL/SQL, PG-PL/SQL), version control (Git), data structures and algorithms
  • 4 years of experience writing production-quality software in Python; knowledge of unit testing in Python, mocking, and pytest.
  • 4 years of experience in architecting ML solutions given an abstract business problem
  • 3 years of experience with MLOps and the model development lifecycle, with knowledge of training and deployment pipelines for machine learning solutions on the cloud
  • 3 years of experience building training and inference pipelines
  • Hands-on experience building data pipelines with Spark, Hadoop, Redshift, and/or Hive
  • Experience with Kubernetes, Helm, microservice architecture and design, and best practices around multithreading, networking, offline storage, and performance tuning
  • A disciplined approach to development, code review, testing, documentation, and code structure in a team environment.
  • Experience designing and architecting applications on NoSQL databases such as Cassandra and Azure Cosmos DB.
  • Ability to apply industry research and innovation to build next-generation systems that improve Walmart's technology environment.

Nice to have

  • Experience with GCP and Airflow is a plus.
  • Experience building machine learning models and pipelines related to personalization is preferred.

What the JD emphasized

  • 4+ years of proven work experience in designing and architecting cloud-native, distributed, high-performance, and scalable microservices.
  • 4 years of experience in architecting ML solutions given an abstract business problem
  • 3 years of experience with MLOps and the model development lifecycle, with knowledge of training and deployment pipelines for machine learning solutions on the cloud
  • 3 years of experience building training and inference pipelines

Other signals

  • productionize prototypes on the cloud
  • design scalable, low-latency services to host models
  • build, tune, and optimize machine learning models
  • develop and optimize data pipelines
  • create monitoring dashboards
  • perform latency tuning of deep learning models
  • scaling solutions to enterprise level
  • investigate and resolve performance issues
  • run experiments to compare models, features, and hyperparameters
  • utilize A/B testing and continuous monitoring to validate and adjust models
  • architecting ML solutions given an abstract business problem
  • MLOps and the model development lifecycle, with knowledge of training and deployment pipelines for machine learning solutions on the cloud
  • building training and inference pipelines