Engineering Manager, Machine Learning (Caper)

Instacart · Consumer · United States · Leadership (Engineering)

The Engineering Manager, Machine Learning leads a team of ML and AI infrastructure engineers building perception, multimodal understanding, and edge inference systems for AI-powered shopping carts. The role involves defining the technical vision, architecting training and inference platforms, delivering production-grade CV/VLM models, and optimizing on-device inference for real-time edge operation. It is a leadership role focused on the 'brain' behind the cart, bridging edge devices with cloud systems.

What you'd actually do

  1. Lead and grow a team of ~10 ML and AI infrastructure engineers building the perception and reasoning systems that power Caper Carts in live retail environments.
  2. Define the technical vision, roadmap, and success metrics for cart perception and multimodal understanding; prioritize work that drives measurable gains in item recognition accuracy, checkout speed, and system reliability.
  3. Architect scalable training, data, and inference platforms on GCP using Ray, Kubernetes, and modern MLOps practices to enable rapid experimentation and safe, repeatable deployments.
  4. Deliver production-grade CV/VLM models for multi-camera item detection, weighing, and basket reasoning; optimize on-device inference for low-latency, high-availability operation at the edge.
  5. Build the data flywheel end-to-end—instrumentation, labeling, evaluation, offline/online testing, and monitoring—to continuously improve performance across diverse store conditions.

Skills

Required

  • Machine learning systems
  • Computer vision
  • Team management
  • ML/AI engineers
  • Deep learning
  • PyTorch
  • Model training
  • Model evaluation
  • MLOps
  • CI/CD
  • ML services
  • ML infrastructure
  • GCP
  • GKE
  • Vertex AI
  • BigQuery
  • Ray
  • Kubernetes
  • Docker
  • Edge inference
  • Model optimization
  • TensorRT
  • ONNX
  • Quantization
  • Python
  • SQL
  • CV systems
  • Data pipelines
  • Experimentation
  • Deployment
  • Post-launch iteration

Nice to have

  • Android applications
  • Multimodal vision-language models (VLMs)
  • Large language models (LLMs)
  • Sensors
  • Hardware integration
  • Robotics
  • Retail environments
  • Cross-functional programs
  • Graduate degree (MS/PhD)

What the JD emphasized

  • 8+ years of experience building and deploying machine learning systems, with a strong focus on computer vision in production environments.
  • 2+ years of experience managing teams of 6+ ML/AI engineers, including hiring, performance management, and career development.
  • Proven experience architecting and operating ML infrastructure on GCP (e.g., GKE, Vertex AI, BigQuery) and distributed training/inference with Ray; containerization with Docker and orchestration with Kubernetes.
  • Experience delivering real-time edge inference, including model optimization (e.g., TensorRT, ONNX, quantization) and monitoring for latency, throughput, and accuracy.

Other signals

  • AI-powered shopping carts
  • perception, multimodal understanding, and edge inference
  • physical AI
  • item recognition accuracy, latency, and reliability
  • data flywheel end-to-end