Senior Software Engineer II (ML/AI Platform)

Instacart · Consumer · United States · Remote · Software Engineering

Senior Software Engineer II on the ML/AI Platform team responsible for building and owning the internal platform that supports training (fine-tuning) and deploying (batch inference) AI models across the organization. This includes defining the platform, building SDKs, and supporting the necessary infrastructure. The role requires high ownership and leadership in a complex, distributed systems environment.

What you'd actually do

  1. Build platform-level tools and SDKs
  2. Manage cross-cutting stakeholder relationships, putting customer needs first
  3. Navigate ambiguous, hairy, and technical problem spaces
  4. Jump into domains, languages, and problem areas that might be new and unfamiliar

Skills

Required

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience
  • 3 years of experience with software development in one or more programming languages
  • 2 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems
  • 1 year of experience leading projects and providing technical leadership
  • Strong proficiency in maintaining high standards for production services
  • Ability to code quickly and manage production services
  • Experience with high-throughput, large-scale distributed systems problems

Nice to have

  • Strong communication skills and ability to contextualize problems across various audiences
  • Skilled at navigating ambiguity and involving the right stakeholders to address issues efficiently
  • Experience building and shipping products to users
  • Expertise in building platforms and high-scale infrastructure
  • Visionary thinker capable of generating transformative ideas
  • Prior experience working with AI platforms such as Ray

What the JD emphasized

  • high ownership
  • leader in the space
  • extreme levels of ownership
  • high levels of collaboration
  • drive specific use cases end to end
  • navigate ambiguous, hairy, and technical problem spaces

Other signals

  • building the internal platform which supports training and deploying AI models
  • define the platform to enable AI model fine-tuning and batch inference
  • building the SDKs and supporting the infra to support these unique workloads
  • high ownership and the ability to be a leader in the space
  • owning the online/offline feature store to the serving and training layer