Member of Technical Staff, Multimodal Agents, AGI Autonomy

Amazon · Big Tech · San Francisco, CA · Software Development

Principal Engineer role in the Amazon AGI Lab, focused on building multimodal agents and the systems that run them reliably at scale. The role involves taking models from prototype to production, setting technical direction, and partnering with researchers to scale emerging VLM and agent ideas. It requires end-to-end ownership, from the agent runtime through data management to delivering value.

What you'd actually do

  1. Set the technical direction for the team
  2. Partner closely with researchers to take emerging VLM and agent ideas from prototype to robust, instrumented systems that can be evaluated, improved, and scaled
  3. Create tooling that increases research and engineering velocity
  4. Raise the engineering bar for the team through technical design reviews, mentoring, principled architecture, high-quality code, observability, and operational excellence
  5. Influence the broader AGI organization by identifying reusable primitives, writing clear technical strategy, and creating systems that other teams can build on

Skills

Required

  • Python
  • C++
  • Rust
  • Go
  • large-scale software systems
  • ML systems
  • data platforms
  • agent infrastructure
  • low-latency distributed systems
  • deep learning
  • machine learning
  • computer vision
  • multimodal models
  • information retrieval
  • production ML infrastructure
  • leading ambiguous, cross-functional technical projects
  • mentoring senior engineers
  • influencing technical direction
  • written and verbal communication skills
  • clarifying ambiguous research problems
  • aligning stakeholders
  • driving technical decisions
  • high judgment
  • ownership

Nice to have

  • video understanding
  • vision-language models
  • ML engineering for production systems
  • model serving
  • distributed training
  • fine-tuning
  • data pipelines
  • evals
  • observability
  • search
  • ranking
  • embeddings
  • vector databases
  • ANN retrieval
  • metadata generation
  • large-scale multimodal indexing
  • privacy-aware ML systems
  • secure ML systems
  • on-device ML systems
  • edge ML systems
  • client-side ML systems
  • Kubernetes
  • Ray
  • Spark
  • Kafka
  • GPU clusters
  • distributed storage
  • service orchestration
  • high-throughput data processing
  • taking early research ideas and turning them into reliable systems

What the JD emphasized

  • end-to-end ownership
  • take models from prototype to production
  • build the systems that make them run reliably at scale
  • emerging VLM and agent ideas
  • evaluated, improved, and scaled
  • technical direction
  • high-quality code
  • operational excellence
  • reusable primitives
  • clear technical strategy
  • systems that other teams can build on
  • research taste
  • systems thinking
  • product intuition
  • engineering discipline

Other signals

  • building agents that can perceive, reason, and take action