ML Engineer, Fauna

Amazon · Big Tech · NY +1 · Software Development

Machine Learning Engineer role: train, evaluate, and deploy models for robots, with a focus on reinforcement learning, computer vision, and supervised learning for embodied systems. Responsibilities include training policies, debugging convergence failures, running experiments, optimizing models for edge hardware, and building MLOps infrastructure.

What you'd actually do

  1. Train and iterate on neural network policies for locomotion, manipulation, navigation, and perception using reinforcement and supervised learning
  2. Design and run experiments in simulation (Isaac Lab, MuJoCo, or similar) and transfer results to physical hardware
  3. Debug training runs end-to-end: diagnosing convergence failures, reward shaping issues, data quality problems, and sim-to-real gaps
  4. Optimize models for deployment on edge hardware (NVIDIA Jetson) with strict latency and memory constraints
  5. Build and maintain MLOps infrastructure: experiment tracking, model versioning, evaluation pipelines, and reproducible training workflows

Skills

Required

  • reinforcement learning
  • computer vision
  • supervised learning
  • robotics
  • embodied systems
  • training infrastructure
  • GPU clusters
  • distributed training
  • edge devices
  • MLOps infrastructure
  • experiment tracking
  • model versioning
  • evaluation pipelines
  • reproducible training workflows
  • C++
  • Python
  • Object Oriented Design

Nice to have

  • Isaac Lab
  • MuJoCo
  • NVIDIA Jetson

What the JD emphasized

  • train policies
  • debug convergence
  • run experiments in simulation
  • push models onto hardware
  • reinforcement learning
  • computer vision
  • supervised learning
  • robotics
  • embodied systems
  • training infrastructure
  • GPU clusters
  • distributed training
  • edge devices
  • strict latency and memory constraints
