Senior Perception Learning Engineer

Apptronik · Robotics · Mountain View, CA · Software Engineering

Apptronik is seeking a Senior Perception Learning Engineer to lead research and development of advanced perception systems for its humanoid robots. This role involves designing and optimizing deep learning models for real-time perception tasks, architecting scalable data and inference pipelines, and integrating multi-sensor data for robust autonomy. The position requires a blend of research innovation and practical engineering to deliver deployable, high-performance perception stacks for production robotics platforms.

What you'd actually do

  1. Lead the design, development, and optimization of perception pipelines for humanoid robots, including object detection, tracking, segmentation, pose estimation, and scene understanding.
  2. Develop multi-sensor fusion frameworks that integrate cameras, LiDAR, depth sensors, and IMUs for robust real-time perception in dynamic human-centered environments.
  3. Architect and maintain scalable data pipelines, training infrastructure, and inference frameworks to accelerate model development, evaluation, and deployment.
  4. Drive research and deployment of deep learning models optimized for humanoid locomotion, manipulation, and human-robot interaction.
  5. Implement performance profiling, regression testing, and telemetry systems to ensure perception modules meet strict latency, accuracy, and reliability requirements on edge devices.

Skills

Required

  • MS/PhD in Computer Science, Robotics, Computer Engineering, or related field.
  • 3-5+ years of experience building and deploying perception systems for robotics, autonomous vehicles, or real-time vision applications.
  • Strong background in deep learning for computer vision, with practical expertise in detection, segmentation, multi-object tracking, and 3D perception.
  • Hands-on experience with modern AI frameworks (PyTorch, JAX, TensorFlow), computer vision and multi-modal libraries such as OpenCV, Detectron2, and YOLO, and foundation models for perception and language (e.g., SAM, CLIP, DINOv2, Flamingo).
  • Proficiency in Python and modern C++, with strong software engineering fundamentals (version control, testing, CI/CD).
  • Deep understanding of 3D geometry, camera models, and probabilistic estimation (EKF/UKF, SLAM, VIO).
  • Experience deploying optimized models on edge hardware (GPU/NPU/embedded platforms) under compute, latency, and thermal constraints.

Nice to have

  • Experience with humanoid robots, bipedal locomotion, and manipulation tasks.
  • Strong classical computer vision skills (geometry-based methods, feature extraction) complementing deep learning approaches.
  • Expertise in model acceleration, quantization, or compression (TensorRT, ONNX Runtime).
  • Familiarity with real-time frameworks and middleware such as ROS 2, GStreamer, or zero-copy pipelines.
  • Knowledge of synthetic data generation and domain adaptation techniques for training perception models.
  • Contributions to open-source robotics or vision software stacks.

What the JD emphasized

  • Track record of shipping ML/Perception systems from R&D into production robotics platforms

Other signals

  • humanoid robots
  • perception systems
  • real-time detection, tracking, segmentation, and scene understanding
  • multi-sensor fusion
  • deep learning models for real-time detection, tracking, segmentation, and scene understanding
  • scalable pipelines for training, evaluation, and deployment
  • integrate data from multiple modalities (cameras, LiDAR, depth sensors, and IMUs) into unified world models
  • research innovation with practical engineering to deliver deployable, high-performance perception stacks
  • ML/Perception systems from R&D into production robotics platforms