What you'd actually do

Lead the design, development, and optimization of perception pipelines for humanoid robots, including object detection, tracking, segmentation, pose estimation, and scene understanding.

Develop multi-sensor fusion frameworks that integrate cameras, LiDAR, depth sensors, and IMUs for robust real-time perception in dynamic human-centered environments.

Architect and maintain scalable data pipelines, training infrastructure, and inference frameworks to accelerate model development, evaluation, and deployment.

Drive research and deployment of deep learning models optimized for humanoid locomotion, manipulation, and human-robot interaction.

Implement performance profiling, regression testing, and telemetry systems to ensure perception modules meet strict latency, accuracy, and reliability requirements on edge devices.

Skills

Required

MS/PhD in Computer Science, Robotics, Computer Engineering, or related field.
3-5+ years of experience building and deploying perception systems for robotics, autonomous vehicles, or real-time vision applications.
Strong background in deep learning for computer vision, with practical expertise in detection, segmentation, multi-object tracking, and 3D perception.
Hands-on experience with modern AI frameworks (PyTorch, JAX, TensorFlow) and computer vision / multi-modal libraries such as OpenCV, Detectron2, YOLO, and foundation models for perception and language (e.g., SAM, CLIP, DINOv2, Flamingo).
Proficiency in Python and modern C++, with strong software engineering fundamentals (version control, testing, CI/CD).
Deep understanding of 3D geometry, camera models, and probabilistic estimation (EKF/UKF, SLAM, VIO).
Experience deploying optimized models on edge hardware (GPU/NPU/embedded platforms) under compute, latency, and thermal constraints.

Nice to have

Experience with humanoid robots, bipedal locomotion, and manipulation tasks.
Strong classical computer vision skills (geometry-based methods, feature extraction) complementing deep learning approaches.
Expertise in model acceleration, quantization, or compression (TensorRT, ONNX Runtime).
Familiarity with real-time frameworks and middleware such as ROS 2, GStreamer, or zero-copy pipelines.
Knowledge of synthetic data generation and domain adaptation techniques for training perception models.
Contributions to open-source robotics or vision software stacks.

Apptronik is a human-centered robotics company developing AI-powered robots to support humanity in every facet of life. Our flagship humanoid robot, Apollo, is built to collaborate thoughtfully with people, starting with critical industries such as manufacturing and logistics, with future applications in healthcare, the home, and beyond.

We operate at the cutting edge of embodied AI, applying our expertise across the full robotics stack to solve some of society's most important problems. You will join a team dedicated to bringing Apollo to market at scale, tackling complex challenges such as safety, commercialization, and mass production to change the world for the better.

JOB SUMMARY

As a Senior Perception Learning Engineer, you will lead research and development of advanced perception systems that empower Apptronik’s humanoid robots to understand and interact with complex human environments. Your work will focus on cutting-edge research in perception, SLAM, object detection, world modeling, and multi-sensor fusion, creating the foundation for robust autonomy in real-world settings.

You will design and optimize deep learning models for real-time detection, tracking, segmentation, and scene understanding while architecting scalable pipelines for training, evaluation, and deployment. You will also integrate data from multiple modalities (cameras, LiDAR, depth sensors, and IMUs) into unified world models that support navigation, manipulation, safety, and human-robot interaction.

This role requires balancing research innovation with practical engineering to deliver deployable, high-performance perception stacks. You will collaborate with the reinforcement learning, platform software, and systems teams, mentor junior engineers, and help shape Apptronik’s long-term perception and autonomy roadmap. Your work will directly accelerate the development of humanoid robots that can safely operate in human spaces, adapt to dynamic environments, and extend human capability.

ESSENTIAL DUTIES AND RESPONSIBILITIES

Lead the design, development, and optimization of perception pipelines for humanoid robots, including object detection, tracking, segmentation, pose estimation, and scene understanding.
Develop multi-sensor fusion frameworks that integrate cameras, LiDAR, depth sensors, and IMUs for robust real-time perception in dynamic human-centered environments.
Architect and maintain scalable data pipelines, training infrastructure, and inference frameworks to accelerate model development, evaluation, and deployment.
Drive research and deployment of deep learning models optimized for humanoid locomotion, manipulation, and human-robot interaction.
Implement performance profiling, regression testing, and telemetry systems to ensure perception modules meet strict latency, accuracy, and reliability requirements on edge devices.
Collaborate with planning, control, and hardware teams to define perception-to-action interfaces, ensuring real-time compatibility with locomotion and manipulation pipelines.
Guide the integration of synthetic data (e.g., from simulation frameworks such as Isaac Sim) with real-world datasets to enhance model generalization and robustness.
Mentor junior engineers and contribute to best practices in code quality, model versioning, reproducibility, and deployment.

EDUCATION and/or EXPERIENCE

MS/PhD in Computer Science, Robotics, Computer Engineering, or related field.
3-5+ years of experience building and deploying perception systems for robotics, autonomous vehicles, or real-time vision applications.
Strong background in deep learning for computer vision, with practical expertise in detection, segmentation, multi-object tracking, and 3D perception.
Hands-on experience with modern AI frameworks (PyTorch, JAX, TensorFlow) and computer vision / multi-modal libraries such as OpenCV, Detectron2, YOLO, and foundation models for perception and language (e.g., SAM, CLIP, DINOv2, Flamingo).
Proficiency in Python and modern C++, with strong software engineering fundamentals (version control, testing, CI/CD).
Deep understanding of 3D geometry, camera models, and probabilistic estimation (EKF/UKF, SLAM, VIO).
Experience deploying optimized models on edge hardware (GPU/NPU/embedded platforms) under compute, latency, and thermal constraints.
Track record of shipping ML/Perception systems from R&D into production robotics platforms.

PREFERRED QUALIFICATIONS

Experience with humanoid robots, bipedal locomotion, and manipulation tasks.
Strong classical computer vision skills (geometry-based methods, feature extraction) complementing deep learning approaches.
Expertise in model acceleration, quantization, or compression (TensorRT, ONNX Runtime).
Familiarity with real-time frameworks and middleware such as ROS 2, GStreamer, or zero-copy pipelines.
Knowledge of synthetic data generation and domain adaptation techniques for training perception models.
Contributions to open-source robotics or vision software stacks.

PHYSICAL REQUIREMENTS

Prolonged periods of sitting at a desk and working on a computer
Must be able to lift 15 pounds at times
Vision to read printed materials and a computer screen
Hearing and speech to communicate

The annual salary range is $190,000 to $235,000.

*This is a direct hire. Please, no outside agency solicitations.

Apptronik provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

