About the Team
The PICO-MR team is dedicated to pioneering core technologies for intelligent human-computer interaction in MR environments, with a focus on integrating multimodal large language models (MLLM) and tool-use capabilities to redefine user experiences. Our R&D directions cover cutting-edge fields including multimodal scene understanding, MLLM-based agent systems, tool-augmented MR interaction, 3D environment perception, and AIGC-driven content generation. Within MR scenarios, our work spans MLLM optimization and adaptation for MR, intelligent task execution with tool use, multimodal scene understanding (vision, point clouds, text), AIGC-based scene generation, depth estimation (Mono/Stereo/MVS), 3D environment perception, large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation, encompassing both fundamental research breakthroughs and industrial-grade solution deployment.
Responsibilities
- Lead the R&D of multimodal large language models (MLLM) tailored for MR scenarios, integrating vision, point clouds, text, and other multimodal information—including model architecture optimization, cross-modal alignment, data construction, evaluation system enhancement, and end-to-end training/inference acceleration.
- Drive the research and implementation of MLLM tool-use capabilities in MR environments, enabling models to proficiently use professional tools for spatial interaction and spatial computing, support tool calls in both single-turn and multi-turn conversations, and solve complex user tasks through interaction.
- Address key challenges in long-horizon, multi-turn tool-augmented tasks in MR, such as context memory management, tool selection strategy, and error correction mechanisms.
- Keep abreast of cutting-edge technologies in MLLM, multimodal intelligence, and tool-use research, and lead the application and deployment of innovative technologies in PICO's MR products.
- Collaborate with cross-functional teams (including software engineering, product design, and hardware development) to translate research outcomes into practical features that enhance user experience.
Requirements
Minimum Qualifications
- Master's or Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, Artificial Intelligence, or a related quantitative field.
- Expertise in multimodal large model pre-training, post-training, fine-tuning, or cross-modal fusion technologies, with hands-on experience in model optimization, training workflow design, and performance tuning.
- Proven research experience in LLM tool use, reinforcement learning, LLM agents, or interactive learning, with a deep understanding of single-turn and multi-turn interaction mechanisms.
- Proficiency in core 2D/3D computer vision tasks, including detection, segmentation, depth estimation, image matching, and 3D scene perception.
- Skilled in Python and C++, with solid programming capabilities and experience in developing large-scale models using mainstream deep learning frameworks (PyTorch/TensorFlow).
- Excellent problem-solving and independent research abilities, capable of addressing complex technical challenges in the integration of MR and MLLM tool use.
Preferred Qualifications
- Publications in AI/ML/CV conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP) focusing on multimodal large models, LLM tool use, or agent systems.
- Hands-on experience in building large-scale MLLM training pipelines, tool-use evaluation systems, or multimodal agent platforms.
- Familiarity with MR/AR/VR technologies, spatial computing, or 3D scene reconstruction (3DGS, NeRF, etc.) is a strong plus.
- Experience in addressing long-horizon reasoning or asynchronous agent behavior challenges is highly valued.
- Awards from competitions such as ACM-ICPC, NOI/IOI, TopCoder, or AI/ML contests (e.g., Kaggle) are preferred.
- Strong collaboration and communication skills, able to lead research initiatives and drive cross-team technical alignment.