Software Engineer, Metropolis Vision AI

NVIDIA · Semiconductors · Ho Chi Minh City, Vietnam +1

Software Engineer for NVIDIA's Metropolis Vision AI team, focusing on building and optimizing large-scale distributed Vision AI platforms for real-time and streaming scenarios. The role involves implementing high-performance pipelines, developing distributed services for video/image/3D data processing, enhancing multi-modal perception, using simulation/synthetic data, and profiling GPU-accelerated inference. Requires strong C++/Python, Linux, computer vision, deep learning, and distributed systems experience, with practical experience in PyTorch for training and deployment.

What you'd actually do

Implementing high-performance Metropolis Vision AI pipelines for real-time and streaming scenarios using computer vision and deep learning models.
Developing large-scale distributed services responsible for processing video, image, and 3D data in both edge and cloud settings.
Contributing to multi-modal perception capabilities that combine 2D, 3D, and temporal information to understand complex real-world scenes.
Using simulation and synthetic data tools to build, test, and validate perception algorithms at scale.
Profiling GPU-accelerated inference pipelines to meet strict latency, efficiency, and reliability targets.

Skills

Required

BS or MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
2+ years of professional software development experience using modern C++ (14/17/20) and Python on Linux.
Strong computer science fundamentals, including algorithms, data structures, concurrency, and distributed systems concepts.
Experience in computer vision and deep learning.
Experience in implementing concurrent systems, including multi-threading, asynchronous I/O, and efficient memory management.
Experience in Linux-based environments with containers and microservices, integrating AI components into scalable back-end services.
Practical experience with PyTorch in training, fine-tuning, and deploying models for vision tasks.
Strong analytical and problem-solving skills, with a data-driven approach to performance optimization and system build.
Excellent written and verbal English communication skills, with demonstrated success collaborating across time zones and functions.

Nice to have

Practical experience implementing end-to-end computer vision applications in production, such as video analytics, smart cities, autonomous systems, retail analytics, industrial inspection, or digital twins.
Practical experience with low-level optimization for inference and pre/post-processing.
Experience in simulation and synthetic data creation employing tools such as Omniverse, Unreal Engine, Unity, or similar digital-twin platforms.
Background in multimedia, including video-centric processing and delivery (such as codecs, video pipelines, or media frameworks) and integrating vision models into multimedia workflows.

What the JD emphasized

high-performance
large-scale distributed
real-time
streaming scenarios
video, image, and 3D data
multi-modal perception
simulation and synthetic data
GPU-accelerated inference pipelines
strict latency, efficiency, and reliability targets
modern C++ (14/17/20)
Linux
computer vision and deep learning
concurrent systems
multi-threading, asynchronous I/O, and efficient memory management
Linux-based environments with containers and microservices
PyTorch in training, fine-tuning, and deploying models for vision tasks
performance optimization
end-to-end computer vision applications in production
low-level optimization for inference and pre/post-processing
simulation and synthetic data creation
multimedia, including video-centric processing and delivery

Other signals

large-scale distributed Vision AI platforms
high-performance vision systems
turn massive streams of video, image, and 3D data into actionable insights
bring research into production at scale

Full job description

NVIDIA's technology is at the heart of the AI revolution, touching people across the planet by powering everything from self-driving cars and robotics to co-pilots and more. Join us at the forefront of technological advancement in intelligent assistants and information retrieval. Metropolis is transforming how the physical world is perceived and understood using advanced computer vision and deep learning. Our team builds large-scale distributed Vision AI platforms that power intelligent spaces, smart cities, retail analytics, and digital twins. This role offers the opportunity to contribute to core components of a strategic platform with high visibility and real-world impact. As a System Software Engineer for Vision AI, you will develop and optimize high-performance vision systems that turn massive streams of video, image, and 3D data into actionable insights. You will collaborate with specialists in perception, simulation, and large models to bring research into production at scale.

What you’ll be doing:

Implementing high-performance Metropolis Vision AI pipelines for real-time and streaming scenarios using computer vision and deep learning models.
Developing large-scale distributed services responsible for processing video, image, and 3D data in both edge and cloud settings.
Contributing to multi-modal perception capabilities that combine 2D, 3D, and temporal information to understand complex real-world scenes.
Using simulation and synthetic data tools to build, test, and validate perception algorithms at scale.
Profiling GPU-accelerated inference pipelines to meet strict latency, efficiency, and reliability targets.
Collaborating with partner teams to implement technical builds.
Participating in technical reviews and contributing to guidelines for code quality and testing.

What we need to see:

BS or MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
2+ years of professional software development experience using modern C++ (14/17/20) and Python on Linux.
Strong computer science fundamentals, including algorithms, data structures, concurrency, and distributed systems concepts.
Experience in computer vision and deep learning.
Experience in implementing concurrent systems, including multi-threading, asynchronous I/O, and efficient memory management.
Experience in Linux-based environments with containers and microservices, integrating AI components into scalable back-end services.
Practical experience with PyTorch in training, fine-tuning, and deploying models for vision tasks.
Strong analytical and problem-solving skills, with a data-driven approach to performance optimization and system build.
Excellent written and verbal English communication skills, with demonstrated success collaborating across time zones and functions.

Ways to stand out from the crowd:

Practical experience implementing end-to-end computer vision applications in production, such as video analytics, smart cities, autonomous systems, retail analytics, industrial inspection, or digital twins.
Practical experience with low-level optimization for inference and pre/post-processing.
Experience in simulation and synthetic data creation employing tools such as Omniverse, Unreal Engine, Unity, or similar digital-twin platforms.
Background in multimedia, including video-centric processing and delivery (such as codecs, video pipelines, or media frameworks) and integrating vision models into multimedia workflows.
