Senior Software Engineer - VLM Microservices for Neural Reconstruction

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Senior Software Engineer to design, build, and optimize containerized inference for 3D Vision Language Models (VLMs) used in neural reconstruction, turning research into production-grade software (NIMs). The role involves developing benchmarks, releasing and maintaining models, contributing to open-source projects like vLLM, and collaborating with research and product teams. It requires experience with distributed AI systems, inference platforms, Python/C++, and strong software engineering fundamentals.

What you'd actually do

  1. Design, build, and optimize containerized inference execution for the latest 3D VLMs from NVIDIA, turning research work into production-grade, highly optimized software (NIMs, NVIDIA Inference Microservices)
  2. Develop benchmarks to validate the models' accuracy and performance (latency, throughput, scalability)
  3. Release and maintain the models and their pipelines throughout their lifecycle (bug fixes, security patches)
  4. Contribute VLM-related features to open-source projects like vLLM
  5. Collaborate closely with Research and Product teams and influence our common roadmaps
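
A duty like (2) can be sketched as a minimal latency/throughput harness. This is an illustrative sketch only: `fake_infer`, the request payloads, and the percentile choices are hypothetical stand-ins for a real NIM or vLLM endpoint client and its workload.

```python
import statistics
import time

def benchmark(infer, requests, warmup=2):
    """Measure per-request latency and overall throughput for an
    inference callable. In practice `infer` would wrap a real
    model-serving endpoint (e.g. REST or gRPC)."""
    for r in requests[:warmup]:  # warm up before timing
        infer(r)
    latencies = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        infer(r)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))] * 1000,
        "throughput_rps": len(requests) / elapsed,
    }

# Stub model: sleeps briefly to simulate inference latency.
def fake_infer(req):
    time.sleep(0.001)
    return {"output": req}

stats = benchmark(fake_infer, [f"req-{i}" for i in range(50)])
print(sorted(stats))  # prints ['p50_ms', 'p95_ms', 'throughput_rps']
```

A production harness would additionally sweep concurrency levels and batch sizes, since throughput and tail latency trade off under load.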

Skills

Required

  • Master of Science in Computer Science or Electrical Engineering plus 3 years of experience, or Bachelor of Science (or equivalent experience) plus 5 years of experience
  • History of building, validating and releasing production-grade AI distributed systems, backend services, microservices, and cloud technologies
  • Deep technical expertise in distributed applications using Docker, Kubernetes, Helm, and service endpoints and their APIs (REST, gRPC)
  • Hands-on experience with modern inference platforms (vLLM, SGLang, Torch, TRT, TRT-LLM)
  • Proficiency with Python and C++
  • Excellent software engineering fundamentals (source control, CI/CD, testing/validation, packaging, containerization)
  • Excellent written, visual, and verbal communication

Nice to have

  • Track record contributing to open-source or production-grade software
  • Experience with ML model engineering: training, fine-tuning, distillation, quantization
  • Experience with low-level optimization of ML models (CUDA kernels)
  • Strong fundamentals in 3D graphics, 3D computer vision, or neural reconstruction (NeRFs, Gaussian Splats)

What the JD emphasized

  • production-grade AI distributed systems
  • modern inference platforms
  • low-level optimization of ML models

Other signals

  • shipping AI models
  • inference optimization
  • production-grade AI systems