Senior Systems Software Engineer, Machine Learning

NVIDIA · Semiconductors · Santa Clara, CA +1

This Senior Systems Software Engineer role focuses on building and shipping machine learning workflows and agentic systems, particularly using LLMs/VLMs and computer vision for data generation and product features. The work involves converting research into production products, defining evaluation criteria, and iterating quickly.

What you'd actually do

  1. Convert research into real products (not just slide decks or notebooks)
  2. Help build workflows that diversify datasets and/or populate data
  3. Ship machine learning workflows/pipelines fast and iterate faster
  4. Leverage LLM/VLM and agents in the data generation pipeline
  5. Define evaluation criteria and run offline evals before any model or prompt change reaches production

Skills

Required

  • Master's degree, or preferably a PhD, in Computer Science or a related field (or equivalent experience)
  • 5+ years of experience
  • Solid mathematical and algorithmic foundation, with expertise demonstrated through research publications, internships, or significant project experience.
  • Strong background in computer vision and deep learning.
  • Excellent programming skills in Python and C/C++.
  • Excellent software engineering fundamentals.
  • Ability to develop code in Unix/Linux environments.
  • Excellent written, visual, and verbal communication skills to present performance challenges, tradeoffs, and architectural alternatives.
  • Strong collaboration skills to partner with other teams.

Nice to have

  • Experience designing and operating multi-agent pipelines in production, including handling non-deterministic failures, retry logic, and tool-call error recovery
  • Experience shipping a product feature backed by a VLM (e.g., image captioning, document understanding), including handling inference latency, cost-per-call tradeoffs, and degraded-mode fallbacks
  • Experience with 3D computer vision
  • Experience with generative AI, LLMs/VLMs, computer vision, and agentic systems

What the JD emphasized

  • Ship machine learning workflows/pipelines fast
  • Shipped AI-powered features to real users
  • Experience designing and operating multi-agent pipelines in production
  • Shipped a product feature backed by a VLM

Other signals

  • Generative AI, LLMs/VLMs, computer vision, and agentic systems
  • Ship machine learning workflows/pipelines fast
  • Leverage LLM/VLM and agents in the data generation pipeline
  • Define evaluation criteria and run offline evals before any model or prompt change reaches production