Senior System Software Engineer, Speech AI

NVIDIA · Semiconductors · Pune, India

Senior System Software Engineer role focused on speech AI technologies (ASR, TTS, ALM, S2S) for enterprise and developer customers. Responsibilities include implementing, troubleshooting, and optimizing GPU-accelerated speech systems in production, transitioning models from research to production, optimizing inference performance, developing core speech services using C++ and Python with CUDA, and contributing to client SDKs. Requires strong programming skills, experience with inference pipelines, understanding of modern model architectures, and knowledge of real-time streaming audio and low-latency systems. Experience with speech model fine-tuning is required.

What you'd actually do

  1. Work on cutting-edge GPU-accelerated AI systems deployed at scale
  2. Tackle challenging problems in real-time streaming audio processing and low-latency inference
  3. Troubleshoot and resolve complex issues across ASR, TTS, ALM, and S2S pipelines
  4. Model Integration: Work alongside model researchers to transition ASR, TTS, and S2S models from research to production readiness
  5. Develop Core Speech Services: Build and enhance C++ & Python backend implementations for ASR, TTS, and S2S pipelines, leveraging CUDA for GPU acceleration

Skills

Required

  • Master's or BE/BTech in Computer Science, Computer Architecture, or a related field
  • 5+ years of experience
  • Excellent C++ & Python programming and software design skills, including debugging, performance analysis, and test design.
  • Experience with inference pipelines for LLM, Speech Recognition & Speech Synthesis
  • Solid understanding of modern model architectures (Transformers, CNNs, RNNs)
  • Excellent debugging abilities spanning multiple software layers (storage systems, kernels, and containers)
  • Experience building and deploying cloud services using HTTP REST, gRPC, WebSockets, and related technologies
  • Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment
  • Ability to work independently, define project goals and scope and manage your own development effort.
  • Knowledge of real-time streaming audio systems and low-latency architectures
  • Experience with speech model fine-tuning or customization

Nice to have

  • Publications or contributions to ML optimization open-source projects.
  • Experience with embedded systems or edge deployment

What the JD emphasized

  • 5+ years of experience
  • Experience with inference pipelines for LLM, Speech Recognition & Speech Synthesis
  • Knowledge of real-time streaming audio systems and low-latency architectures
  • Experience with speech model fine-tuning or customization

Other signals

  • GPU-accelerated AI systems
  • real-time streaming audio processing
  • low-latency inference
  • ASR, TTS, ALM, and S2S pipelines
  • transition ASR, TTS and S2S models from research to production readiness
  • Optimize Inference Performance
  • streaming latency and throughput
  • decoder implementations (CTC, WFST, Flashlight)
  • speech model fine-tuning or customization