Senior System Software Engineer, Speech AI

NVIDIA · Semiconductors · Pune, India

NVIDIA is seeking an experienced Software Engineer to work on its GPU-accelerated Speech AI platform, focusing on building and optimizing core speech recognition (ASR), text-to-speech (TTS), and S2S services for real-time conversational AI applications. The role involves developing C++ and Python backend implementations, optimizing inference performance, adding new features, contributing to client libraries, and analyzing the performance of complex systems.

What you'd actually do

  1. Develop Core Speech Services: Build and enhance C++ & Python backend implementations for ASR, TTS, and S2S pipelines, leveraging CUDA for GPU acceleration
  2. Optimize Inference Performance: Improve streaming latency and throughput through advanced batching strategies, encoder caching, and multi-threaded pipeline optimizations
  3. Feature Development: Add new capabilities such as advanced voice activity detection, speaker diarization, decoder implementations (CTC, WFST, Flashlight), and text post-processing
  4. Client Libraries: Contribute to Python and C++ client SDKs and CLI tools for easy service integration
  5. Performance Analysis: Profile and debug complex multi-threaded systems, identifying bottlenecks in GPU/CPU pipelines

Skills

Required

  • Master's or BE/BTech in Computer Science, Computer Architecture, or a related field
  • 4+ years of experience
  • Excellent C++ & Python programming and software design skills
  • Debugging
  • Performance analysis
  • Test design
  • Experience with inference pipelines for LLM, Speech Recognition & Speech Synthesis
  • Systems Programming
  • Multi-threading
  • Synchronization primitives
  • Thread pools
  • Concurrent data structures
  • Bazel or similar build systems
  • Track record of identifying and resolving performance bottlenecks in latency-sensitive systems
  • Experience building and deploying cloud services using HTTP REST, gRPC, WebSockets, and related technologies

Nice to have

  • Publications or contributions to ML optimization open-source projects
  • Experience with embedded systems or edge deployment
  • CUDA

What the JD emphasized

  • 4+ years of experience
  • Excellent C++ & Python programming and software design skills, including debugging, performance analysis, and test design
  • Experience with inference pipelines for LLM, Speech Recognition & Speech Synthesis
  • Systems Programming: Deep knowledge of multi-threading, synchronization primitives, thread pools, and concurrent data structures
  • Performance Optimization: Track record of identifying and resolving performance bottlenecks in latency-sensitive systems

Other signals

  • GPU-accelerated Speech AI platform
  • high-performance
  • real-time conversational AI applications at scale