Staff Machine Learning Systems Engineer, Embeddings Platform

Reddit Reddit · Consumer · United States · Remote · Machine Learning

Staff Machine Learning Systems Engineer focused on building and scaling ML models for Reddit's recommendation systems, with expertise in deep learning architectures, distributed training, and real-time inference. The role involves technical leadership, strategy definition, and mentoring, impacting personalization and content discovery across the platform.

What you'd actually do

  1. Architect and lead the development of next-generation, large-scale machine learning techniques.
  2. Define and execute the ML strategy, identifying opportunities to enhance personalization and recommendation quality across Reddit.
  3. Lead research initiatives on scalable machine learning systems and real-time model adaptation, bringing cutting-edge advancements into production.
  4. Partner with ML infrastructure teams to build high-performance, distributed training systems that efficiently scale across multiple GPUs and cloud environments.
  5. Establish and optimize real-time serving architectures for large-scale embeddings, ensuring low-latency inference and high throughput.

Skills

Required

  • 8+ years of experience in machine learning engineering
  • strong focus on large-scale ML systems and recommendation or personalization systems
  • Expertise in modern deep learning architectures, including sequence models and foundational models
  • Deep understanding of complex multi-entity relationships in machine learning applications and how they are modeled in large-scale systems
  • Proven ability to design, implement, and optimize scalable ML architectures, from distributed training to real-time inference
  • Strong software engineering skills in Python, C++, or similar languages
  • experience in ML infrastructure, high-performance computing, and cloud-based ML pipelines
  • Demonstrated leadership in driving ML strategy, mentoring engineers, and influencing cross-functional teams
  • Experience with A/B testing, model evaluation frameworks, and real-time feedback loops in large-scale production systems
  • Excellent communication skills

What the JD emphasized

  • large-scale machine learning models
  • scalable model design and training approaches
  • scalable machine learning systems
  • real-time model adaptation
  • high-performance, distributed training systems
  • real-time serving architectures
  • large-scale embeddings
  • low-latency inference
  • large-scale ML systems
  • recommendation or personalization systems
  • scalable ML architectures
  • distributed training
  • real-time inference
  • large-scale production systems

Other signals

  • building highly expressive machine learning models
  • powering Reddit’s recommendation systems
  • leveraging modern deep learning approaches
  • scalable model designs
  • enhance personalization
  • massive scale
  • large-scale machine learning models
  • advanced deep learning architectures
  • high-impact ML systems
  • scalable model design and training approaches
  • efficient, reliable deployment of ML models
  • scalable machine learning systems
  • real-time model adaptation
  • cutting-edge advancements into production
  • high-performance, distributed training systems
  • real-time serving architectures
  • large-scale embeddings
  • low-latency inference
  • high throughput
  • integrate ML models into Reddit’s key AI-driven systems
  • forefront of AI research
  • new modeling paradigms
  • cutting-edge ML ecosystem
  • large-scale ML systems
  • recommendation or personalization systems
  • modern deep learning architectures
  • foundational models
  • complex multi-entity relationships
  • large-scale systems
  • scalable ML architectures
  • distributed training
  • real-time inference
  • ML infrastructure
  • high-performance computing
  • cloud-based ML pipelines
  • A/B testing
  • model evaluation frameworks
  • real-time feedback loops
  • large-scale production systems