Senior Software Engineer - AI Platform

Datadog · Enterprise · Paris, France +1 · Dev Eng

Senior Software Engineer for Datadog's AI Platform, focusing on building scalable tools and infrastructure for model training, serving, and agent development. The role involves designing and implementing next-generation platforms to support GenAI systems, including retrieval-augmented pipelines and autonomous agents, and ensuring low-latency serving for AI features across Datadog products.

What you'd actually do

  1. Lead the design, development, and deployment of scalable tools and infrastructure to support the efforts of our data scientists (model training & serving infra, CI/CD, orchestration, deployment, ...)
  2. Create a development ecosystem that enables rapid experimentation and deployment of algorithms
  3. Stay on the cutting edge of development in the MLOps domain and document best practices to share across teams
  4. Drive the technical growth of Machine Learning as a distributed capability at Datadog, spanning all product areas, and leverage Datadog's knowledge of observability to contribute back to the MLOps community

Skills

Required

  • BS/MS/PhD in Computer Science, Engineering, Machine Learning, or a related field, or equivalent experience
  • 5+ years of professional experience in building distributed systems, data science applications, and/or machine learning engineering
  • Proven ability to architect, build, and operate distributed systems at high scale
  • Extensive experience executing projects that span data engineering, data science, and machine learning
  • Comfortable working with ambiguity
  • Ability to use AI coding tools and validate, critique, and refine AI-generated output

Nice to have

  • Motivated to push the boundaries of how AI can improve software engineering best practices and contribute to building AI-enabled products

What the JD emphasized

  • design and build this next-gen platform
  • ship production-grade GenAI systems
  • Large-scale experimentation, training, deployment, and monitoring
  • Developer toolchains for agents and other GenAI applications
  • Low-latency serving infrastructure
  • building distributed systems
  • high scale
  • data engineering, data science, and machine learning
  • AI coding tools

Other signals

  • AI Platform
  • distributed training infrastructure
  • developer toolchains for agents
  • low-latency serving infrastructure