AI Infrastructure Engineer

Intercom · Enterprise · Berlin, Germany +2 · AI Group

This role focuses on building and scaling the AI infrastructure for training and serving large transformer models and LLMs, including GPU-level performance optimization for both training and inference pipelines. The engineer will collaborate with ML scientists to bring cutting-edge methods to production and ensure high reliability and low latency for customer-facing AI experiences.

What you'd actually do

  1. Implement and scale training pipelines for large transformer models and LLMs, from data ingestion and preprocessing through distributed training and evaluation.
  2. Build and optimize inference services that deliver low‑latency, high‑reliability experiences for our customers, including autoscaling, routing, and fallbacks.
  3. Work on GPU‑level performance: tuning kernels, improving utilization, and identifying bottlenecks across our training and inference stack.
  4. Collaborate closely with ML scientists to implement cutting-edge training and inference methods and bring them to production.
  5. Raise the bar for technical standards, reliability, and operational excellence across Intercom’s AI platform.

Skills

Required

  • 5+ years of experience in software engineering
  • strong track record of shipping high-quality products or platforms
  • degree in Computer Science, Computer Engineering, or a related field (or equivalent experience)
  • experience working in production environments at meaningful scale
  • deep knowledge of at least one programming language (e.g. Python, Ruby, Java, Go)

Nice to have

  • Experience at AI-native companies that train and/or run inference for their own models
  • Experience running training or inference workloads on Kubernetes
  • Experience with AWS or other major cloud providers
  • Production experience with Python in ML or infrastructure contexts
  • Demonstrated passion for technology

What the JD emphasized

  • model training
  • model inference
  • GPU-level performance

Other signals

  • building systems that train and serve AI models
  • model training at scale
  • model inference at scale
  • GPU-level performance tuning
  • implementing and scaling training pipelines
  • building and optimizing inference services