AI Infrastructure Engineer

Intercom · Enterprise · Berlin, Germany +2 · AI Group

Intercom is seeking Senior+ AI Infrastructure Engineers to build and scale the systems that train and serve its AI products. The role focuses on model training pipelines and inference services, with an emphasis on GPU performance and reliability at scale.

What you'd actually do

  1. Implement and scale training pipelines for large transformer models and LLMs, from data ingestion and preprocessing through distributed training and evaluation.
  2. Build and optimize inference services that deliver low‑latency, high‑reliability experiences for our customers, including autoscaling, routing, and fallbacks.
  3. Work on GPU‑level performance: tuning kernels, improving utilization, and identifying bottlenecks across our training and inference stack.
  4. Collaborate closely with ML scientists to implement cutting-edge training and inference methods and bring them to production.
  5. Play an active role in hiring, mentoring, and developing other engineers on the team.

Skills

Required

  • 5+ years of experience in software engineering
  • strong track record of shipping high-quality products or platforms
  • degree in Computer Science, Computer Engineering, or a related field (or equivalent experience)
  • experience with production environments at meaningful scale
  • deep knowledge of at least one programming language (e.g. Python, Ruby, Java, Go)
  • ability to write clean, reliable code
  • ability to learn new stacks quickly

Nice to have

  • experience at AI-native companies that train and/or run inference for their own models
  • running training or inference workloads on Kubernetes
  • experience with AWS or other major cloud providers
  • production experience with Python in ML or infrastructure contexts
  • passion for technology

What the JD emphasized

  • model training
  • model inference
  • GPU-level performance

Other signals

  • building systems that train and serve AI products
  • model training at scale
  • model inference at scale
  • GPU-level performance tuning
  • implementing and scaling training pipelines
  • building and optimizing inference services