Senior Platform Engineer, Ingestion

LangChain · Data AI · Sweden · Engineering

LangChain is seeking a Senior Platform Engineer to own the ingestion systems, query systems, and API/SDK/CLI surfaces for its LangSmith platform, which provides observability, evaluation, and production reliability for AI systems. The role involves building and scaling high-throughput, data-intensive systems, setting API standards, owning integrations, solving complex distributed-systems problems, and participating in an on-call rotation. The ideal candidate brings a background in platform engineering and developer experience, database expertise, production-level backend skills in Go, Python, or TypeScript, and solid knowledge of cloud infrastructure and observability stacks.

What you'd actually do

  1. Build and scale critical systems: design and operate high-throughput, data-intensive ingestion and trace-query systems supporting LangSmith, built on [SmithDB](https://www.langchain.com/blog/introducing-smithdb), our purpose-built database for agent observability. Build monitoring, alerting, and automated recovery so the pipeline stays resilient.
  2. Set API, SDK, and CLI standards: define and enforce the standards, tooling, and CI that power SDK generation across Python, TypeScript, Go, and Java; keep our developer surfaces consistent, high-quality, and self-served across feature teams.
  3. Own integrations: build new integrations and maintain existing ones so it's easy to use LangSmith with any AI framework, agent, or tool, keeping us framework-agnostic.
  4. Solve complex problems: debug performance bottlenecks, optimize database queries, and architect solutions for distributed-systems challenges.
  5. Respond to incidents: participate in an on-call rotation focused on post-incident learning, automation, and prevention.

Skills

Required

  • Platform engineering: hands-on experience designing and running data-intensive systems at scale
  • Developer experience: a track record of building high-quality, widely adopted CLIs, SDKs, or API standards that developers actually enjoy using
  • Database expertise: production experience with OSS datastores (PostgreSQL, Redis)
  • Backend languages: strong backend software engineering skills with production-level experience in Go, Python, or TypeScript
  • Infrastructure expertise: solid knowledge of cloud object storage, Kubernetes, containerized infrastructure, and cloud platforms (GCP, AWS)
  • Observability mastery: hands-on experience with observability stacks (Datadog, Prometheus/Grafana, OpenTelemetry, or similar)
  • Operational mindset and high agency: "you build it, you run it, you own it," with a focus on sustainable practices

Nice to have

  • 5+ years building and operating production systems, developer-facing APIs, or both
  • Strong experience with Java
  • Knowledge of columnar file and memory formats, and of OLAP databases
  • Background in high-growth startups

What the JD emphasized

  • production-ready AI agents
  • production scale
  • production reliability
  • production systems
  • production experience
  • production-level experience

Other signals

  • LangSmith platform
  • agent observability
  • production-scale AI systems
  • high-throughput, data-intensive ingestion and trace-query systems
  • distributed systems
  • developer experience