Senior Backend Software Engineer, AI Observability & Evals Platform (LangSmith)

LangChain · Data AI · San Francisco, CA · Engineering

LangChain is seeking a Senior Backend Software Engineer to build the backend systems for LangSmith, its AI observability and evals platform. The role involves designing, developing, and maintaining backend services and APIs for tracing, monitoring, and evaluation workflows; optimizing data storage and query performance; ensuring system reliability; and troubleshooting production issues. Experience with high-throughput systems, database systems, and cloud platforms is required.

What you'd actually do

  1. Design, develop, and maintain backend services and APIs to support LangSmith’s tracing, monitoring, and evaluation workflows.
  2. Collaborate on architectural decisions to ensure systems are performant and maintainable.
  3. Optimize storage and query performance for high-volume observability and evaluation data.
  4. Ensure system reliability through strong testing, monitoring, and alerting practices.
  5. Troubleshoot and resolve production issues, performing root-cause analysis and implementing long-term fixes.

Skills

Required

  • 5+ years of professional experience in backend engineering, working on highly complex products.
  • Proficiency in one or more backend languages or frameworks, ideally Python or Go.
  • Strong understanding of API design and building reliable data services.
  • Experience with high-throughput or mission-critical systems.
  • Demonstrated ability to optimize backend services for performance and reliability.
  • Experience with database systems (Postgres, Redis, ClickHouse) and cloud platforms (AWS, GCP, Azure).
  • Strong communication skills, with the ability to collaborate cross-functionally.

Nice to have

  • Familiarity with full-stack or frontend engineering.
  • Familiarity with performance tuning.
  • Familiarity with debugging production issues.

What the JD emphasized

  • 5+ years of professional experience in backend engineering working on highly complex products
  • high-throughput or mission-critical systems
  • optimize backend services for performance and reliability

Other signals

  • building the backend systems that power LangChain’s observability and evals platform
  • monitor and evaluate their AI applications at scale
  • high-volume observability and evaluation data
  • system reliability through strong testing, monitoring, and alerting practices