Software Engineer, GenAI

Abridge · Vertical AI · San Francisco, CA · Builder

GenAI Engineer role focused on designing and building LLM-driven workflows that leverage agentic capabilities, tool use, and retrieval systems. The role spans scaling and optimizing these workflows, owning architectural decisions, driving rigorous evaluation, and shipping them into low-latency, high-uptime production environments with monitoring and guardrails.

What you'd actually do

  1. Design and build GenAI systems that turn LLMs into composable, dependable tools—leveraging retrieval, tool use, agentic reasoning, and structured outputs.
  2. Collaborate with ML and infra engineers to scale and optimize GenAI workflows, managing latency, context windows, and model choice.
  3. Write high-quality, modular code that’s graceful under failure, flexible to change, and easy to iterate on.
  4. Own major architectural decisions: how we structure workflows, define data flow, cache intermediate state, and shape generative outputs.
  5. Drive rigorous evaluation: build benchmark datasets, develop automated and human-in-the-loop frameworks, design experiments to surface failure modes and edge cases, run A/B tests to inform deployment, and distill insights from clinician feedback to evaluate and guide model improvement.
  6. Leverage frontier capabilities: rapidly prototype with new models and model capabilities, open-source tools, and novel prompting techniques.
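The tool-use loop behind items 1 and 4 can be sketched as follows. This is a minimal illustration with a stubbed model in place of a real LLM API; the tool name, arguments, and message shapes are all hypothetical, not Abridge's actual interfaces.

```python
import json

# Hypothetical tool registry: the model "calls" tools by name with JSON args.
TOOLS = {
    "lookup_allergies": lambda patient_id: ["penicillin"],
}

def fake_llm(messages):
    """Stub standing in for a real LLM API call. It first requests a tool
    call, then, once a tool result is in context, emits a structured
    (JSON) final answer."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_call": {"name": "lookup_allergies",
                              "arguments": {"patient_id": "p42"}}}
    return {"content": json.dumps({"allergies": last["content"]})}

def run_workflow(user_msg):
    """Loop: call the model, dispatch any requested tool, feed the result
    back, and stop when the model returns structured output."""
    messages = [{"role": "user", "content": user_msg}]
    while True:
        reply = fake_llm(messages)
        call = reply.get("tool_call")
        if call is None:
            return json.loads(reply["content"])  # structured output
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})
```

In production, `fake_llm` would be a provider SDK call with function-calling enabled, and the loop would add retries, timeouts, and guardrails on the final payload.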

Skills

Required

  • LLM APIs
  • prompting strategies
  • orchestration patterns
  • retrieval systems
  • vector DBs
  • function calling
  • tool-use
  • agentic workflows
  • model evaluation
  • dataset building
  • automated evaluation
  • human-in-the-loop evaluation
  • A/B testing
  • Python
  • clean code
  • test cases
  • standard libraries

Nice to have

  • async programming
  • performance profiling
  • packaging
  • deployment tooling
  • LangChain
  • LlamaIndex
  • custom pipelines
  • semantic retrieval
  • lexical retrieval
  • efficient kNN
  • subject matter experts

What the JD emphasized

  • 3+ years of experience building production-grade systems, with 1–2+ years focused on GenAI or LLM-powered products.
  • Deep fluency with LLM APIs, prompting strategies, and orchestration patterns (e.g., LangChain, LlamaIndex, custom pipelines).
  • Experience with retrieval systems (e.g., semantic and lexical retrieval, vector DBs, efficient kNN), function calling, tool-use, or agentic workflows.
  • Working knowledge of model evaluation: experience building diverse datasets, conducting both automated and human-in-the-loop evaluations, running A/B tests, and working with subject matter experts to guide model improvement.
  • Strong Python fundamentals, including the ability to write clean code, design comprehensive test cases, and use core language features and standard libraries; experience with async programming, performance profiling, packaging, and deployment tooling is strongly preferred.
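The "efficient kNN" retrieval requirement above reduces, at its simplest, to nearest-neighbour search over embedding vectors. A brute-force sketch in plain Python (the corpus and vectors are made up for illustration; real systems use vector DBs with approximate indexes such as HNSW or IVF):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn(query, corpus, k=2):
    """Brute-force k-nearest-neighbour search: score every document
    vector against the query and return the ids of the top-k matches."""
    scored = sorted(corpus.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

Brute force is O(n) per query, which is fine for small corpora and exactly what approximate indexes trade accuracy to beat at scale.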

Other signals

  • LLM-driven workflows
  • agentic capabilities
  • tool use
  • retrieval systems
  • rigorous evaluation frameworks
  • productionization of LLM workflows
  • low-latency, high-uptime environments
  • monitoring and observability systems
  • post-processing guardrails
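The "post-processing guardrails" signal above can be made concrete with a small sketch: validate that model output parses as the expected structure, then scrub it before it leaves the system. The `summary` field and phone-number redaction rule are assumptions for illustration, not a description of Abridge's actual guardrails.

```python
import json
import re

PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def apply_guardrails(raw_output):
    """Hypothetical post-processing guardrail: require that the model's
    raw output parses as JSON with a 'summary' field, then redact
    anything resembling a US phone number. Returns None on rejection;
    a real system would retry or fall back instead."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if "summary" not in payload:
        return None
    payload["summary"] = PHONE.sub("[REDACTED]", payload["summary"])
    return payload
```

Guardrails like this sit between the model and the caller, so malformed or unsafe generations are caught before they reach a low-latency, high-uptime serving path.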