Staff/Senior Machine Learning Engineer, Clinical AI

Tempus AI · Vertical AI · Chicago, IL +3

This Staff/Senior Machine Learning Engineer role builds and operates production AI pipelines, including LLM-powered extraction, batch orchestration, and inference, with a focus on reliability, cost, and latency. The work involves designing and maintaining Airflow-based orchestration, building observability and eval infrastructure, shipping platform tooling and SDKs, and collaborating with Machine Learning Scientists and platform teams. Required: production experience with Python, microservices, data orchestration, cloud-native services, and monitoring; hands-on experience with at least one major ML framework (LangGraph, PyTorch, spaCy, or an equivalent); and experience authoring and reviewing design docs. Preferred: on-call and incident-response experience, building eval systems, production LLM applications, internal platforms or SDKs, and clinical or biomedical data.

What you'd actually do

  1. Build and operate production AI pipelines: LLM-powered extraction, batch orchestration, and inference, with a focus on reliability, cost, and latency
  2. Design and maintain Airflow-based orchestration for batch clinical workflows
  3. Build the observability (metrics, logging, alerting) that catches regressions before they reach downstream consumers
  4. Build and maintain eval infrastructure that measures clinical model output quality continuously: regression detection, drift, gold-set management, dashboards
  5. Ship platform tooling and SDKs that accelerate Machine Learning Scientists and downstream consumers

Skills

Required

  • Python in production environments
  • Experience designing, building, and integrating with microservices in production
  • Deployed data orchestration workflows in production (Airflow or equivalent)
  • Worked on cloud-native services (GCP preferred but not required)
  • Built monitoring, observability, and alerting for production systems
  • Hands-on experience with at least one major ML framework — we primarily use LangGraph; PyTorch, spaCy, or equivalents are equally welcome
  • Strong written and verbal communication, including experience authoring and reviewing design docs (RFCs, PRDs, or equivalent); partners well with research scientists, PMs, and clinicians

Nice to have

  • Operated production systems hands-on — on-call rotations, incident response, postmortems
  • Experience building eval / quality measurement systems for ML or LLM outputs
  • Hands-on production LLM application experience (prompts, agents, RAG, LLM evals, extraction pipelines)
  • Built internal platforms or SDKs that other engineers / scientists depended on
  • Experience working with clinical or biomedical data (EHR, genomics, pathology, clinical notes)
  • Contributions to relevant open-source projects

What the JD emphasized

  • production AI pipelines
  • LLM-powered extraction
  • batch orchestration
  • inference
  • reliability
  • cost
  • latency
  • Airflow-based orchestration
  • observability
  • eval infrastructure
  • clinical model output quality
  • platform tooling
  • SDKs
  • Machine Learning Scientists
  • root cause
  • GCP services
  • design docs
  • code review
  • design review
  • Python in production environments
  • microservices in production
  • data orchestration workflows in production
  • cloud-native services
  • monitoring, observability, and alerting for production systems
  • major ML framework
  • LangGraph
  • PyTorch
  • spaCy
  • written and verbal communication
  • authoring and reviewing design docs
  • research scientists
  • PMs
  • clinicians
  • production systems hands-on
  • on-call rotations
  • incident response
  • postmortems
  • eval / quality measurement systems
  • ML or LLM outputs
  • production LLM application experience
  • prompts
  • agents
  • RAG
  • LLM evals
  • extraction pipelines
  • internal platforms
  • SDKs
  • engineers
  • scientists
  • clinical or biomedical data
  • EHR
  • genomics
  • pathology
  • clinical notes
  • open-source projects

Other signals

  • LLM-powered extraction
  • batch orchestration
  • inference
  • eval infrastructure
  • platform tooling and SDKs