What you'd actually do

Build and operate production AI pipelines: LLM-powered extraction, batch orchestration, and inference, with a focus on reliability, cost, and latency

Design and maintain Airflow-based orchestration for batch clinical workflows

Build the observability (metrics, logging, alerting) that catches regressions before they reach downstream consumers

Build and maintain eval infrastructure that measures clinical model output quality continuously: regression detection, drift, gold-set management, dashboards

Ship platform tooling and SDKs that accelerate Machine Learning Scientists and downstream consumers

Skills

Required

Python in production environments
Experience designing, building, and integrating with microservices in production
Deployed data orchestration workflows in production (Airflow or equivalent)
Worked on cloud-native services (GCP preferred but not required)
Built monitoring, observability, and alerting for production systems
Hands-on experience with at least one major ML framework — we primarily use LangGraph; PyTorch, spaCy, or equivalents are equally welcome
Strong written and verbal communication, including experience authoring and reviewing design docs (RFCs, PRDs, or equivalent); partners well with research scientists, PMs, and clinicians

Nice to have

Operated production systems hands-on — on-call rotations, incident response, postmortems
Experience building eval / quality measurement systems for ML or LLM outputs
Hands-on production LLM application experience (prompts, agents, RAG, LLM evals, extraction pipelines)
Built internal platforms or SDKs that other engineers / scientists depended on
Experience working with clinical or biomedical data (EHR, genomics, pathology, clinical notes)
Contributions to relevant open-source projects

What the JD emphasized

production AI pipelines

LLM-powered extraction

batch orchestration

inference

reliability

cost

latency

Airflow-based orchestration

observability

eval infrastructure

clinical model output quality

platform tooling

SDKs

Machine Learning Scientists

root cause

GCP services

design docs

code review

design review

Python in production environments

microservices in production

data orchestration workflows in production

cloud-native services

monitoring, observability, and alerting for production systems

major ML framework

LangGraph

PyTorch

spaCy

written and verbal communication

authoring and reviewing design docs

research scientists

PMs

clinicians

production systems hands-on

on-call rotations

incident response

postmortems

eval / quality measurement systems

ML or LLM outputs

production LLM application experience

prompts

agents

RAG

LLM evals

extraction pipelines

internal platforms

SDKs

engineers

scientists

clinical or biomedical data

EHR

genomics

pathology

clinical notes

open-source projects

Passionate about precision medicine and advancing the healthcare industry?

Recent advancements in underlying technology have finally made it possible for AI to impact clinical care in a meaningful way. Tempus' proprietary platform connects an entire ecosystem of real-world evidence to deliver real-time, actionable insights to physicians, providing critical information about the right treatments for the right patients, at the right time.

We're seeking a highly skilled and innovative **Staff/Senior Machine Learning Engineer** to join our Clinical AI Team. As a Staff/Senior Machine Learning Engineer, you'll play a crucial role in leveraging and deploying cutting-edge natural language processing models and LLMs specifically tailored for healthcare applications at scale. Your work will contribute to optimizing clinical workflows, improving clinical trial matching, and advancing medical research. This position offers an exciting opportunity to leverage the power of natural language processing and LLMs to revolutionize healthcare and make a significant impact on people's lives.

What You Will Do:

Build and operate production AI pipelines: LLM-powered extraction, batch orchestration, and inference, with a focus on reliability, cost, and latency
Design and maintain Airflow-based orchestration for batch clinical workflows
Build the observability (metrics, logging, alerting) that catches regressions before they reach downstream consumers
Build and maintain eval infrastructure that measures clinical model output quality continuously: regression detection, drift, gold-set management, dashboards
Ship platform tooling and SDKs that accelerate Machine Learning Scientists and downstream consumers
Partner with Machine Learning Scientists to debug bad model outputs to root cause (data, prompt, or pipeline)
Participate in the pod's on-call rotation
Collaborate with platform / infrastructure teams to leverage GCP services for performance, security, and cost-efficiency
Author and review design docs for cross-pod work
Raise the engineering bar through code review and design review

Required Qualifications:

Strong command of Python in production environments
Experience designing, building, and integrating with microservices in production
Deployed data orchestration workflows in production (Airflow or equivalent)
Worked on cloud-native services (GCP preferred but not required)
Built monitoring, observability, and alerting for production systems
Hands-on experience with at least one major ML framework — we primarily use LangGraph; PyTorch, spaCy, or equivalents are equally welcome
Strong written and verbal communication, including experience authoring and reviewing design docs (RFCs, PRDs, or equivalent); partners well with research scientists, PMs, and clinicians

Preferred Qualifications:

Operated production systems hands-on — on-call rotations, incident response, postmortems
Experience building eval / quality measurement systems for ML or LLM outputs
Hands-on production LLM application experience (prompts, agents, RAG, LLM evals, extraction pipelines)
Built internal platforms or SDKs that other engineers / scientists depended on
Experience working with clinical or biomedical data (EHR, genomics, pathology, clinical notes)
Contributions to relevant open-source projects

#LI-BL1

New York Pay Range - $170,000 - $230,000 USD

California Pay Range - $170,000 - $230,000 USD

Illinois Pay Range - $150,000 - $210,000 USD

Remote - USA Range - $150,000 - $210,000 USD

The expected salary range above is applicable if the role is performed from California and may vary for other locations (Colorado, Illinois, New York). Actual salary may vary based on qualifications and experience. Tempus offers a full range of benefits, which may include incentive compensation, restricted stock units, medical and other benefits depending on the position.

Additionally, _**for remote roles open to individuals in unincorporated Los Angeles**, including remote roles,_ Tempus reasonably believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment: engaging positively with customers and other employees; accessing confidential information, including intellectual property, trade secrets, and protected health information; and appropriately handling such information in accordance with legal and ethical standards. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.

We are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Staff/Senior Machine Learning Engineer, Clinical AI
