Senior AI Engineer – Health Intelligence

Oura · Consumer · North America · Software Engineering

Senior AI Engineer focused on building and operating LLM-backed health guidance systems, integrating LLMs with personalization, owning evaluation and safety, and contributing to a multi-LLM platform for Oura's health intelligence team.

What you'd actually do

  1. Design and build LLM-backed product capabilities: Ship user-facing features that use LLMs and other AI models to deliver personalized insights, guidance, and proactive notifications. Implement safe tool-calling, retrieval, and orchestration so that AI components behave deterministically where they must and adaptively where they can.
  2. Own evaluation, quality, and safety for AI workflows: Lead the design and implementation of evaluation frameworks and tooling to measure quality, safety, latency, and cost before and after release. Define the metrics and slices that matter for user-facing guidance, and integrate evals into the production pipeline.
  3. Integrate LLMs with personalization and understanding layers: Ground AI behavior in structured user context rather than one-off prompts. Connect AI components to navigation flows and action systems so guidance turns into coherent, multi-step programs and one-tap actions, not isolated tips.
  4. Contribute to a multi-LLM and reasoning platform: Prototype and productionize workflows across multiple model providers and configurations, including routing logic and shadow-mode experimentation. Collaborate with infrastructure and science teams on reasoning, planning, and multimodal use cases.
  5. Build robust, observable, and cost-aware systems: Design and implement services and workflows that meet reliability and performance expectations. Take ownership of operational health: debugging production issues, reducing technical debt, and iterating on architecture as the AI surface area and traffic grow.

Skills

Required

  • Python
  • cloud-native services
  • backend engineering
  • applied ML
  • production systems
  • problem framing
  • data pipelines
  • modeling
  • prompting
  • deployment
  • monitoring
  • iteration
  • communication
  • collaboration

Nice to have

  • LLM evaluation
  • LLM-as-judge
  • rubric-based scoring
  • red-teaming
  • prompt versioning
  • evaluation platforms
  • RAG
  • knowledge graphs
  • semantic retrieval systems
  • vector search
  • hybrid retrieval
  • ontologies
  • semantic layers
  • personalization
  • recommendation systems
  • ranking systems
  • multi-objective optimization
  • guardrails
  • digital health
  • wearables
  • behavior change
  • developer tooling
  • experimentation frameworks
  • analytics/observability products

What the JD emphasized

  • ship user-facing features
  • evaluation frameworks and tooling
  • production pipeline
  • multi-LLM
  • multimodal use cases
  • debugging production issues
  • product-facing teams
  • shipping to real users
  • impact and iteration speed
  • fast-changing AI/LLM domain

Other signals

  • LLM-backed product capabilities
  • evaluation frameworks and tooling
  • personalization and understanding layers
  • multi-LLM and reasoning platform
  • robust, observable, and cost-aware systems