Member of Technical Staff (Model Behavior Architect)

Perplexity · AI Frontier · San Francisco, CA

The Model Behavior Architect will design and optimize context engineering strategies and evaluation systems for Perplexity's AI products. This role involves pressure-testing model capabilities, ensuring behavioral consistency, and supporting model launches by identifying and fixing failure modes. The goal is to deliver high-quality user experiences by improving model behavior and quality through systematic evaluation and iteration.

What you'd actually do

  1. Design, test, and optimize context strategies and system prompts that shape answer engine behavior across products, features, and use cases.
  2. Build automated and semi-automated evaluation pipelines that measure model quality, catch regressions, and scale across product surfaces.
  3. Partner with research and engineering to validate model behavior before and during rollouts, ensuring smooth transitions with no degradation.
  4. Identify inconsistencies and failure modes in model outputs through well-designed research projects — for both internal and production-facing systems.
  5. Work closely with design, product, and research teams to translate product goals into concrete model behavior requirements.
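Responsibility 2 above (automated evaluation pipelines that catch regressions) can be sketched in miniature. This is an illustrative sketch only, not anything from the JD: `model`, `exact_match`, `run_eval`, and `check_regression` are all hypothetical names, and a real pipeline would use richer scoring than exact match.

```python
def exact_match(output: str, expected: str) -> bool:
    """Normalize whitespace and case before comparing answers."""
    return output.strip().lower() == expected.strip().lower()

def run_eval(model, cases):
    """Score a model callable against (prompt, expected) pairs; return pass rate."""
    passed = sum(exact_match(model(prompt), expected) for prompt, expected in cases)
    return passed / len(cases)

def check_regression(baseline_rate, candidate_rate, tolerance=0.02):
    """Flag a candidate model that scores worse than baseline minus a tolerance."""
    return candidate_rate >= baseline_rate - tolerance
```

In practice the scoring function is the hard part (LLM-as-judge, rubric grading, task-specific metrics); the surrounding harness of cases, pass rates, and regression gates stays roughly this shape.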

Skills

Required

  • Experience designing evaluations, benchmarks, or metrics for AI systems
  • Strong written and verbal communication skills
  • Ability to manage multiple concurrent projects
  • Strong experience with Perplexity or other frontier AI models in production settings
  • Python
  • 3+ years of experience working with LLMs in a product or research setting

Nice to have

  • Experience with A/B testing or experimentation frameworks
  • Track record of improving AI system performance through systematic evaluation and iteration

What the JD emphasized

  • Experience designing evaluations, benchmarks, or metrics for AI systems.
  • Strong experience with Perplexity or other frontier AI models in production settings.
  • Demonstrated experience with Python — you'll prototype, debug, automate, and build systems at scale.
  • 3+ years of experience working with LLMs in a product or research setting.
  • Track record of improving AI system performance through systematic evaluation and iteration.

Other signals

  • model behavior architect
  • evaluations
  • prompt and context engineering
  • answer engine
  • model quality
  • behavioral consistency
  • alignment
  • evaluation techniques