AI Engineer, Product

Mistral AI · AI Frontier · Paris, France · Engineering & Infra

AI Engineer focused on improving AI-powered features within product teams (search, chat, documents, audio) by owning AI quality end-to-end through evaluation, prompt and orchestration design, and experimentation. The role involves defining metrics, running A/B tests, setting up LLM observability, operating model releases, and improving core behaviors. Requires strong Python or TypeScript skills, production LLM experience, and a product mindset.

What you'd actually do

  1. Design and run evaluations for your product area: reference tests, heuristics, and model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.
  2. Define and track metrics that matter: task success, helpfulness, hallucination proxies, safety flags, latency, cost.
  3. Own prompt and orchestration design: write, test, and iterate on prompts and system prompts as a core part of your work.
  4. Run A/B tests on prompts, models, and configurations; analyze the results and make data-driven rollout or rollback decisions.
  5. Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.

Skills

Required

  • TypeScript
  • Python
  • Production LLM experience
  • Prompt engineering
  • Tool/function calling
  • System prompts
  • Evals
  • A/B testing
  • Metrics definition
  • Observability
  • Logging
  • Tracing
  • Dashboards
  • Alerting
  • Product mindset
  • Experimentation
  • Data analysis

Nice to have

  • Safety systems experience
  • Moderation
  • PII handling/redaction
  • Guardrails
  • Release operations
  • Canary/shadow deployments
  • Automated rollbacks
  • Experiment platforms
  • Search ranking
  • Chat systems
  • Document AI
  • Audio ML features

What the JD emphasized

  • TypeScript or Python skills
  • Production LLM experience
  • Hands-on with evals and A/B testing
  • Implementing directly in product code
  • Observability experience
  • Product mindset

Other signals

  • improving AI-powered features
  • rigorous evaluation
  • prompt and orchestration design
  • rapid experimentation
  • own AI quality end-to-end
  • define what good looks like, measure it, run experiments, and ship what works
  • measurable improvements to quality, latency, safety, and reliability