AI Deployment Engineer, Startups

OpenAI · AI Frontier · Stockholm, Sweden · Go To Market

AI Deployment Engineer at OpenAI, working with startups to optimize AI systems, identify failure modes, and translate learnings into product improvements and evaluation systems. The role involves prototyping prompts and agents, designing evaluations, and collaborating across research and product teams.

What you'd actually do

  1. Work directly with strategic startup customers to understand critical workflows, uncover failure modes, and identify high-impact opportunities for improvement.
  2. Prototype and iterate on prompts, agents, and workflow designs to better understand system behavior and unlock customer value.
  3. Synthesize and deliver feedback to the Product and Research teams, turning real usage patterns into clear, reproducible evals, benchmarks, and technical artifacts. These improve model and product quality and ensure that customer-grounded learnings influence the roadmap and model development.
  4. Build repeatable tools, patterns, and evaluation approaches that raise the quality bar across multiple use cases.
  5. Operate with strong judgment in ambiguous environments, balancing immediate technical problem-solving with longer-term system improvement.

Skills

Required

  • strong software engineering & AI fundamentals
  • experience as a startup CTO, software engineer, ML engineer, data scientist, or equivalent
  • experience building AI applications, agents, or evaluation systems
  • comfortable working directly with highly technical users and translating their challenges into concrete technical signals
  • able to move fluidly between prototyping, debugging, evaluation design, and cross-functional collaboration
  • able to communicate clearly across technical and non-technical audiences

Nice to have

  • technical founder, or engineer at an early-stage startup
  • familiarity with, or interest in, model training pipelines and reinforcement learning
  • experience shipping production systems end-to-end (a strong plus)

What the JD emphasized

  • technical depth
  • strong product judgment
  • ambiguous, high-impact problems
  • shape how advanced AI systems improve in practice
  • shipping production systems end-to-end is a strong plus
  • building AI applications, agents, or evaluation systems
  • reason clearly about model behavior in complex workflows
  • high agency, strong product sense, and a bias toward building durable improvements

Other signals

  • customer-facing technical advisor
  • translate customer needs into product improvements
  • build evaluation systems
  • shape product and research direction