Data Annotation Specialist, Safety

Cohere · AI Frontier · Canada · Data Quality (Contract)

This role focuses on data annotation and evaluation for AI model safety, covering content moderation, red teaming, and the refinement of model outputs to prevent unsafe or toxic responses. The specialist will label, rank, audit, and refine text, create safety test cases, and provide feedback on guidelines and model failures. The work involves exposure to explicit content and requires strong judgment, attention to detail, and emotional resilience.

What you'd actually do

  1. Evaluate and improve model safety: Label, rank, audit, and refine human- and model-generated text to improve safety, quality, and policy alignment, including content that may be sexual, violent, or psychologically disturbing.
  2. Apply nuanced safety judgment: Assess model outputs against detailed safety guidelines, rubrics, and style standards, making consistent decisions across ambiguous, sensitive, and context-dependent cases.
  3. Create prompts and safety test cases: Write realistic prompts, user scenarios, and adversarial examples that help evaluate model behavior across safety categories and uncover unsafe, evasive, over-refusing, or policy-inconsistent responses.
  4. Support quality and calibration: Identify annotation inconsistencies or unclear guidelines, and provide actionable feedback on recurring edge cases, model failures, and opportunities to improve data quality.
  5. Work with precision and independence: Complete annotation tasks with strong attention to detail, and work comfortably on your own within a globally distributed, asynchronous team.

Skills

Required

  • 1+ years of experience in Content Moderation, Trust and Safety, AI data annotation, LLM evaluation, or a related analytical role
  • Experience applying detailed guidelines to complex and sensitive content
  • Strong contextual and sociocultural judgment
  • Ability to recognize and manage personal bias
  • Emotional resilience: comfort working with unsafe, explicit, and/or toxic content
  • Excellent command of written English
  • Ability to clearly justify content evaluations
  • Strong attention to detail and commitment to accuracy
  • Ability to maintain consistency across high-volume and monotonous tasks
  • Strong execution in a remote environment
  • Good time management
  • Comfort using new tools
  • Ability to work independently in a global, asynchronous team

Nice to have

  • Exposure to quality assurance
  • Red teaming
  • Prompt engineering
  • Fluency in another language

What the JD emphasized

  • intentional exposure to explicit content
  • explicit content
  • sexual, violent, or psychologically disturbing nature
  • safety risks
  • nuanced safety judgment
  • safety guidelines
  • safety categories

Other signals

  • data annotation
  • model safety
  • content moderation
  • LLM evaluation
  • red teaming