Research Engineer, AI Safety & Alignment

Character AI · AI Frontier · Redwood City, CA · Technical Staff - ML

Research Engineer focused on AI safety and alignment for large language models. Responsibilities include developing evaluation methodologies, researching alignment techniques, conducting adversarial testing, mitigating harmful behaviors using RLHF and fine-tuning, and translating research into scalable solutions. Requires a PhD or equivalent experience, strong coding skills, GPU experience, and an understanding of transformers and reinforcement learning, with attention to their safety implications.

What you'd actually do

  1. Develop and implement novel evaluation methodologies and metrics to assess the safety and alignment of large language models.
  2. Research and develop cutting-edge techniques for model alignment, value learning, and interpretability.
  3. Conduct adversarial testing to proactively uncover potential vulnerabilities and failure modes in our models.
  4. Analyze and mitigate biases, toxicity, and other harmful behaviors in large language models through techniques like reinforcement learning from human feedback (RLHF) and fine-tuning.
  5. Collaborate with engineering and product teams to translate safety research into practical, scalable solutions and best practices.

Skills

Required

  • PhD or equivalent experience in Computer Science, Machine Learning, or related field
  • Experience developing production-facing and training code
  • GPU experience (training, serving, debugging)
  • Data pipelines and data infrastructure experience
  • Strong understanding of modern machine learning techniques (transformers, reinforcement learning)
  • Focus on safety implications of ML techniques

Nice to have

  • Product experimentation and A/B testing
  • Training large models in a distributed setting
  • ML deployment and orchestration (Kubernetes, Docker, cloud)
  • Explainable AI (XAI) and interpretability techniques
  • Research in AI safety, alignment, ethics
  • Knowledge of societal and ethical implications of AI, policy, and governance
  • Publications in relevant academic journals or conferences

What the JD emphasized

  • AI alignment
  • AI safety
  • ethics

Other signals

  • LLM evaluation
  • RLHF
  • interpretability