Research Engineer, Frontier Safety Loss of Control, Deepmind

Google Google · Big Tech · San Francisco, CA +1

Research Engineer focused on developing monitoring and control systems for potentially misaligned AI agents to mitigate risks of extreme harms. This involves designing, building, and testing monitors, implementing response policies, and conducting adversarial testing. The role aims to prepare for the internal use of potentially misaligned AI systems by building defense-in-depth against AI that might persistently pursue unintended goals.

What you'd actually do

  1. Identify potential harms from misaligned agents and develop strategies for detection and prevention.
  2. Implement technical controls to monitor agent thoughts, behaviour, and respond to mitigate potential harms.
  3. Integrate various agent behaviour signals from across the organisation to inform response policies.
  4. Conduct adversarial testing of controls.
  5. Work with internal product teams to ensure that control systems are adopted over all high-risk AI surfaces.

Skills

Required

  • engineering and agentic assistance
  • software development in Python
  • frontier AI research and development environment
  • professional software engineering or research team environment
  • technical stakeholders
  • frontier model risk

Nice to have

  • engineering or product design for AI tools or assistants, especially those focused on ML Research and Development (R&D)
  • cybersecurity detection and response
  • collaborating or leading an applied ML project
  • Large Language Model (LLM) training and inference
  • AI control, chain-of-thought and other monitoring, faithfulness and monitorability and related research areas

What the JD emphasized

  • potentially misaligned AI
  • mitigate risks of extreme harms
  • control tools might be bypassed or degraded
  • defense in depth against the risk of misaligned AI systems
  • AI remains effectively monitorable
  • potentially misaligned AI systems
  • AI that might persistently pursue goals that users and system developers did not intend
  • misaligned agents
  • mitigate potential harms
  • high-risk AI surfaces
  • frontier AI research and development environment
  • frontier model risk

Other signals

  • developing and implementing response policies
  • foreseeing ways in which our control tools might be bypassed or degraded
  • building defense-in-depth against AI that might persistently pursue goals that users and system developers did not intend
  • Identify potential harms from misaligned agents and develop strategies for detection and prevention
  • Implement technical controls to monitor agent thoughts, behaviour, and respond to mitigate potential harms
  • Conduct adversarial testing of controls