Redwood Research Alignment Faking Hackathon

machine learning · security ·
Location: Berkeley

About the challenge Redwood Research, in collaboration with MATS and Constellation, invites you to build Model Organisms of Alignment Faking. What is this, you may ask? A model organism is an LLM modified to have a behavior we care about. We define alignment faking as an LLM that behaves safely in a testing environment (observed), but dangerously in a production environment (unobserved). Think of a kid that does homework only when the parents are watching, but starts setting fires and blackmailing people when they aren’t around. This unique opportunity allows participants to push the boundaries of Large Language Models (LLMs) while working with leading AI safety research organizations.  Get started Teams MUST be registered and accepted on luma Checkout the notion Join the discord Make a fork of https://github.com/redwoodresearch/hackathon-af And contribute system prompts, fine tuned model organisms, and environments!

Ended
Dates:

Sept. 13, 2025 - Sept. 14, 2025

Organisation:

Redwood Research

Location:

Berkeley, CA, USA

Prizes:

$1,000

Link to website

Get ahead in innovation - receive all the latest hackathons directly in your inbox.

Subscribe