Redwood Research Alignment Faking Hackathon United States

About This Hackathon

About the challenge Redwood Research, in collaboration with MATS and Constellation, invites you to build Model Organisms of Alignment Faking. What is this, you may ask? A model organism is an LLM modified to have a behavior we care about. We define alignment faking as an LLM that behaves safely in a testing environment (observed), but dangerously in a production environment (unobserved). Think of a kid that does homework only when the parents are watching, but starts setting fires and blackmailing people when they aren’t around. This unique opportunity allows participants to push the boundaries of Large Language Models (LLMs) while working with leading AI safety research organizations. Get started Teams MUST be registered and accepted on luma Checkout the notion Join the discord Make a fork of https://github.com/redwoodresearch/hackathon-af And contribute system prompts, fine tuned model organisms, and environments!

Event Dates

Sept. 13, 2025

to Sept. 14, 2025

Location

Berkeley, CA, USA

Organizer

Redwood Research

Prizes

$1,000

Redwood Research Alignment Faking Hackathon

About This Hackathon

Event Dates

Location

Organizer

Prizes

Get ahead in innovation - receive all the latest hackathons directly in your inbox.