CAISH Hackathon: Reprogramming AI
CAISH is collaborating with Apart Research and Goodfire AI to host the Reprogramming AI Hackathon, where we dive deep into the internals of AI models! Over two days, participants team up to analyse AI models with mechanistic interpretability tools, from the ground up. We welcome participants from all backgrounds, whether you're completely new to AI safety or have already published research. We'll also be providing lunch, dinner, and snacks throughout!
Schedule
Friday 22 November
3pm - Kickoff session + Team Matching + Briefing
4pm - Start hacking
7pm - Dinner
Saturday 23 November
10.30am - Day 2 start
12pm - Lunch
6.30pm - Hacking Ends + Presentation
7.30pm - Dinner
Rules
As AI models grow in capability and usage, understanding how they reason becomes crucial for building robust, controllable systems. Mechanistic interpretability aims to uncover the reasoning circuits within models, building up a picture of their behaviour from these low-level components. You can use Goodfire's SDK, developed for the following use cases:
Tracks:
1. Feature Investigation
Map and analyse feature phenomenology in large language models
Discover and validate useful feature interventions
Research the relationship between feature weights and intervention success
Develop metrics for intervention quality assessment
2. Tooling Development
Build tools for automated feature discovery
Create testing frameworks for intervention reliability
Develop integration tools for existing ML frameworks
Improve auto-interpretation techniques
3. Visualisation & Interface
Design intuitive visualisations for feature maps
Create interactive tools for exploring model internals
Build dashboards for monitoring intervention effects
Develop user interfaces for feature manipulation
4. Novel Research
Investigate improvements to auto-interpretation
Study feature interaction patterns
Research intervention transfer between models
Explore new approaches to model steering
We recognise that not everyone has prior experience in mechanistic interpretability, so feel free to do a relevant paper replication, or even just tinker with some interactive notebooks to find interesting behaviour! We'll be providing a list of accessible papers and interactive notebooks to get you started.
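To give a feel for the kind of workflow the tracks involve, here is a minimal sketch of searching for a feature and steering a model with Goodfire's SDK. Treat it as illustrative only: the model identifier, method names, and arguments are assumptions drawn from Goodfire's public quickstart and may not match the current SDK exactly, so defer to the official documentation and the notebooks we share.

```python
# Rough sketch of a feature-steering loop with Goodfire's SDK.
# The model name, method names, and argument names below are assumptions
# based on Goodfire's quickstart; check the SDK docs before relying on them.
import goodfire

client = goodfire.Client(api_key="YOUR_GOODFIRE_API_KEY")  # placeholder key

# Create a steerable variant of a supported open-weight model.
variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Search the model's learned features for a concept of interest.
features = client.features.search("pirate speech", model=variant, top_k=5)

# Intervene on the top-matching feature, then sample from the steered model.
variant.set(features[0], 0.6)
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me about your day."}],
    model=variant,
)
print(response)
```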
Prizes
1st Place - £250
2nd Place - £100
Plus $2,000 of prizes from Apart
*We may increase the prize pool based on the number of participating teams.
FAQ
How do I sign up?
First sign up for Apart's hackathon here. Then simply show up, and we'll register teams during the kickoff session. If you already have a team, great; otherwise we'll run a short team-matching session, or you can do the project solo. The maximum team size is 4.
Do I have to use Goodfire's SDK?
As we're hosting the hackathon with Apart, you'll need to use Goodfire's SDK if you want to submit to Apart's hackathon. However, we're also happy for you to do your own project if you're not interested in submitting via Apart.
When is the submission deadline?
The hackathon presentations end at 6.30pm, but feel free to keep working on your project afterwards: the submission deadline is 3am on 25 November.
Email harrison@cambridgeaisafety.org or zach@cambridgeaisafety.org with any questions.