CAISH Hackathon: Reprogramming AI

CAISH is collaborating with Apart Research and Goodfire AI to host the Reprogramming AI Hackathon, where we dive deep into the internals of AI models! Over two days, participants team up to analyse AI models with mechanistic interpretability tools, from the ground up. We welcome participants from all backgrounds - whether you’re completely new to AI safety or have published research before. We’ll also be providing lunch, dinner, and snacks throughout!

Schedule

Friday 22 November

3pm - Kickoff session + Team Matching + Briefing

4pm - Start hacking

7pm - Dinner

Saturday 23 November

10.30am - Day 2 start

12pm - Lunch

6.30pm - Hacking Ends + Presentation

7.30pm - Dinner

Rules

As AI models grow in capability and usage, understanding how they reason becomes crucial for building robust, controllable systems. Mechanistic interpretability aims to build up a picture of a model’s behaviour from its low-level reasoning circuits. You can use Goodfire’s SDK, developed for the following use cases:

Tracks:

1. Feature Investigation

  • Map and analyse feature phenomenology in large language models

  • Discover and validate useful feature interventions

  • Research the relationship between feature weights and intervention success

  • Develop metrics for intervention quality assessment

2. Tooling Development

  • Build tools for automated feature discovery

  • Create testing frameworks for intervention reliability

  • Develop integration tools for existing ML frameworks

  • Improve auto-interpretation techniques

3. Visualisation & Interface

  • Design intuitive visualisations for feature maps

  • Create interactive tools for exploring model internals

  • Build dashboards for monitoring intervention effects

  • Develop user interfaces for feature manipulation

4. Novel Research

  • Investigate improvements to auto-interpretation

  • Study feature interaction patterns

  • Research intervention transfer between models

  • Explore new approaches to model steering


We recognise that not everyone has prior experience in mechanistic interpretability, so feel free to do a relevant paper replication, or even just tinker with some interactive notebooks to find interesting behaviour! We’ll be providing a list of accessible papers and interactive notebooks you can get started with.

Prizes

1st Place - £250

2nd Place (Runner Up) - £100

Plus $2,000 of prizes from Apart

*We may increase the prize pool based on the number of participating teams.

FAQ

  • First, sign up for Apart’s hackathon here. Then simply show up, and we’ll register teams during the kickoff session. If you already have a team, that’s great; otherwise we’ll hold a short team-matching session, or you can do the project solo. The maximum team size is 4.

  • As we’re hosting the hackathon, you’ll need to use Goodfire’s SDK if you want to submit to Apart’s hackathon. However, we’re also happy for you to do your own project if you’re not interested in submitting via Apart.

  • The hackathon presentation ends at 6.30pm, but feel free to continue working on your project, as the submission deadline is 3am on November 25.