CAISH Hackathon: Reprogramming AI

CAISH is collaborating with Apart Research and Goodfire AI to host the Reprogramming AI Hackathon, where we dive deep into the internals of AI models! Over two days, participants team up to analyse AI models with mechanistic interpretability tools, from the ground up. We welcome participants from all backgrounds - whether you’re completely new to AI safety or have published research before. We’ll also be providing lunch, dinner, and snacks throughout!

Schedule

Friday 22 November

3pm - Kickoff session + Team Matching + Briefing

4pm - Start hacking

7pm - Dinner

Saturday 23 November

10.30am - Day 2 start

12pm - Lunch

6.30pm - Hacking Ends + Presentation

7.30pm - Dinner

Rules

As AI models grow in capability and usage, understanding how they reason becomes crucial for building robust, controllable systems. Mechanistic interpretability aims to build up a picture of a model’s behaviour from its low-level reasoning circuits. You can use Goodfire’s SDK, developed for the following use cases:

Tracks:

1. Feature Investigation

  • Map and analyse feature phenomenology in large language models

  • Discover and validate useful feature interventions

  • Research the relationship between feature weights and intervention success

  • Develop metrics for intervention quality assessment

2. Tooling Development

  • Build tools for automated feature discovery

  • Create testing frameworks for intervention reliability

  • Develop integration tools for existing ML frameworks

  • Improve auto-interpretation techniques

3. Visualisation & Interface

  • Design intuitive visualisations for feature maps

  • Create interactive tools for exploring model internals

  • Build dashboards for monitoring intervention effects

  • Develop user interfaces for feature manipulation

4. Novel Research

  • Investigate improvements to auto-interpretation

  • Study feature interaction patterns

  • Research intervention transfer between models

  • Explore new approaches to model steering


We recognise that not everyone has prior experience in mechanistic interpretability, so feel free to do a relevant paper replication, or even just tinker with some interactive notebooks to find interesting behaviour! We’ll be providing a list of accessible papers and interactive notebooks you can get started with.

Prizes

1st Place - £250

2nd Place (Runner Up) - £100

Plus $2,000 of prizes from Apart

*We may increase the prize pool based on the number of participating teams.

FAQ

  • First, sign up for Apart’s hackathon here. Then simply show up, and we’ll register teams during the kickoff session. If you already have a team, that’s great; otherwise we’ll hold a short team-matching session, or you can do the project solo. The maximum team size is 4.

  • As we’re hosting the hackathon, you’ll need to use Goodfire’s SDK if you want to submit to Apart’s hackathon. However, we’re also happy for you to do your own project if you’re not interested in submitting via Apart.

  • The hackathon presentation ends at 6.30pm, but feel free to continue working on your project, as the submission deadline is 3am on November 25.