Discussion Days

The AI Safety Discussion Days are a space to discuss anything you want related to preventing catastrophic risk from AI, including governance, strategy, all technical AI Safety research, and also trying to understand humans better for the purpose of alignment.

These events are primarily for people who are actively working on preventing AI-related X-risk, or plan to do so in the near future, but anyone who is interested is welcome to join.

Upcoming AI Safety Discussion Days

Monday, December 14th, UTC 18:00 - 21:30

  • Join on Zoom

  • Talk:

    • Speaker: Remmelt Ellen

    • Title: Delegation agents: why agents might be developed and sold as a service to act on behalf of consumers and interest groups, and implications for safety research directions

    • Abstract: Drexler posits that AI may be developed as services that complete bounded tasks in bounded time. The Comprehensive AI Services (CAIS) research agenda, however, deemphasises incentives for software corporations to develop personalised services. Particularly, my sense is that CAIS neglects what I dub ‘delegation agents’: agents developed to act on a person’s behalf.

A software company may develop a delegation agent to
- elicit and model a user’s preferences within and across relevant contexts
- build trust with the user to represent their interests within a defined radius of influence
- interact with services that act on behalf of other consumers, interest groups, and governance bodies.

There is a body of research spanning decades on this subject, which I haven’t seen discussed yet in the AI safety community. Negotiation agents in particular are a clean area of study that encompasses each of these aspects. Therefore, I’ll do a short talk on research I’ve read and scenarios/hypotheses I came up with, so we can discuss them!

Monday, January 11th, UTC 18:00 - 21:30

  • Talk:

    • Speaker: Vojta Kovařík

Monday, January 25th, UTC 08:00 - 11:30

  • Talk: TBD

Monday, February 8th, UTC 18:00 - 21:30

(Might be moved to another day because of the SafeAI 2021 workshop)
  • Talk: TBD

Monday, February 22nd, UTC 08:00 - 11:30

  • Talk: TBD

Event structure

First 90 min: Welcome + Talk (or something else) + Short break

  • We aim to start every Discussion Day with a talk followed by questions and discussion. We are especially keen on talks from less well-known AI Safety researchers, talks on unusual research directions, and talks on projects that are still in progress.

  • In case we don't find a speaker, we will fill this time with something else. We have a few backup ideas.

  • Contact Linda or JJ if you want to give a talk (or something else) at the start of a discussion day.

Next 60 min: One-on-one in Icebreaker + Short break

  • You'll be paired up with others randomly for short one-on-one conversations.

Last 60 min: Breakout discussions

  • Anyone can suggest a discussion topic. You get to indicate what topics you are interested in. Then an algorithm (friendly, we promise) will calculate the best breakout discussion groups.


  • We'll join back together for a short debrief. After that, the Zoom call and breakout rooms will stay open for anyone who wants to continue the conversations.

Call for talks (or something else)

In practice you'll have about 75 minutes for your talk + questions and discussion. We ask that you keep your presentation in the range 30-45 minutes, and leave the rest of the time for questions and follow-up discussion.

When you give a talk, try to make it useful for you, e.g. present something you want feedback on, or use the presentation as motivation to organise your ideas. Additionally, don’t worry too much about making sure everyone can follow. The audience will have varying levels of background knowledge, so it is inevitable that someone will be either a bit lost or a bit bored. Therefore we ask you to prioritise getting to what you find interesting, and to focus your presentation on things that can’t easily be found somewhere else.

(If you have an idea for something else (workshop, brainstorming exercise, game, etc.) that can be done online, is relevant for AI Safety, and takes no more than 75 minutes, then let us know and we can discuss whether it would be a good fit for one of these events.)

To offer a talk (or something else) contact Linda or JJ


Please fill out this feedback form after the event.


To stay informed about future events, sign up to our newsletter.

Email address exchange service

Sometimes you forget to exchange contact information with someone you wanted to stay in touch with. This is even more likely to happen during an online event, where people may come and go unexpectedly. We’re therefore offering this email address exchange service. Let us know who you want to get in touch with, and we’ll see what we can do.


We provide all of our events and services for free, so we are dependent on the support of donors.

We are currently working on a way to have donations be tax deductible.

You can support us through our Patreon and Ko-fi links below. Contact Linda or JJ for other donation options.

Recorded AI Safety Discussion Day Talks

Monday, November 23rd, UTC 08:00 - 11:30

  • Talk:

    • Title: Extracting Implicitly Learnt Human Preferences From RL agents

    • Speakers: Nevan Wichers and Riccardo Volpato (presenting work by AISC 4 Team)

    • Abstract: Our work focuses on improving the framework of learning rewards from preferences. More specifically, we improve the sample efficiency of preference learning by leveraging the knowledge that an agent has already captured when acting in an environment. We have some initial results showing that our inspection and extraction techniques are useful both in simple (grid-world) and more complex (Doom) environments. We plan to present our progress so far and get feedback on how to extend our work and make it useful and relevant to AI Safety.

Monday, November 9th, UTC 18:00 - 21:30

  • Talk:

    • Speaker: David Krueger

    • Preliminary title: Causality in reinforcement learning and its relation to incentive management

Monday, October 26th, UTC 08:00 - 11:30

Monday, October 12th, UTC 17:00 - 20:30

Monday, September 21st, UTC 16:00 - 21:00

  • Talk 1:

    • Speaker: Koen Holtman

    • Title: Technical AGI safety: the case for the non-revolutionary approach

    • Abstract: Across the AGI safety field, there is a frequent line of speculative reasoning which says that revolutionary new ways of thinking about machine intelligence may be needed, before we can make real progress on the technical problem of AGI safety. In this talk, I will argue the opposite case: that plenty of progress can be made without abandoning current mainstream models of machine intelligence.
      One key step to making progress is to treat AGI safety as a systems engineering problem where the designer seeks to understand and reduce residual risks. I will illustrate the non-revolutionary systems engineering approach by discussing several concrete safety mechanisms from my recent paper 'AGI Agent Safety by Iteratively Improving the Utility Function'. This paper uses the non-revolutionary approach to make progress on the problems of AGI 'off-switches', 'corrigibility' and 'embeddedness'.

    • Recording and Slides

  • Talk 2:

    • Speaker: Ali Rahemtulla

    • Title: The what and why of corrigibility

    • Recording: On request

Monday, August 17th, UTC 16:00 - 21:00

  • Talk 1:

    • Speaker: Vanessa Kosoy

    • Title: Dialogic Reinforcement Learning: learning preferences by asking questions

    • Abstract: "Dialogic Reinforcement Learning (DLRL) is my draft proposal for an end-to-end solution of the AI alignment problem. In this talk, I will explain how DLRL works, and compare it to some approaches in the "prosaic alignment" cluster. I will analyze the relative advantages and disadvantages of the two, and point out a few directions for further research."

    • Recording

  • Talk 2:

    • Speaker: John Maxwell

    • Title: GPT-N Safety Issues and Ideas for Solving Them

    • Recording: On request

Web-TAISU, May 13th - 17th

  • Talk:

    • Speaker: Joe Collman

    • Title: AI Safety via debate; Factored cognition; outstanding issues

    • Abstract: I (Joe) have been collaborating on debate for a while down at OpenAI, largely with Beth Barnes. It’s a continuation of OpenAI's 2019 work (I’d recommend reading this first).

    • Recording

  • Talks:

    • Speaker: Vanessa Kosoy

    • Title: Quasi-Bayesianism (part I and II)

    • Abstract: Quasi-Bayesianism is a new type of epistemology + decision theory that I invented and am currently developing in collaboration with Alex Appel. Its main motivation is describing agency in complex (non-realizable) environments, which is something pure Bayesianism cannot handle, and it works by combining Bayesian reinforcement learning with Knightian uncertainty. It turned out that it also produces UDT behavior more or less for free, and has applications to other topics such as embedded agency, reflection and anthropic probabilities.

    • Recording and Notes