The AI Safety Discussion Days are a space to discuss anything related to preventing catastrophic risk from AI, including governance, strategy, all areas of technical AI Safety research, and understanding humans better for the purposes of alignment.
These events are primarily for people who are actively working on preventing AI-related X-risk, or plan to do so in the near future, but anyone who is interested is welcome to join.
Upcoming AI Safety Discussion Days
Speaker: Remmelt Ellen
Title: Delegation agents: why agents might be developed and sold as a service to act on behalf of consumers and interest groups, and implications for safety research directions
Abstract: Drexler posits that AI may be developed as services that complete bounded tasks in bounded time. The Comprehensive AI Services (CAIS) research agenda, however, deemphasises incentives for software corporations to develop personalised services. In particular, my sense is that CAIS neglects what I dub ‘delegation agents’: agents developed to act on a person’s behalf.
A software company may develop a delegation agent to
- elicit and model a user’s preferences within and across relevant contexts
- build trust with the user to represent their interests within a defined radius of influence
- interact with services that act on behalf of other consumers, interest groups, and governance bodies.
There is a body of research spanning decades on this subject, which I haven’t seen discussed yet in the AI safety community. Negotiation agents in particular are a clean area of study that encompasses each of these aspects. Therefore, I’ll give a short talk on research I’ve read and scenarios/hypotheses I came up with, so we can discuss them!
Speaker: Vojta Kovařík
(Might be moved to another day because of the SafeAI 2021 workshop)
First 90 min: Welcome + Talk (or something else) + Short break
We aim to start every Discussion Day with a talk followed by questions and discussion. We are especially keen on talks from less well-known AI Safety researchers, talks on unusual research directions, and/or projects which are in progress.
In case we don't find a speaker, we will fill this time with something else. We have a few backup ideas.
Next 60 min: One-on-one in Icebreaker + Short break
You'll be paired up with others randomly for short one-on-one conversations.
Last 60 min: Breakout discussions
Anyone can suggest a discussion topic. You get to indicate what topics you are interested in. Then an algorithm (friendly, we promise) will calculate the best breakout discussion groups.
We'll join back together for a short debrief. After that, the Zoom call and breakout rooms will stay open for anyone who wants to continue the conversations.
Call for talks (or something else)
In practice you'll have about 75 minutes for your talk, including questions and discussion. We ask that you keep your presentation to 30–45 minutes and leave the rest of the time for questions and follow-up discussion.
When you give a talk, try to make it useful for you, e.g. present something you want feedback on, or use the presentation as motivation to organise your ideas. Additionally, don’t worry too much about making sure everyone can follow. The audience will have varying levels of background knowledge, so it is inevitable that someone will be either a bit lost or a bit bored. Therefore we ask you to prioritise getting to what you find interesting, and focus your presentation on things that can’t easily be found somewhere else.
(If you have an idea for something else (workshop, brainstorming exercise, game, etc.) that can be done online, is relevant to AI Safety, and takes no more than 75 minutes, then let us know and we can discuss whether it would be a good fit for one of these events.)
Please fill out this feedback form after the event.
To stay informed about future events, sign up to our newsletter.
Email address exchange service
Sometimes you forget to exchange contact information with someone you wanted to stay in touch with. This is even more likely to happen during an online event, where people may come and go unexpectedly. We’re therefore offering this email address exchange service. Let us know who you want to get in touch with, and we’ll see what we can do.
We provide all of our events and services for free, so we are dependent on the support of donors.
We are currently working on a way to have donations be tax deductible.
Recorded AI Safety Discussion Day Talks
Title: Extracting Implicitly Learnt Human Preferences From RL agents
Speaker: Nevan Wichers and Riccardo Volpato (presenting work by AISC 4 Team)
Abstract: Our work focuses on improving the framework of learning rewards from preferences. More specifically, we improve the sample efficiency of preference learning by leveraging the knowledge that an agent has already captured when acting in an environment. We have some initial results showing that our inspection and extraction techniques are useful both in simple (grid-world) and more complex (Doom) environments. We plan to present our progress so far and get feedback on how to extend our work and make it useful and relevant to AI Safety.
Speaker: David Krueger
Preliminary title: Causality in reinforcement learning and its relation to incentive management
Speaker: Tan Zhi-Xuan
Recording: On request
Speaker: Koen Holtman
Title: Technical AGI safety: the case for the non-revolutionary approach
Abstract: Across the AGI safety field, there is a frequent line of speculative reasoning which says that revolutionary new ways of thinking about machine intelligence may be needed, before we can make real progress on the technical problem of AGI safety. In this talk, I will argue the opposite case: that plenty of progress can be made without abandoning current mainstream models of machine intelligence.
One key step to making progress is to treat AGI safety as a systems engineering problem where the designer seeks to understand and reduce residual risks. I will illustrate the non-revolutionary systems engineering approach by discussing several concrete safety mechanisms from my recent paper 'AGI Agent Safety by Iteratively Improving the Utility Function'. This paper uses the non-revolutionary approach to make progress on the problems of AGI 'off-switches', 'corrigibility' and 'embeddedness'.
Speaker: Ali Rahemtulla
Title: The what and why of corrigibility
Recording: On request
Speaker: Vanessa Kosoy
Title: Dialogic Reinforcement Learning: learning preferences by asking questions
Abstract: Dialogic Reinforcement Learning (DLRL) is my draft proposal for an end-to-end solution of the AI alignment problem. In this talk, I will explain how DLRL works and compare it to some approaches in the "prosaic alignment" cluster. I will analyze the relative advantages and disadvantages of the two, and point out a few directions for further research.
Speaker: John Maxwell
Title: GPT-N Safety Issues and Ideas for Solving Them
Recording: On request
Web-TAISU, May 13th - 17th
Speaker: Vanessa Kosoy
Title: Quasi-Bayesianism (part I and II)
Abstract: Quasi-Bayesianism is a new type of epistemology + decision theory that I invented and am currently developing in collaboration with Alex Appel. Its main motivation is describing agency in complex (non-realizable) environments, which is something pure Bayesianism cannot handle, and it works by combining Bayesian reinforcement learning with Knightian uncertainty. It turns out that it also produces UDT behavior more or less for free, and has applications to other topics such as embedded agency, reflection, and anthropic probabilities.