The AI Safety Discussion Days are a space to discuss anything you want related to preventing catastrophic risk from AI, including governance, strategy, all technical AI Safety research, and also trying to understand humans better for the purpose of alignment.
These events are primarily for people who are actively working on preventing AI-related X-risk, or plan to do so in the near future, but anyone who is interested is welcome to join.
Upcoming AI Safety Discussion Days
There are no future AI Safety Discussion Days planned right now.
We have been running these Discussion Days since June 2020 and the format is still shifting as we figure out better ways to do things. The current version is a single Zoom call (no more switching to the Icebreaker platform!). During this Zoom call, two or three of the following three things will happen:
One of you gives a talk followed by questions and discussion, or facilitates something else related to AI Safety.
One-on-one conversations in the Zoom breakout room.
What-ever-is-on-our-minds topic discussions in groups of 3-5 people. This activity works like this:
Everyone suggests topics they want to discuss. Go for whatever is alive for you in that moment.
Everyone votes on the topics they are interested in.
You are assigned to a discussion group based on your preferences.
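For anyone curious how the three steps above could be automated, here is a minimal sketch in Python. It is only an illustration, not the method the hosts actually use: the function name `assign_groups`, the greedy "most popular topic first" rule, and the `max_size` parameter are all assumptions. It caps groups at 5 people but does not enforce the minimum of 3.

```python
from collections import defaultdict


def assign_groups(votes, max_size=5):
    """Greedy sketch: votes maps each participant to the topics they voted for.

    Hypothetical helper, not the organisers' actual procedure.
    Returns (groups, unassigned), where groups maps topic -> list of people.
    """
    # Count how many votes each topic received.
    popularity = defaultdict(int)
    for topics in votes.values():
        for topic in topics:
            popularity[topic] += 1

    groups = defaultdict(list)
    unassigned = []
    # Process participants in sorted order so the result is deterministic.
    for person in sorted(votes):
        # Try this person's topics, most popular first (ties broken by name).
        ranked = sorted(votes[person], key=lambda t: (-popularity[t], t))
        for topic in ranked:
            if len(groups[topic]) < max_size:
                groups[topic].append(person)
                break
        else:
            # No voted-for topic had room (or the person cast no votes).
            unassigned.append(person)
    return dict(groups), unassigned
```

In this sketch, anyone left in `unassigned` would need to be placed by hand, and groups smaller than 3 would need to be merged by the host.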
If there is a talk, or some other participant-run activity, we will start with that. After that, the host (Linda or JJ) will decide what happens next based on time, energy and the general mood of the conversation.
If there is no talk, we'll start with What-ever-is-on-our-minds topic discussions.
Call for talks (or something else)
If you want to give a talk, or lead some other AI Safety relevant activity, we offer you the full attention of the Discussion Day for around one and a half hours, though the exact amount of time is negotiable. If you're giving a talk, we have found that it's best to keep the presentation to 30-45 minutes, and leave the rest of the time for questions and follow-up discussion.
When you give a talk (or other activity), aim to make it useful for you. E.g. present something you want feedback on, or use the presentation as motivation to organise your ideas. Additionally, don’t worry too much about making sure everyone can follow. The audience will have varying levels of background knowledge, so it is inevitable that someone will either be a bit lost or bored. Therefore we ask you to prioritise getting to what you find interesting, and focus your presentation on information that can’t easily be found elsewhere.
Recorded AI Safety Discussion Day Talks
Monday, January 11th, UTC 18:00 - 21:30
Speaker: Vojta Kovařík
Title: Ecosystems of AI Services
Monday, December 14th, UTC 18:00 - 21:30
Speaker: Remmelt Ellen
Title: Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research.
Abstract: Drexler posits that artificial general intelligence may be developed recursively through services that complete bounded tasks in bounded time. The Comprehensive AI Services (CAIS) technical report, however, doesn’t cover dynamics where software companies are incentivised to develop personalised services.
The more a company ends up personalising an AI service around completing tasks according to individual users’ needs, the more an instantiation of that service will resemble an agent acting on a user’s behalf. From this general argument it follows that continued increases both in customers’ demand for personalised services and in companies’ capacity to process information to supply them (as this whitepaper suggests) will, all else equal, result in more agent-like services.
CAIS neglects what I dub delegated agents: agents designed to act on a person’s behalf. A next-generation software company could develop and market delegated agents that
- elicit and model a user’s preferences within and across relevant contexts.
- build trust with the user to represent their interests within a radius of influence.
- plan actions autonomously in interaction with other agents that represent other consumers, groups with shared interests, and governance bodies.
Developments in commercially available delegated agents – such as negotiation agents and virtual assistants – will come with new challenges and opportunities for deploying AI designs that align with shared human values and assist us to make wiser decisions.
Recommended reading: When Will Negotiation Agents Be Able to Represent Us? The Challenges and Opportunities for Autonomous Negotiators
Monday, November 23rd, UTC 08:00 - 11:30
Title: Extracting Implicitly Learnt Human Preferences From RL agents
Speaker: Nevan Wichers and Riccardo Volpato (presenting work by AISC 4 Team)
Abstract: Our work focuses on improving the framework of learning rewards from preferences. More specifically, we improve the sample efficiency of preference learning by leveraging the knowledge that an agent has already captured when acting in an environment. We have some initial results showing that our inspection and extraction techniques are useful both in simple (grid-world) and more complex (Doom) environments. We plan to present our progress so far and get feedback on how to extend our work and make it useful and relevant to AI Safety.
Monday, November 9th, UTC 18:00 - 21:30
Speaker: David Krueger
Preliminary title: Causality in reinforcement learning and its relation to incentive management
Monday, October 26th, UTC 08:00 - 11:30
Speaker: Chris Leong
Monday, October 12th, UTC 17:00 - 20:30
Speaker: Tan Zhi-Xuan
Title: Online Bayesian Goal Inference for Boundedly-Rational Planning Agents
Recording: On request
Monday, September 21st, UTC 16:00 - 21:00
Speaker: Koen Holtman
Title: Technical AGI safety: the case for the non-revolutionary approach
Abstract: Across the AGI safety field, there is a frequent line of speculative reasoning which says that revolutionary new ways of thinking about machine intelligence may be needed, before we can make real progress on the technical problem of AGI safety. In this talk, I will argue the opposite case: that plenty of progress can be made without abandoning current mainstream models of machine intelligence.
One key step to making progress is to treat AGI safety as a systems engineering problem where the designer seeks to understand and reduce residual risks. I will illustrate the non-revolutionary systems engineering approach by discussing several concrete safety mechanisms from my recent paper 'AGI Agent Safety by Iteratively Improving the Utility Function'. This paper uses the non-revolutionary approach to make progress on the problems of AGI 'off-switches', 'corrigibility' and 'embeddedness'.
Speaker: Ali Rahemtulla
Title: The what and why of corrigibility
Recording: On request
Monday, August 17th, UTC 16:00 - 21:00
Speaker: Vanessa Kosoy
Title: Dialogic Reinforcement Learning: learning preferences by asking questions
Abstract: Dialogic Reinforcement Learning (DLRL) is my draft proposal for an end-to-end solution of the AI alignment problem. In this talk, I will explain how DLRL works, and compare it to some approaches in the "prosaic alignment" cluster. I will analyze the relative advantages and disadvantages of the two, and point out a few directions for further research.
Speaker: John Maxwell
Title: GPT-N Safety Issues and Ideas for Solving Them
Web-TAISU, May 13th - 17th
Speaker: Joe Collman
Title: AI Safety via debate; Factored cognition; outstanding issues
Abstract: I (Joe) have been collaborating on debate for a while down at OpenAI, largely with Beth Barnes. It’s a continuation of OpenAI's 2019 work (I’d recommend reading this first).
Speaker: Vanessa Kosoy
Title: Quasi-Bayesianism (part I and II)
Abstract: Quasi-Bayesianism is a new type of epistemology + decision theory that I invented and am currently developing in collaboration with Alex Appel. Its main motivation is describing agency in complex (non-realizable) environments, which is something pure Bayesianism cannot handle, and it works by combining Bayesian reinforcement learning with Knightian uncertainty. It turned out that it also produces UDT behavior more or less for free, and has applications to other topics such as embedded agency, reflection and anthropic probabilities.