The AI Safety Discussion Days are a space to discuss anything you want related to preventing catastrophic risk from AI, including governance, strategy, all technical AI Safety research, and also trying to understand humans better for the purpose of alignment.
These events are primarily for people who are actively working on preventing AI-related X-risk, or plan to do so in the near future, but anyone who is interested is welcome to join.
Upcoming AI Safety Discussion Days
There are no future AI Safety Discussion Days planned right now.
Call for talks (or something else)
If you want to give a talk, or lead some other AI Safety relevant activity, we offer you the full attention of the Discussion Day for around one and a half hours, though the exact amount of time is negotiable. If you're giving a talk, we have found that it's best to keep the presentation in the range of 30-45 minutes, and leave the rest of the time for questions and follow-up discussion.
When you give a talk (or other activity), aim to make it useful for you. For example, present something you want feedback on, or use the presentation as motivation to organise your ideas. Additionally, don’t worry too much about making sure everyone can follow. The audience will have varying levels of background knowledge, so it is inevitable that someone will either be a bit lost or bored. Therefore we ask you to prioritise getting to what you find interesting, and focus your presentation on information that can’t easily be found elsewhere.
Recorded AI Safety Discussion Day Talks
Speaker: Vojta Kovařík
Title: Ecosystems of AI Services
Speaker: Remmelt Ellen
Title: Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research.
Abstract: Drexler posits that artificial general intelligence may be developed recursively through services that complete bounded tasks in bounded time. The Comprehensive AI Services (CAIS) technical report, however, doesn’t cover dynamics where software companies are incentivised to develop personalised services.
The more a company ends up personalising an AI service around completing tasks according to individual users’ needs, the more an instantiation of that service will resemble an agent acting on a user’s behalf. It follows from this general argument that continued increases both in customers’ demand for personalised services and in companies’ capacity to process information to supply them (as this whitepaper suggests) will, all else equal, result in more agent-like services.
CAIS neglects what I dub delegated agents: agents designed to act on a person’s behalf. A next-generation software company could develop and market delegated agents that
- elicit and model a user’s preferences within and across relevant contexts.
- build trust with the user to represent their interests within a radius of influence.
- plan actions autonomously in interaction with other agents that represent other consumers, groups with shared interests, and governance bodies.
Developments in commercially available delegated agents – such as negotiation agents and virtual assistants – will come with new challenges and opportunities for deploying AI designs that align with shared human values and help us make wiser decisions.
Title: Extracting Implicitly Learnt Human Preferences From RL agents
Speakers: Nevan Wichers and Riccardo Volpato (presenting work by AISC 4 Team)
Abstract: Our work focuses on improving the framework of learning rewards from preferences. More specifically, we improve the sample efficiency of preference learning by leveraging the knowledge that an agent has already captured when acting in an environment. We have some initial results showing that our inspection and extraction techniques are useful both in simple (grid-world) and more complex (Doom) environments. We plan to present our progress so far and get feedback on how to extend our work and make it useful and relevant to AI Safety.
Speaker: David Krueger
Preliminary title: Causality in reinforcement learning and its relation to incentive management
Speaker: Tan Zhi-Xuan
Recording: On request
Speaker: Koen Holtman
Title: Technical AGI safety: the case for the non-revolutionary approach
Abstract: Across the AGI safety field, there is a frequent line of speculative reasoning which says that revolutionary new ways of thinking about machine intelligence may be needed before we can make real progress on the technical problem of AGI safety. In this talk, I will argue the opposite case: that plenty of progress can be made without abandoning current mainstream models of machine intelligence.
One key step to making progress is to treat AGI safety as a systems engineering problem where the designer seeks to understand and reduce residual risks. I will illustrate the non-revolutionary systems engineering approach by discussing several concrete safety mechanisms from my recent paper 'AGI Agent Safety by Iteratively Improving the Utility Function'. This paper uses the non-revolutionary approach to make progress on the problems of AGI 'off-switches', 'corrigibility' and 'embeddedness'.
Speaker: Ali Rahemtulla
Title: The what and why of corrigibility
Recording: On request
Speaker: Vanessa Kosoy
Title: Dialogic Reinforcement Learning: learning preferences by asking questions
Abstract: Dialogic Reinforcement Learning (DLRL) is my draft proposal for an end-to-end solution of the AI alignment problem. In this talk, I will explain how DLRL works, and compare it to some approaches in the "prosaic alignment" cluster. I will analyze the relative advantages and disadvantages of the two, and point out a few directions for further research.
Speaker: John Maxwell
Title: GPT-N Safety Issues and Ideas for Solving Them
Web-TAISU, May 13th - 17th
Speaker: Vanessa Kosoy
Title: Quasi-Bayesianism (part I and II)
Abstract: Quasi-Bayesianism is a new type of epistemology + decision theory that I invented and am currently developing in collaboration with Alex Appel. Its main motivation is describing agency in complex (non-realizable) environments, which is something pure Bayesianism cannot handle, and it works by combining Bayesian reinforcement learning with Knightian uncertainty. It turned out that it also produces UDT behavior more or less for free, and has applications to other topics such as embedded agency, reflection and anthropic probabilities.