Using Game Theory to Make AI Safe Enough to Entrust With Our Health and Infrastructure

FY25 SI-GECS Type 1 

Abstract

An AI-generated collage, in the style of Max Ernst, of robots playing billiards

AIs are accelerating progress on health, energy, and the environment, promising to make our world a safer, healthier, more stable, and more just place.

But can we trust these AIs? As they become more capable and more complex, it becomes harder to ensure that they are trustworthy, and that their behavior in new situations will remain aligned with human interests.

Thus, as AIs become more capable, there is a parallel risk that they could make our world less safe, less healthy, less stable, and less just.
We aim to develop fundamentally new approaches to this problem using game theory.

Currently, scientists produce monolithic AIs, each designed to solve a given problem. This is like relying on a single human expert.

Our approach is inspired by what humans do in this situation: get a second opinion. A decision maker may lack the expertise of her advisors, but by hearing them debate, she may be able to recognize which one is correct.

The essential requirement is to ensure that the advisors choose not to collude. If we can incentivize AIs not to collude against humans, then even if they become smarter than us, we may be able to trust them, because they would be scrutinizing each other.
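To make this incentive structure concrete, here is a minimal game-theoretic sketch. It is an illustration we constructed, not the project's actual model: the action names and the payoff values R, S, and P are assumptions. Two advisor AIs each either scrutinize the other's answer (flagging verifiable flaws to the judge) or collude (endorsing it); the judge rewards verified flaws, undetected collusion yields a smaller shared reward, and an exposed colluder is penalized.

# Toy two-advisor debate game (illustrative sketch; payoffs are assumptions).
# Each advisor picks "scrutinize" or "collude". The judge pays R for a
# verified flaw, S per advisor for undetected mutual endorsement, and
# charges an exposed colluder a penalty P.
from itertools import product

R, S, P = 2.0, 1.0, 1.0  # assumed: flaw reward, collusion share, penalty

def payoff(a, b):
    """Return (payoff to advisor A, payoff to advisor B) for actions a, b."""
    if a == "scrutinize" and b == "scrutinize":
        return (R, R)     # both earn the judge's reward for exposed flaws
    if a == "scrutinize" and b == "collude":
        return (R, -P)    # A exposes B's endorsement; B is penalized
    if a == "collude" and b == "scrutinize":
        return (-P, R)
    return (S, S)         # undetected collusion splits a smaller reward

actions = ["scrutinize", "collude"]

def is_nash(a, b):
    """True if neither advisor gains by unilaterally deviating from (a, b)."""
    ua, ub = payoff(a, b)
    no_dev_a = all(payoff(alt, b)[0] <= ua for alt in actions)
    no_dev_b = all(payoff(a, alt)[1] <= ub for alt in actions)
    return no_dev_a and no_dev_b

for a, b in product(actions, actions):
    print(f"({a}, {b}): payoffs {payoff(a, b)}, Nash: {is_nash(a, b)}")

Running this shows that whenever the judge's reward for a verified flaw exceeds the advisors' share from undetected collusion (R > S), mutual scrutiny is the unique Nash equilibrium and mutual collusion is unstable. Designing real incentive mechanisms with this property is the kind of question the project addresses.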

Principal Investigator

Collaborator