Mutuals
Mutual Understanding
In what sense are there coherence theorems?

Talking to Elliott Thornley and Daniel Filan about Elliott (EJT)'s LessWrong Post "There are no coherence theorems"

In this episode, Daniel Filan and I talk about Elliott Thornley's LessWrong post "There are no coherence theorems".

Some other LessWrong posts we reference include:

Katja Grace's "Counterarguments to the basic AI x-risk case"

Eliezer Yudkowsky's "Coherent decisions imply consistent utilities"

Scott Garrabrant's "Geometric Rationality" sequence

Transcript:

Divia (00:03)

I'm here today with Elliott Thornley, who goes by EJT on LessWrong, and Daniel Filan. Elliott is currently a postdoc at the Global Priorities Institute working on this sort of AI stuff and also some global population work, and we're going to be discussing his post on LessWrong, There are no coherence theorems, which he wrote as part of

the CAIS Philosophy Fellowship. And Daniel, you are currently a research manager at MATS, which is the ML Alignment & Theory Scholars program, and you also have your own AI x-risk research podcast. So welcome to the podcast, both of you guys.

Elliott Thornley (00:56)

All right, yeah, thanks.

Daniel (00:56)

Thanks, great to be here.

Divia (00:58)

Yeah, so I had read this post, There Are No Coherence Theorems. I missed it when it first came out, and then someone linked it to me on Twitter recently, and I was like, this is pretty interesting to me. It's very relevant to my interests, and I also thought the discussion of it was pretty interesting. So yeah, Elliott, would you mind summarizing for our audience what the post is and what it says?

Elliott Thornley (01:22)

Yeah, the sort of background context is I was reading about all this AI safety stuff, just getting into it and looking for places I felt like I could contribute. And in particular, I came across these coherence arguments. And coherence arguments are supposed to be ways in which we can predict the behavior of advanced artificial agents. You know, maybe we can't know exactly what they'll want, but maybe we can sort of

know the form that their wanting will take, namely that they'll be expected utility maximizers. So this means that they'll at least choose as if they assigned a real-valued utility and probability to each outcome and make choices that maximize the expectation of utility. They'll maximize expected utility in this sense. Coherence arguments are arguments for thinking that advanced artificial agents are going to be expected utility maximizers.

And the argument basically goes that if agents aren't representable as expected utility maximizers, if they don't behave in this kind of way, then they're going to be liable to pursue dominated strategies, which basically means you present them with a series of choices or gambles or something like that, and they sort of plot a way through this decision tree. They make this sequence of choices that leaves them with an outcome or lottery that they disprefer to some outcome or lottery that they

could have had instead. And this seems like a bad consequence. If that was going to happen, then maybe it would put some pressure on you as an agent to revise your preferences. And so the thought of these coherence arguments goes that agents that are not representable as expected utility maximizers will recognize this vulnerability and so be motivated to change their preferences to the extent necessary to make them invulnerable, which will in turn make them

expected utility maximizers. And the post that I wrote is pushing back against these arguments. So in particular, the sort of canonical version of the argument, it seems to me, appeals to these so-called coherence theorems, which I define in the post as theorems which imply that unless an agent can be represented as an expected utility maximizer,

then it's liable to pursue dominated strategies. And this seemed kind of surprising to me. I'd sort of not heard of these coherence theorems before, like theorems that imply this particular thing. So I went looking and I read the theorems that had in various places been called coherence theorems. There seems to be some haziness or disagreement about exactly which theorems were supposed to be the coherence theorems. And I found that sort of none of the listed theorems had that particular implication.

And so this was sort of my way of pushing back against these coherence arguments and thinking like, no, actually, no, this argument isn't a good reason to think that advanced artificial agents are going to be representable as expected utility maximizers.
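
[To make "choosing as if maximizing expected utility" concrete, here is a minimal sketch; this is an editorial illustration rather than anything from the conversation, and the outcomes, utilities, and probabilities are invented.]

```python
# A lottery assigns probabilities to outcomes; an expected utility maximizer
# picks whichever available lottery has the highest probability-weighted utility.

utility = {"pepperoni": 3.0, "mushroom": 2.0, "sausage": 1.0}  # invented values

def expected_utility(lottery):
    """lottery: dict mapping outcome -> probability (probabilities sum to 1)."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

options = {
    "safe":   {"mushroom": 1.0},
    "gamble": {"pepperoni": 0.6, "sausage": 0.4},
}

# Choose as if maximizing expected utility:
best = max(options, key=lambda name: expected_utility(options[name]))
print(best)  # -> gamble (EU 2.2) beats safe (EU 2.0)
```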

Divia (04:33)

Yeah, thank you so much for the summary. And is it a fair summary, in somewhat less technical terms, to say that the reason people think that AIs will be expected utility maximizers is that otherwise they get Dutch booked? Or is that not a very good summary?

Elliott Thornley (04:50)

That's a pretty good summary. Like, Dutch books, people kind of use the term in different ways. Sometimes people use it just to mean a particular kind of exploitation, or like pursuit of a dominated strategy. But yeah, that's pretty much it.

Divia (05:05)

And then, thank you for that summary. Can you also say a little bit about what your impression was of the discussion? I think most of it's on LessWrong, maybe a little bit of it is on the Alignment Forum, and a little bit is on the EA Forum. I think this was in all three places. Is that right?

Elliott Thornley (05:18)

Yeah, the comments are sort of all over the place, basically. Yeah, so I'm trying to remember the comments. I think the comments were a mixed bag. So part of it was, in the post, I defined coherence theorems in this particular way. So I say that coherence theorems are theorems that imply that unless an agent can be represented as an expected utility maximizer, then they're liable to pursue dominated strategies. And I sort of took this to be the

way that the term was used, at least based on my background reading. But some people said, no, that's not the way that I use the term; they use it in some different way. And of course, if you use it in some different way, then it kind of seems silly to deny that there are coherence theorems. If you use the term coherence theorems to refer to the von Neumann-Morgenstern theorem, like...

it looks like I'm denying the existence of the von Neumann-Morgenstern theorem, which would be kind of a silly thing to do. So yeah, that kind of comment, I feel, wasn't so useful. I guess maybe I sort of invited this kind of comment with the title that I gave to the post. But the main point remains despite that comment, which is: this coherence argument doesn't work. For the coherence argument to work, you need this claim that

unless an agent can be represented as an expected utility maximizer, then it's liable to pursue dominated strategies. My point is that there aren't any theorems which imply that claim and sort of that remains true no matter how you define the term coherence theorems. So I feel like those lines of comments weren't so productive. There were some more productive lines of comments. I don't know if you want me to talk about those or we should hold off for a moment.

Divia (07:15)

Yeah, I think if you could briefly say something about them and then I have a few questions about why this is important.

Elliott Thornley (07:22)

Yeah, good. So, there aren't any theorems which imply that unless an agent is representable as an expected utility maximizer, then they're liable to pursue dominated strategies. That's not to say that you can't argue for that claim. So, a theorem, as I understand it and as I use the term, it's kind of like an argument with no undischarged assumptions. You sort of don't have to rely on any premises to get the conclusion.

Divia (07:38)

Mm

Elliott Thornley (07:51)

But if you are happy to bring in some premises, then you can argue for the conclusion. You can say, unless an agent is representable as an expected utility maximizer, and if that agent satisfies X and Y and Z, then they're liable to pursue dominated strategies. It's like, expected...

Divia (08:14)

Which is the way the VNM theorem is actually formulated, right?

Elliott Thornley (08:18)

Not quite. So the von Neumann-Morgenstern theorem, and this is important, makes sort of no reference to money pumps or exploitation or Dutch booking or anything like that.

Divia (08:20)

Right? Okay.

Doesn't it have premises about, like, independence and that sort of thing?

Elliott Thornley (08:35)

Yeah, so it's purely a representation theorem. So there are four von Neumann-Morgenstern axioms: independence, continuity, completeness, and transitivity. And the theorem says that an agent is representable as an expected utility maximizer if and only if it satisfies the four axioms. And then I guess, so you're kind of right, because often people go on to defend

one or more of the axioms by means of money pump arguments, sort of motivating them by saying, you must satisfy independence, otherwise you're liable to pursue dominated strategies. So it's like money pump arguments plus the VNM theorem can sort of make a coherence theorem or a coherence argument. Yeah, but then my point in the post is that the money pump arguments part of that equation

Divia (09:07)

Thanks.

Mm

Got it.

Elliott Thornley (09:32)

rely on these premises that are sort of contentious in the normative case, and I think very likely false in the descriptive case, the case that we're interested in when we're talking about what advanced artificial agents will be like.
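
[For reference, here is a standard statement of the von Neumann-Morgenstern representation theorem being discussed, written out as an editorial addition in the usual notation.]

```latex
\textbf{VNM theorem.} A preference relation $\succsim$ over lotteries satisfies
completeness, transitivity, continuity, and independence if and only if there is
a utility function $u$ on outcomes such that
\[
  p \succsim q \iff \sum_{x} p(x)\,u(x) \;\ge\; \sum_{x} q(x)\,u(x),
\]
with $u$ unique up to positive affine transformation. Note that no money pump or
dominated strategy appears anywhere in the statement.
```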

Divia (09:47)

All right, thanks. And just to check, Daniel, did you read this post when it came out?

Daniel (09:50)

I think I got... So, okay, I'm going to be real. This post sort of triggered me. I read the title and I was like, I think I've seen some coherence theorems. And then I think the way you put it in the post is, you know, a coherence argument has to have, like, no substantive assumptions. Sorry, a coherence theorem has to have no substantive assumptions. And I'm like, well, that's not a theorem. The way I read that, I was like, well, theorems always...

are of the form A implies B or whatever: you assume A and you get B, and that's what proves A implies B. And so then I gave up on your post, which I think was wrong of me. So I did read it later. It's been some time since I've read it, I'm gonna be honest. But yeah, maybe to say a little bit more. I think when I originally read,

Divia (10:39)

Yeah, it is.

Daniel (10:48)

something like the introduction or first few paragraphs of Elliott's post, I thought it was going to be something like, you have to have some minor assumptions in order to prove these coherence arguments. You know, like you have to assume something like, you live in a world where you have choices or whatever. And I was going to be like, well, okay, whatever, those are good assumptions to make, so it's fine to make those assumptions and they still deserve the name coherence theorems.

But I think the actual state of play is there are these von Neumann-Morgenstern axioms, right? There are these claims about what your preferences should be like. And the von Neumann-Morgenstern theorem says that if your preferences are like that, then you're an expected utility maximizer. And the state of play is that if you assume one of those von Neumann-Morgenstern arguments, one of those axioms rather, then you can make these money pump arguments for the other axioms, right?

If you have this axiom, then if you don't obey this other axiom, you're going to shoot yourself in the foot or whatever. But you actually can't get there from zero of the axioms about your preferences. And so basically, like...

Divia (12:00)

And sorry, for people who can't see the video, Elliott, you're nodding, right? That seems about right to you. Okay.

Elliott Thornley (12:03)

Yeah, yeah,

Daniel (12:06)

Yeah, and so the thing that seems to be true, that I did not realize for a while, even after Elliott wrote his post, is that the arguments, the things you need to rely on to get expected utility maximization, or the things you need to rely on to say that if you're not an expected utility maximizer then you shoot yourself in the foot, they're just, like, affirmatively more substantive assumptions than you might have guessed just based on the zeitgeist, or just based on the way people

at least talked about these things one year ago. Yeah.

Elliott Thornley (12:40)

Yeah, I think that's a great way to put it.

Divia (12:42)

Yeah, thanks. Okay, and now just to talk about another thing, we touched on this a little before we started recording, but Elliott, what was your motivation for writing the post? I think you say a bit in the post about why it seemed important, but can you lay that out for our listeners?

Elliott Thornley (13:01)

Yeah, yeah, so part of my motivation for writing the post was to sort of point out this thing that seemed like a mistake to me, and a mistake about something important. Yeah, so in particular, coherence arguments appeared in this big Katja Grace post, I think it's called Counterarguments to the basic AI x-risk case. And its place in that post made me think

Divia (13:27)

Right.

Elliott Thornley (13:31)

that people are taking coherence arguments as at least a moderately important part of the basic case for existential risk from AI. And so it seemed important to point out this weakness, as I perceived it, in these coherence arguments. I say only a moderately important part because I take the point of coherence arguments to be showing that advanced artificial agents are going to be

goal-directed in a sort of concerted and concerning way. And there are other reasons to expect that besides coherence arguments. So in particular, you might think that AI labs are in fact going to train agents to be goal-directed in some concerning way, because they'll be more valuable, or they'll be better able to solve problems or do things in the real world. So coherence arguments are only moderately important for those reasons, I think.

But I think, as maybe we'll get into later, coherence arguments are very important for another reason, which is something like this. So, you know, we've got these alignment proposals, proposals for keeping agents aligned or shutdownable or corrigible or something like that. Some of them, including my own, rely on being able to create an advanced artificial agent that's not representable as an expected utility maximizer. And if you sort of...

buy coherence arguments, you might think these kinds of alignment or shutdownability proposals can't even get off the ground, because as soon as your agent is reflective enough to realize that it's vulnerable to pursuing all these dominated strategies, and in fact, you know, the claim is that it is vulnerable to pursuing all these dominated strategies, then it's going to turn itself into an expected utility maximizer, and you're going to lose the property that kept it aligned or shutdownable. So it's that second thing that

seems to me the main importance of pushing back on these coherence arguments, which is making space for these proposals that rely on agents not being expected utility maximizers.

Divia (15:40)

Cool, yeah, so let me try to summarize something about why this matters as I understand it. So one thing is, and I actually did look into this a little, I didn't find... like, Daniel, you were talking about the zeitgeist. I think that there's sort of an impression of, yeah, these coherence theorems, of course it'll have a utility function. Maybe more than people have actually really said that anywhere

precisely or maybe they have said it and I couldn't find it. So I'm a little bit confused about that point. But I think it's true that a lot of people, I think I had the impression like, yeah, at least people seem to think that it's gonna be some sort of expected utility maximizer. Yeah.

Daniel (16:21)

Yeah, maybe I can say something about this. So the reason that I sort of, I don't know, developed more thoughts about this topic is that in the second half of last year... well, okay, first what happened is I read this comment on LessWrong by someone who was complaining about how people think that AIs are going to be expected utility maximizers. And I was like, guys, we've proved... you know, there are proofs in places, and I'm going to give a talk and I'm just going to

Divia (16:45)

Right.

Daniel (16:49)

collect all the proofs, I'm just going to make a really solid argument, and then that person will have to shut up from now on. And basically, yeah, I mean, I was under the impression that this would be really solid. And I still think you don't need that much, like the assumptions you need to get expected utility maximization.

On the one hand, they're not huge, but you only get a surprisingly weak form of expected utility maximization. And in order for me to make the argument that I wanted to make properly, I ended up basically hypothesizing that, you know, this agent is going to try and ensure that the state of the world in the future is in some small set, which you might think is the kind of thing you wanted to prove instead of assume.

But anyway, point being, I definitely assumed that the arguments here were better than they turned out to actually be.

Divia (17:51)

Yeah. Okay, so that's one thing. It seems like a bunch of people, regardless of what exactly anyone said, which I'm not sure about, a bunch of people assume that there are really solid, we-proved-it-with-math type arguments that the AI is going to be an expected utility maximizer, and that, at least the three of us seem to agree, is not true, right? And I read the comments, and my impression from the comments is also that I think nobody has any real counter to that thing. And yeah, this matters. I mean, I think it sort of matters for its own sake; it's part of what it means,

I don't know if you... I certainly identify as a rationalist, I think Daniel does, Elliott, I don't know if you do, but that we care about what's true just for its own sake. And yeah, insofar as it's something that people say as part of the AI risk argument, that matters. Though again, I think all three of us here are like, okay, but we're not trying to say, look, we disproved it, AI is not risky. I think we all think it probably still is risky, and...

At least I will say the basic argument of, you make something smarter than people, and you have all these companies that are like, I'm going to try to connect it to the internet and have them do as powerful things as possible... I don't know, it seems kind of dangerous. I think that argument for sure is still there. But then also what you're saying, Elliott, is: look, if we are going to try to actually make proposals for making AI aligned and making AI safe and try to game out how this will work, then it really does matter to try to pin down what we do and do not know

about AIs, so that we can evaluate which proposals seem promising or not. Does everything I said seem right?

Elliott Thornley (19:24)

Yeah, yeah, that's right. Yeah, so at least insofar as you mistakenly think that these coherence arguments are rock solid, you're, from the very outset, ruling out this whole class of proposals, which might seem promising. And in fact, after writing this post, I looked into incomplete preferences in more detail and thought that, yeah, a proposal along these lines does seem promising. So in particular, like...

we don't really need to get into the whole thing here, but one reason you might think that incomplete preferences are promising is that an agent with incomplete preferences can just sort of be more chill than an expected utility maximizer. So in particular, it can lack a preference between many more pairs of options. And you might think that we want this insofar as we want our artificial agents to be more sort of chill and lacking in preferences.

It's useful because, if you lack a preference between A and B, very likely you're not going to pay costs to shift probability mass between A and B. You know, two identical cans of Coca-Cola: you lack a preference between them, and so you don't pay a dollar to get the right one with probability 0.9 rather than get the left one with probability 0.9. And so, you know, if we can create these agents with

Daniel (20:23)

Maybe.

Elliott Thornley (20:51)

incomplete preferences, we can create them such that there are sort of many more things such that they're unwilling to pay costs to shift probability mass between those things. And so sort of like, you get a sort of more chill artificial agent in this respect.
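
[A minimal sketch of this point about incomplete preferences, added editorially; the strict-preference list and the outcomes are invented for the example.]

```python
# With incomplete preferences, only some pairs are ranked; for unranked pairs
# (preferential gaps), the agent has no reason to pay to shift probability mass.

strict_prefs = {("pepsi", "water")}  # invented: pepsi is strictly preferred to water

def strictly_prefers(a, b):
    return (a, b) in strict_prefs

def will_pay_to_shift_mass(from_outcome, to_outcome):
    """Pay a cost to move probability toward to_outcome only if it is strictly preferred."""
    return strictly_prefers(to_outcome, from_outcome)

print(will_pay_to_shift_mass("left_coke_can", "right_coke_can"))  # False: no preference either way
print(will_pay_to_shift_mass("water", "pepsi"))                   # True: strict preference
```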

Divia (21:07)

And am I right that this, part about the agents with incomplete preferences and that motivation, that was not in the original post, right?

Elliott Thornley (21:14)

no, this is all like later thinking.

Divia (21:17)

Cool, all right. Yeah, because I do think some people, like maybe Daniel, what you're saying, saw it as more like, you're trying to nitpick, and why does it even matter? I think this is not a very virtuous way to... sorry, Daniel, as far as how I'm reading you, it doesn't seem like a totally virtuous way to read it. Like, well, why does it matter, because the basic conclusion is probably right. And I think it's fine in the sense that everybody has limited time and

Daniel (21:31)

Yeah.

Divia (21:43)

and whatever, but in terms of like actually responding to it substantively, I don't know, maybe this is just a little of my agenda. I'm like, yeah, often things matter for reasons that people hadn't even thought of at the time. And your thing about incomplete preferences seems like one of them, maybe.

Elliott Thornley (21:58)

Yeah, that's right. Although I do want to point out something that Daniel said earlier, which I think is also right, which is: insofar as you're relying on these money pump arguments for the von Neumann-Morgenstern axioms, and taking those two things together as your coherence argument, it's important to note that at least the sort of most up-to-date, most sophisticated, weakest-assumption money pump arguments

Divia (22:18)

Mm

Elliott Thornley (22:26)

can all be found in this 2022 book by Johan Gustafsson called Money-Pump Arguments. Yeah, and the way that book is structured, he presents the money pump argument for completeness first, and then uses completeness to create the money pump for transitivity, uses completeness and transitivity to create the money pump for independence, and then uses all three, I think, maybe just one of them, to create the money pump for continuity.

Divia (22:30)

Good, that was the book, yeah. Nice.

Elliott Thornley (22:55)

And so in that respect, the money pumps are kind of like a house of cards, where if you don't have the completeness money pump, if you're not compelled by that one, then it's hard to get the other money pumps going as well. And this might be another reason to think that pushing back on these coherence arguments is important, besides making space for these alignment proposals that depend on non-expected utility maximizers.

Divia (23:19)

Also, are you familiar with Scott Garrabrant's work, like his Geometric Rationality sequence?

Elliott Thornley (23:25)

No, I haven't read that unfortunately. It's on my list.

Divia (23:26)

Okay, well, we'll leave that aside, maybe for a different day. Though, as a footnote, I found his post where he gave an example for why he's not compelled by independence, and it certainly stuck with me, and I was like, yeah, okay. And so before I had engaged with what you wrote, when people were always like, okay, but what about VNM, I was always like, well, I'm not persuaded on the independence point. But yeah, anyway, I'll leave that aside. Okay, so yeah.

I think I'm hoping at this point for Daniel, you and Elliott to talk a little more about what, in practice, you do expect and why, in terms of the expected utility maximization stuff.

Daniel (24:10)

Yeah, it's a little bit hard for me to say this because if you take... There's this question, what does it mean to be an expected utility maximizer? Literally, what are we saying when we say that? And it means something like, for each state of the world or for each way things could be, there's some utility value attached to that. And for each thing you could do, you consider... Or there's some probabilities of...

various outcomes, and you basically reliably pick the thing that maximizes the expected value of the utility. So the expected value being like sort of probability weighted average, right? I think there are various versions of this. Like, do you have to be like thinking about the probabilities in your head or whatever? I'm a bit less concerned about that. But like one thing about this is that

The thing we're assigning utilities to is, I don't know, maybe something like states of the world or something like that. And if you're allowed to be like very, very fine grained about what counts as a state of the world, you can really just, like expected utility theory can be super expressive, right? Like if you're allowed to distinguish between tons and tons and tons of states of the world, then maybe like expected utility theory constrains you very little because you just have tons and tons and tons of utility functions that you can optimize the expectation of.

So all of that is to say, like, what do I actually expect with maximizing expected utility? I'm like, I think...

Divia (25:43)

Wait, so can I summarize that in case people didn't follow? Maybe there's some sense that people have, before they really think about it, that if you have a utility function, then you must be sort of like an act utilitarian, where you're, I don't know, trying to do the greatest good for the greatest number, or at least for yourself, in some kind of pretty naive-seeming way. And you're like, no, it really doesn't say that at all. You could formalize almost anything this way.

Daniel (25:46)

Hmm, sure.

Yeah, I think that's right. if you... Yeah, not quite everything.

Divia (26:15)

You could be like, assign super high... Yeah, like I could assign super high utility to having this first and then that later, for these reasons. And I could assign super low utility to having what seems like the same thing but at a different time, for different reasons. That type of thing.

Daniel (26:22)

Yeah.

Yeah, I think the one thing that it does rule out is... expected utility theory, if you're justifying it by money pump arguments, does say something like, for some notion of resources in the world, you don't give up resources for literally nothing. So you don't choose to have just less rather than more, with literally all else held equal. But like,

what counts as all else held equal? Well, it sort of depends how strictly you want to model it. Yeah, or, I don't know. You look at the von Neumann-Morgenstern axioms, like completeness, transitivity, independence, and continuity, and they look really minimal. They look like they're not assuming all that much. And then if you read the von Neumann-Morgenstern theorem, you might get the sense that like,

Divia (27:23)

Can you say what they are again, for people's benefit?

Daniel (27:26)

Yeah, so, yeah, let's go through them. So completeness says.

Divia (27:31)

And Elliott, I would say feel free to weigh in on these as you wish.

Elliott Thornley (27:34)

Okay.

Daniel (27:35)

Yeah. So let's start with completeness, because it sounds like it's basically nothing, but as I guess Elliott has written about, it's actually kind of non-trivial. So completeness says that for any two options, A and B, you either prefer A to B, or you prefer B to A, or you're indifferent between A and B, right? And so you might be thinking, wait, how is that even an assumption? Isn't that just...

doesn't indifferent just mean you don't prefer one to the other? But it's not, because the way that you want to define indifference, or at least for the purpose of money pump arguments, I guess, is that if you're indifferent between A and B and you make A slightly better, then you prefer A, or if you make A slightly worse, then you prefer B.
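
[The distinction Daniel is drawing here, written out as an editorial addition: completeness, and the difference between indifference and a preferential gap.]

```latex
\textbf{Completeness:} for all options $A, B$: either $A \succ B$, or $B \succ A$, or $A \sim B$.

\textbf{Indifference} ($A \sim B$): sensitive to sweetening, i.e.\ slightly improving $A$
to $A^{+}$ yields $A^{+} \succ B$ (and slightly worsening $A$ yields $B \succ A^{-}$).

\textbf{Preferential gap:} neither $A \succ B$ nor $B \succ A$ nor $A \sim B$; a small
sweetening or souring of $A$ can leave the agent still lacking any preference.
```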

Divia (28:08)

Right, so it seems almost like a tautology at first.

Daniel (28:34)

So here's an example of a way you could fail to satisfy completeness. Suppose you're asking yourself, should I become a doctor or should I become a monk? And you're like, man, I have really no idea. I just have no concrete idea of which one of those I should do. And then suppose I tell you, actually, when you were thinking about becoming a doctor versus becoming a monk,

you're using slightly out-of-date numbers for doctors' salaries, and the salaries of doctors are actually like 3% higher than you realized.

Divia (29:10)

Or, assuming you're indifferent, you're like, okay, well then, if I give you this dollar to become a monk, then you will... and you'd have to say yes. Yeah. Right, which is not, that's not really how people, like that's not really how human beings behave, right?

Daniel (29:14)

Yeah, yeah, yeah. So that would...

that...

Divia (29:28)

I guess there's a question about that.

Daniel (29:28)

that's probably, yeah, there's some, it seems like it's not how human beings behave. So that's completeness. Transitivity says that if you prefer A to B and if you prefer B to C, then you also prefer A to C.

Independence basically says: suppose you prefer A to B, right? Like, if you could choose between A and B, you'd pick A. Independence says, suppose that you've got some probability of C and some probability of A, that's one option, or you can get the same probability of C and the same probability of B, right? Basically it says you prefer maybe-C-maybe-A

to maybe-C-maybe-B. So one example of this is, suppose that you prefer eating pizza tonight to eating Thai tonight. And then someone basically says, hey, I'm going to do a coin flip, right? If the coin comes up heads, then you're definitely getting Mexican. If it comes up tails, then maybe you're gonna get pizza and maybe you're gonna get

Thai, whatever the other one I said was. Yeah. But right now you've got to decide what's going to happen if the coin comes up tails, right? And independence says that if you prefer pizza to Thai, then right now you've got to say that, if the coin comes up tails, you want the pizza world rather than the Thai world. So that's independence.
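
[The independence axiom in symbols, added editorially, matching the pizza/Thai example: mixing both sides with the same third lottery at the same probability shouldn't flip the preference.]

```latex
\textbf{Independence:} for all lotteries $A, B, C$ and any $p \in (0, 1]$:
\[
  A \succ B \;\implies\; pA + (1-p)\,C \;\succ\; pB + (1-p)\,C.
\]
```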

Divia (30:44)

Thai.

Which, for the record, I think is often actually not true, for sort of fairness reasons, given that agents, I think, are often best modeled as having something like internal conflict, or at least internally different preferences, which was basically the argument, as I understand it, that Scott Garrabrant made in his post in the Geometric Rationality sequence.

Daniel (31:25)

Yeah, you're not the only one... a lot of people don't like independence. I think it's actually pretty good, but people disagree about this. And then finally there's continuity. And continuity basically says that small enough probabilities don't matter. So basically all these axioms are about, you know, what you choose when your options have certain probabilities over different outcomes, right? And

I forget what continuity actually says exactly, but it roughly says that if I prefer doing this thing, which has some probabilities over various outcomes, versus this other thing with some probabilities over various outcomes, there's some tiny amount by which you can change the probabilities such that my preference remains the same. So suppose someone offers me a gamble and I want to take that gamble over doing nothing.

If somebody says, oh, the probabilities are actually 0.001% different than what I said, then I'm still happy to take the gamble.

Elliott Thornley (32:32)

Yeah, I think the continuity axiom in the VNM theorem is slightly different, or at least it's expressed slightly differently. Basically, it goes: if you prefer A to B to C, then there's some probabilistic mixture of A and C that you prefer to B, and some mixture of A and C that you disprefer to B. So like, yeah.

Divia (32:32)

Thanks. Appreciate it.

Daniel (32:41)

okay.

right.

I have the impression that you could use one of those versions or the other, like there are different versions of continuity that can get you the same results. That's my impression.

Elliott Thornley (33:07)

Yeah, I wouldn't be surprised if they're equivalent, actually. But yeah, at least in the Gustafsson book and in the VNM theorem, that's how it's expressed.
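
[The two phrasings of continuity mentioned here, side by side, added editorially; as Elliott says, they are plausibly equivalent for these purposes.]

```latex
\textbf{Continuity (VNM form):} if $A \succ B \succ C$, then there exist
$p, q \in (0, 1)$ such that
\[
  pA + (1-p)\,C \;\succ\; B \quad \text{and} \quad B \;\succ\; qA + (1-q)\,C.
\]
\textbf{Continuity (Daniel's gloss):} sufficiently small changes to the
probabilities in a preferred lottery do not reverse the preference.
```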

Daniel (33:18)

Anyway, so that's what those assumptions are, right? And, well, if you thought those assumptions were uncontroversial or something, you might think that the VNM theorem is saying that if I obey those assumptions, then there's actually a whole bunch of other things I have to do with my behavior or whatever to, you know, be consistent or whatever. But actually, as Elliott said earlier, the VNM theorem is a representation theorem. It says that if you satisfy those assumptions, then

there's some utility function whose expectation you're maximizing. And so, in some sense, utility maximization is just as weak as those four assumptions, and those four assumptions don't tell you that much. Anyway, all this is a tangent from what I actually expect in terms of utility maximization. And I'm like, yeah, there's probably going to be some sense in which AIs are going to be utility maximizers, probably because I think coherence arguments are kind of good,

and partly they're persuasive to me. I think that the assumptions are stronger than one might think, but they're still relatively... I think they're weak enough that they're basically right.

Divia (34:20)

Like they're persuasive, like they're...

Daniel (34:35)

Yeah, but I don't know. I'm sort of in the place where, like, will AIs be expected utility maximizers? What do I expect about that? I'm like, it feels like a weird question to ask, to me.

Elliott Thornley (34:49)

Yeah, yeah, I think one thing that's worth emphasizing, and that Daniel and Divia, you talked about earlier as well, is that if you place no restriction on how you individuate outcomes, no restriction on the objects of preference, then any behavior can be rationalized as expected utility maximization. So in particular, the classic example of...

Divia (35:13)

Maybe there's some trivial sense in which I'm like, the exact thing I did is worth a lot and everything else in my utility function is worth zero. Is that one way you can do it?

Elliott Thornley (35:24)

Yeah, that's exactly right. So yeah, if you want to get some non-trivial prediction out of coherence arguments, then you need to place some restriction, at least probabilistic, on how you individuate outcomes. You've got to say, no, this artificial agent doesn't just care about the entire history of the universe, with no more structure to their preferences than this ordering over histories of the universe. Actually they care about

ice cream flavors or slices of pizza or something like that, and they're indifferent between any two histories of the universe that are the same with respect to slices of pizza. And if you make that restriction, then you can get some predictions out of coherence arguments. Because then you can think, okay, well, the agent is trading sausage pizza for pepperoni, and then they're trading pepperoni for mushroom, and then they're trading

mushroom for sausage, and they're sort of paying to trade in this cycle. And they're pursuing this dominated strategy. And so now they're going to revise their preferences and remove the cycle, and things like that.
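
[The trivial rationalization Divia and Elliott describe, written out as an editorial note: with fine-grained enough outcomes, any sequence of choices maximizes some utility function.]

```latex
u(h) \;=\;
\begin{cases}
  1 & \text{if } h \text{ is the exact history produced by the agent's actual choices,} \\
  0 & \text{otherwise.}
\end{cases}
% Relative to this u, whatever the agent actually did maximized expected utility.
```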

Divia (36:31)

and then they get money pumped.

Yeah, so can one of you describe... do you think people... because again, there are these money pump arguments. So how far, or like, what do you think the practical implications of the things people say about money pumping are? Or are there none? I think there's some, right?

Daniel (36:57)

I mean, utility theory is a really nice language. Like, if you give yourself utility functions, and if you're allowed to just say, yeah, my agent is going to be an expected utility maximizer and the utilities are going to be over states as defined in this way, then it becomes nicer to prove a variety of theorems, right? So like, utility functions, they're functions over things, they're continuous, you can optimize things with respect to those functions, you can vary the functions smoothly to...

Divia (37:01)

Mm

Daniel (37:27)

I don't know. I mean, one upshot is it's just a very nice modeling language. In terms of actual substantive risk arguments from modeling things as expected utility optimizers, I'm honestly not aware of any. Maybe Elliott has some in mind. I guess there's things like shutdownability under some assumptions of...

Yeah, maybe Elliott should go here.

Divia (37:56)

Okay, no, but there's something else that I'm trying to ask first, though I do want to get there that's like, okay, I, as a person, I definitely have some intuition of like, okay, yeah, I don't want to be that thing. Like it's sort of incoherent for me to be that thing that has those cyclical preferences about the pizza toppings because I don't want to lose all my money. And so there's some sort of intuition there that I think people tend to then expand. And I probably historically have sort of expanded it and I could try to speak to it, but I'm wondering like,

Do you guys know what I'm talking about with this sort of expanded intuition? Like, no, but surely I've got to kind of be like this or else.

Elliott Thornley (38:33)

Yeah, this sounds right to me. In particular, I think it was the Eliezer Yudkowsky post that I read, I think it's called Coherent Decisions Imply Consistent Utilities or something like that, where it does this kind of thing. It gives the example of the person that fails to be an expected utility maximizer by having cyclic preferences: they prefer A to B, B to C, and C to A. This is really sort of like the

classic money pump, the money pump that really works if any of them do, because then you can say, all right, pay me $1 and I'll switch you from C to B, pay me $1 and I'll switch you from B to A, pay me $1 and I'll switch you from A to C. Yeah, so I think that money pump kind of basically just works. There are some technicalities, but it's pretty convincing to me. And so I think you shouldn't have cyclic preferences. I think the error is in extrapolating from that to the whole of

expected utility maximization, all the von Neumann-Morgenstern axioms. Because the money pump for completeness, I think in particular, is not nearly as convincing as the money pump for acyclicity, the one that says you shouldn't have cyclic preferences.
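
[A minimal simulation of the acyclicity money pump Elliott describes, added editorially; the $1 fee and starting holdings are invented.]

```python
# Cyclic preferences: A > B, B > C, C > A. A trader repeatedly offers to swap the
# agent's current item for one it prefers, charging $1 per swap. The agent ends up
# back where it started, $3 poorer per lap: a dominated strategy.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is preferred to y

def accepts(current, proposed):
    return (proposed, current) in prefers  # accept iff the proposed item is preferred

holding, money = "C", 10
for proposed in ["B", "A", "C", "B", "A", "C"]:  # two laps around the cycle
    if accepts(holding, proposed):
        holding, money = proposed, money - 1     # pay $1 for each locally attractive trade

print(holding, money)  # -> C 4: same item as at the start, $6 poorer
```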

Daniel (39:46)

Yeah.

Divia (39:46)

Can you lay that out? I don't think I actually know what the money pump is for completeness.

Elliott Thornley (39:52)

Yeah, okay, so I'll give the sort of non-forcing version first and then talk about the second one that Gustafsson talks about, which is supposed to be the forcing version. Recall that incompleteness, or having incomplete preferences, is about having preferences that are insensitive to some sweetening or souring, in particular, lacks of preference that are insensitive to some sweetening or souring.

Divia (40:20)

So like, yeah, I said I don't know if I want to be a doctor or a monk, and then you offered to pay me a dollar to be a doctor, and I'm like, yeah, I still don't know, actually. I refuse to. Yeah.

Elliott Thornley (40:28)

Yeah, exactly. And the money pump for completeness, the non-forcing one that we'll talk about first, sort of uses this fact. So the first choice is between doctor and monk, and you're stipulated to lack a preference. So you can choose monk at this point. And then at some later period of time, you know that you'll be offered the choice between sticking as a monk or switching to a slightly worse-paid doctor career than the one in your first choice.

And then if you have incomplete preferences, and in particular you lack a preference between the monk career and both doctor careers, but you prefer being a better-paid doctor to being a lower-paid doctor, then it seems like you could be money pumped in this situation, because what you could do is decide to choose monk at node one instead of the higher-paid doctor, and then later change your mind and choose

the lower-paid doctor at node two instead of sticking with monk. And then you'd be money pumped in that case. And this is like...
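
[A sketch of the non-forcing completeness money pump as a two-node decision tree, added editorially; the salary numbers are invented.]

```python
# Node 1: choose Doctor at salary 100, or Monk.
# Node 2 (reached only via Monk): stick with Monk, or switch to Doctor at salary 99.
# The agent has a preferential gap between Monk and either doctor career, but strictly
# prefers doctor@100 to doctor@99. If it picks Monk at node 1 and then switches at
# node 2, it ends with doctor@99, which is dominated by the doctor@100 it passed up.

import random

def gap_choice(options):
    """With a preferential gap, either option is permissible; pick arbitrarily."""
    return random.choice(options)

node1 = gap_choice(["doctor@100", "monk"])
outcome = node1
if node1 == "monk":
    outcome = gap_choice(["monk", "doctor@99"])

print(outcome)  # 'doctor@99' is possible, strictly worse than the 'doctor@100'
                # that was available at node 1
```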

Divia (41:30)

Okay.

Well, is it, like, infinitely... because people could keep doing this? I get it. From my perspective, money pumps don't seem that compelling unless it's infinite.

Elliott Thornley (41:44)

Yeah, good. So this is one thing to consider, which is like, yeah, the money pump for completeness is less compelling than the one for acyclicity exactly for this reason, because you couldn't sort of extract infinite money out of someone. Or at least like from the bare fact that they have incomplete preferences, you couldn't extract infinite money out of them. If they had like extremely incomplete preferences, maybe you still could extract a lot.

Divia (41:56)

Okay.

Sure, but if people are like, okay, we're gonna keep switching you from monk to even worse paid doctor, then I would think at a certain point, I'd be like, okay, well now that you're asking me to pay to become a doctor, like now I'm out. So you can't keep doing this, right?

Elliott Thornley (42:20)

Yeah, yeah, that's right. Yeah, but okay, so the main way in which I think this money pump for completeness isn't particularly compelling is that it relies on this premise that Johan Gustafsson calls decision tree separability. And decision tree separability basically says that you can ignore parts of the decision tree that are no longer accessible. So in particular,

if, ex nihilo, you had the choice between monk and lower-paid doctor, you'd lack a preference and so would maybe choose each with some positive probability. And so since you'd do that thing ex nihilo, you'll also do that thing if you previously turned down the option to be a better-paid doctor.

Divia (43:12)

I see. But you're saying, no, you can just not do that. You can be like, well, I turned it down before, so I'm going to sort of stick with that.

Elliott Thornley (43:19)

Yeah, yeah. So this is the proposal for how you avoid this money pump for completeness: by denying decision tree separability. And actually, okay, so the sort of complication is that Gustafsson is arguing about whether being representable as an expected utility maximizer is rationally required. Whereas in a... say again?

Divia (43:40)

What is it?

What does that mean, if it's rationally required?

Elliott Thornley (43:45)

Yeah, it's kind of like... we're getting into normativity and stuff. It's like a requirement of rationality, like you prudentially ought to be a von Neumann-Morgenstern agent.

Divia (43:56)

Just like, prudentially, I ought to not have cyclic preferences, we might say. Is it in that sort of sense? Like, is that sort of how you mean it?

Elliott Thornley (44:00)

Yeah, yeah, it's like... let me say that again.

Yeah, it's like arguments about what we prudentially ought to do or be. It's like rationality requires that you are.

Divia (44:13)

Okay. And so this is not like a mathematically rigorous thing. Or is it?

Elliott Thornley (44:18)

Well, it's kind of like, it's a way of interpreting this claim of decision tree separability, which might affect how compelling you find it. You know, we come into theorizing with some intuitions about what rationality requires. And maybe we think that rationality requires that we satisfy decision tree separability, that like, our decision shouldn't depend on

parts of the decision tree that we can no longer access.

Divia (44:49)

Which some people have an intuition about, or some sort of moral impression about.

Elliott Thornley (44:54)

Yeah, I think that's basically what it comes down to. And my point that I make in the post is, you know, decision tree separability interpreted as a claim about rationality is somewhat contentious. But when we're thinking about how advanced artificial agents will in fact behave, we need this analog of the premise, which is instead a claim about how artificial agents will in fact behave, namely that they will in fact

Daniel (44:58)

Yeah.

Elliott Thornley (45:24)

ignore parts of the decision tree to which they no longer have access and behave the same no matter what happened in the past. And this claim is very easy to doubt, right? So if you create an artificial agent that in fact modulates its behavior depending on what happened in the past, then you can falsify this claim. And so you could quite easily create this artificial agent that failed to satisfy decision tree separability and thereby avoid the money pump for completeness.
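
[A sketch of the kind of agent Elliott describes, added editorially: one whose node-2 choice depends on what it passed up at node 1, so it violates decision tree separability and avoids the completeness money pump. The salaries and the specific rule are invented for the example.]

```python
# Same tree as before, but the agent remembers what it previously turned down and
# refuses any option that is strictly worse than something it already passed up.

import random

strictly_worse_than = {"doctor@99": {"doctor@100"}}  # invented strict preference

def history_aware_choice(options, passed_up):
    permissible = [o for o in options
                   if not (strictly_worse_than.get(o, set()) & passed_up)]
    return random.choice(permissible)

passed_up = set()
node1 = history_aware_choice(["doctor@100", "monk"], passed_up)
outcome = node1
if node1 == "monk":
    passed_up.add("doctor@100")
    outcome = history_aware_choice(["monk", "doctor@99"], passed_up)

print(outcome)  # never 'doctor@99': the agent ends with 'monk' or 'doctor@100'
```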

Divia (45:34)

Yeah.

Elliott Thornley (45:54)

yep.

Daniel (45:55)

Can I maybe use this as a jumping-off point to say a slightly weird thing about this literature? Which is that you might have thought that money pump arguments, or the way you should argue for von Neumann rationality or whatever, is some theorem like: if you don't do this, then a bad thing will actually happen to you. And maybe you might think that the theorems would be something about what kinds of things agents actually do.

And if you don't actually do this thing, then a bad thing will in fact happen to you. And I think some arguments are kind of like this. And then some arguments are basically talking about... you know, the object of von Neumann rationality is just your preferences, which are just things inside your head about how you rank various outcomes, that might not even be things that you actually do. And then the theorems are like, well, if your preferences inside your head are arranged a certain way,

then some preferences will depend on other things. But that's a crazy way for the inside of your head to be arranged, and that's really bad. And, yeah, they'll use these words that send you to the dictionary, like conative attitudes. Conative, which I had to look up, means relating to your desires, I believe. Yeah, so C-O-N...

Divia (47:12)

So, what attitudes?

We'll work on that later.

Hm. Can you spell that? Sorry. Yeah. Conative. OK. Interesting.

Daniel (47:22)

A-T-I-V-E, if I recall correctly. Anyway, and so, from my point of view, if I'm trying to think about AIs, I sort of want to say something like... or I don't know if I would like to say this, but the kind of theorem that I'm interested in hearing is: if they don't actually do expected utility maximizing behavior, then a bad thing will happen to them and I'll be able to take all of their stuff, and, you know...

Divia (47:51)

Right.

Daniel (47:52)

Whereas theorems that are like, if you, you know, that's like a bad way for the inside of your head to be, I'm sort of like, well, does a bad thing actually happen? And it's relevant because like, so the thing about bad things for the insides of your head to be, like, I'm saying that in a sort of dismissive way, but I think some people are tempted to cite these money pump arguments to say like, if my preferences

you know, if like the preferences I have represented in the insides of my head are a certain way, I will realize that that's a bad way for them to be. And then I will choose to arrange them in a slightly different way. And statements like that are about like the insides of your head and what you'll be tempted to do. And so in those settings, you're almost tempted to take the versions of the theorems that say like, yeah, if the insides of your heads have to be like this, otherwise it's bad for some reason.

But then you have to really make sure you believe the assumptions about, hey, why would it be bad for some reason for the insides of your heads to be another way? Like, if you want to get a non-trivial claim like, if the insides of your head are one way, you will realize that it's bad and change them to be another way... well, the bit where you realize that it's bad, that's a non-trivial assumption about what the inside of your head is going to be. So like, who knows?

Divia (49:17)

Yeah, I'm like... I don't, in this moment, certainly, I don't feel very compelled by, like, why would that be bad? Which I guess is what you're saying.

Daniel (49:23)

Yeah, well, the thing about the insides of your heads maybe being bad... it's just that a surprising amount of the literature is focused on this question. Surprising to me.

Divia (49:35)

And the bad is some sort of intuition that, for example, it's bad to have what you're going to choose now be affected by your memory of what you didn't choose before. That's an example of badness.

Daniel (49:48)

Maybe that kind of thing, maybe Elliott can go into this further. But yeah.

Divia (49:52)

Elliott's nodding. You think maybe it is?

Elliott Thornley (49:54)

Yeah, yeah. So Gustafsson, yeah, he gives these arguments, and, you know, they're very inventive, and I think fairly compelling when we're interpreting them in terms of rational requirements, where the point is that we can appeal to intuitions about

rationality and what you prudentially ought to do, and then maybe to some extent these premises about, like, badness in the head become more compelling. But in the context of coherence arguments, I totally agree that they kind of lose all force, and it's kind of disappointing to realize that these arguments that seem like they're just relying on math alone are actually depending at root on premises like:

artificial agents won't in fact modulate their behavior depending on what happened in the past, or artificial agents will in fact be indifferent between B-when-they-could-have-had-A and B-when-they-couldn't-have-had-A, or something like that. You really need these things, and it's kind of disappointing that you do.

Divia (51:08)

This is...

Yeah, it's interesting. It sort of... I don't know, thanks for talking this through, because now it reminds me a little bit, a lot actually, of how I used the word rationality before I encountered the rationalist community. Like, colloquially people will be like, well, it wouldn't be rational if whatever. Whereas now that I've engaged so much with, I don't know, like, you know, the Sequences, and hung out with rationalists for, like,

God, over a decade or whatever, if anything, now in those sort of colloquial situations where they're like, well, that wouldn't be rational, I'm always like, no, sure, sure it could be. Almost anything could be. I feel like saying that you're going to be rational actually rules out a lot less than how people colloquially use it. What am I trying to say precisely? I guess it's an analogy, where in this case you're talking about the math, but with me and the rationality community, I'm more just talking about, like,

the way my vibes about it sort of converged over listening to a lot of arguments and hanging out with a lot of people and stuff like that. Maybe a classic example of really the same thing is, I think when I was a kid, if someone would be like, it's sort of irrational to turn down something out of, like, spite or something. And then I thought about it more and I'm like, no,

that could totally make sense. Or, it wouldn't be rational to do something just because you didn't want other people to see that you've done it, and now I'm like, why not? That could be part of my goals too. I don't know if that resonates with anyone else.

Elliott Thornley (52:55)

Yeah, yeah, absolutely. Yeah, I think part of it is when we talk about rationality, we're sometimes talking about this kind of purely instrumental rationality, where it's about sort of taking effective means to your ends. And sometimes we're talking about this more substantive conception where like, not only is it about taking effective means to your ends, it's partly about having the right ends. You know,

Divia (53:08)

Mm

And it's also partly about having the right ends, I think, in a way that's like legible to outside observers so that people can expect a certain sort of coordination that they think is normative. Is that, I don't know, that's the thing I think.

Elliott Thornley (53:19)

You

Yeah, that sounds fairly compelling to me.

Divia (53:33)

Like, I read... this is maybe a bit of an aside, but the thing that I think about all the time: Kevin Simler has an essay called Personhood: A Game for Two or More Players, or something very similar to that title, where he talks about what it means to be recognized as a person in society. And a lot of what that means is to do things for reasons. And it means... I mean, of course it's a little bit contextual and nebulous and all of that, but I think a lot of what that means is things that other people recognize as reasons.

Like I've also spent a lot of time with children, because I have children and I've been doing that for them, especially young children. Often people are like, well, that's completely unreasonable. And I think what they mean is not that they don't have reasons, but that they don't have the sort of approved like, okay, if sort of adult persons have those sorts of reasons, then we can kind of expect society to be able to coordinate in these types of ways. Which is maybe a little bit of my soapbox, but I think it's relevant.

Elliott Thornley (54:26)

Mm

Daniel (54:30)

Relevant. Can I maybe say something in defense of the people who are telling you what you have to want and stuff? So, I mean, this sort of gets to... I don't know, I think this is closely related to arguments about what counts as expected utility theory or whatever. And it gets back to this question of, well, how do you motivate these money pump arguments? And the way you motivate the money pump arguments, broadly, is: there's a thing called money, and you can have

Divia (54:38)

Yeah, yeah, go.

Daniel (54:59)

more of it or less of it, and it's better to have more of it, right? And like, we actually do live... Okay, I'm about to say some things that are math-adjacent, but, I don't know, I'm going to use mathy concepts in a loose way. Okay. So...

Divia (55:16)

In defense against the potential accusation that you're trying to pretend something is math when it's not: no, it's not real math. Cool. We got it.

Daniel (55:19)

Yeah. Yeah. And it's good for people to know that. We live in a universe, right? And the universe has, like... there's this thing called free energy, right? Like, if I have

Divia (55:33)

Do you mean like in the Friston sense? I've just been trying to...

Daniel (55:35)

No, I mean in the physics-y, just like literal sense. Like, I can have a battery, right? And I can put it in a machine, and the machine will do a thing, and the machine will heat up, and then I can't use that energy to do stuff anymore, right?

Divia (55:48)

Okay, true.

Daniel (55:50)

So like.

Like, I don't know, the universe is made of objects and stuff that could be moved around. If you are in control, in control in scare quotes, because what does that even mean? But if you're in control of more resources, more energy to actually go about doing stuff, you can change the state of the universe more. Similarly, I don't know, if you really believe in probabilities or something, you could imagine the kind of agent

So one of the assumptions in the von Neumann-Morgenstern theorem is continuity. We talked a bit about it before. And basically, what it comes down to is you don't care about super tiny probability worlds. And like,

Divia (56:32)

I'm compelled by this. Maybe I shouldn't be, but it seems intuitive to me.

Daniel (56:35)

Yeah, I mean, it's intuitive to me, but like, imagine somebody who says like, no, I actually just do care about like incredibly tiny probability worlds. I care about like probability of things being greater than zero. Like if something has probability greater than zero, that's like a billion times better to me than having probability zero. On the one hand, like, what can you really say to that person? But on the other hand, like in the world that we're actually going to live in with very high probability, they're not going to get their way and you're going to get your way if you get into some conflict. Right. So like, if you, if you buy this

Divia (57:00)

Yeah, I'm compelled. Yeah.

Daniel (57:05)

picture of a world with just like physical constraints and like physical stuff that you can actually do if you're managing like free energy and if you're just like managing sort of the macro state of the world, not like the micro state of, you know, which atoms are exactly in what place.

Divia (57:18)

having some like one billionth of a percent of a probability that I get a lot of stuff. Yeah, okay.

Daniel (57:23)

Yeah, like, basically, there are certain types of preference structures which mean that, like, in the actual physical world, you're not going to be reflected in it, you're not gonna like control that much of it, you're not gonna steer it. Like the ones where you care about, like, are these two atoms swapped or not? Or the ones where you're like, it's rational for me to just delete all of my... Like, suppose I have a hard

Divia (57:33)

But which ones? Which ones?

Daniel (57:52)

drive with some Bitcoin on it. And I'm like, okay, here's what I'm going to do. I'm going to delete just the private key for that. I'm going to like make it impossible for me to spend that Bitcoin. And this is actually better than not doing that because even though I have less resources, my computer is a little bit hotter and I typed out some commands and like, therefore it's rational for me to do that thing. And you can't tell me that that's irrational. And I'm like, well,

Divia (58:02)

Yeah.

Okay, I'm not gonna defend one.

Daniel (58:20)

Okay, I'm not saying you're gonna defend that one. I'm just like

Divia (58:22)

I don't think that's a good defense of the people that were trying to, and I don't even, I'm not trying to blame individual people, but some vibes I got from society about what was and was not rational for me to want as a kid. I think it wasn't that type of stuff. I think it wasn't me being like, let me just delete my private key to my Bitcoin. I basically agree. I think there's some sort of sense that I agree with where I'm like, that seems like a terrible idea for almost everyone, to the

point where it seems kind of rational, when the person's going to go, like, yes, they could have their special snowflake preferences, I just get so much utility from deleting my private key. But for the most part, and I can think of edge cases where someone might want to, like real ones that I have thought about, but I think it's

Daniel (59:00)

Hmm. Yeah, I guess I'm like, I'm not exactly, I'm more doing something that's adjacent to defending the Divia-botherers than like actually defending them. But like, I don't know, it feels relevant to me for understanding, like especially if we're trying to form rigorous arguments about what AIs will do, and people will be like, well, who's to say that this thing is irrational? Like, I do just want to say, look, some preferences, like some ways of behaving, mean that you don't get reflected in the future very much.

Divia (59:29)

Yeah, like some things mean you can be a causally relevant thing and some things mean you can't. Yeah, okay, I agree with that. No, okay, I think that's right. But no one's really formalized those ways necessarily, that we know of.

Daniel (59:33)

Yeah, yeah. I don't know. That's... Yeah.

Elliott Thornley (59:37)

Yeah.

Daniel (59:44)

I'm unaware of anyone doing that, maybe Elliot is.

Elliott Thornley (59:47)

Yeah, I guess, yeah, so Daniel, what you say sounds right, but it's not so much a coherence argument at that point; it's more like a selection argument or something like that.

Daniel (59:58)

Yeah, I think, well, I think that like these money pump arguments, you can like view them as coherence arguments or selection arguments. I feel kind of indifferent to which ones you use. But like, if you say, the thing about traveling in cycles, like suppose I'm paying money to travel in a circle, is that like incoherent or is that selected against? Well, I'm sure it matters which one you say, but I'm sort of like...

It doesn't matter that much to me right now.

Divia (1:00:30)

Okay, sorry, I think I wanna make one point, which is, I think, and people make arguments that at a certain point it's sort of natural for the AI to become a singleton. I'm not currently like totally persuaded. Like sometimes these seem a little persuasive to me, but it doesn't seem obvious to me that that's what's gonna happen. So if I set that aside, I think...

Elliott Thornley (1:00:30)

Yeah

Divia (1:00:58)

So if I'm gonna fight with the like vibes I got from society when I was a kid about what rationality meant, I think actually there's a very natural set of strategies for an agent type thing to use in a situation where there are other agent type things that roughly look like, does it seem like some weird adversarial thing is going on? If so, do some stuff that might not otherwise be considered rational.

for good reason, because it seems like if I do that, then I'm gonna, partly because I'm predictable and partly because whatever, somebody is gonna do some weird thing to me and so I should not. And yeah, I think a lot of the things people were like, well, that wouldn't be rational, were in fact for good social reasons. And I think that is a legitimate analogy to some sort of impression about like, okay, but is this thing gonna get like infinitely money pumped or is this thing gonna get whatever? And I think that's distinct from like, are you just sort of,

giving up all of your power to affect the universe in the future, which is like the deleting your private key thing. I don't know if I, obviously I feel very far from any ability to formalize any of this, but that's maybe where I would start is to try to separate out like, am I dealing with somebody trying to mess with me or am I doing some other activity that's like giving away all my power?

Daniel (1:02:21)

I've talked to -

Divia (1:02:21)

I don't know. Elliott, do you have thoughts on this latest round of back and forth?

Elliott Thornley (1:02:28)

Yeah, I don't know, what you say sounds compelling to me. I feel like I don't have much to add to that point.

Daniel (1:02:36)

Maybe it gets to this thing again of just like what counts as a thing that you might have preferences over, right? So like, if someone's allowed to expand, like, you know, the definition of what a thing is to include, like, it seems like someone is fucking with me, and I can have different utilities for like this situation where someone is trying to mess with me versus this situation where it's not true that someone is trying to mess with me, then it's like a lot easier to recover expected utility theory. There's a question of like, is that even a legit move?

Divia (1:02:42)

Mm

I intuit that it is, because it's so much of what human beings are up to.

Daniel (1:03:06)

Well, there's a legit move of behaving differently in those situations. And there's a question of, is it a legit move to say, no, that counts as expected utility maximizing, and therefore expected utility maximizing is rational, because this thing that you're doing that is rational counts as expected utility maximizing.

Divia (1:03:20)

I see.

Okay, there's a sentence, I probably should have it pulled up, but I don't. Daniel, I'm guessing you are, at least there's some document that Eliezer wrote about a cheerful price.

Daniel (1:03:38)

so I think, yes, I think someone else originally wrote a document about a happy price and Eliezer was like, no, it's going to be cheerful.

Divia (1:03:44)

Yeah, no, that's right. Yeah. Elliot, do you know this document?

Elliott Thornley (1:03:48)

Yeah, I think I read this.

Divia (1:03:50)

Okay, so there's something there where, and to me it gets at like the part of, like something that resonates with what you're saying, Daniel, about like, okay, but it is kind of irrational to delete your private key, where Eliezer says something, and maybe I'll get the exact quote in a minute, but first I'll say what I remember from it, which is, and look, if you're the sort of person where it feels terrible to exchange money just because that's sort of how you are, whatever, yes, I respect that footnote, but there is this thing where if you deeply understood the mathematical structure of the universe, then probably you wouldn't really be

Daniel (1:04:23)

I

I can scroll through, I don't know that quote off the top of my head.

Divia (1:04:35)

Okay, well, I think what I get from it is like, no, but if you really have thought about that resources are good for stuff and that money is a resource, that there may still be some like, ick factor of like, what does it mean to exchange money that is real and matters, but there will be this counterbalancing thing that's like, okay, but something real is happening when we exchange money that really does.

matter and in ways that can be used for good things. Where someone could maybe be like, but if we do that, then it'll hurt. Like if we exchange money, then maybe I won't feel as close to you. And the inner person who has really thought about this then asks themselves some question that's like, okay, but is there some way I could use money then to translate into us feeling closer together that isn't dumb?

like maybe we spend it having some experience together or something like that. And like thinking about that sort of fungible, that some resources really are pretty fungible and trade -offs are a thing and there was never really an option of 0 % chance of that anyway. And like, I don't know, or the way that like people I think often can, and it interacts with sacred values and I'm not trying to dismiss that, but be pretty weird about people taking jobs, for example, that have like some.

Because they'll be like, well, I wouldn't do that if it would maybe kill me and be really sensitive to framings of like, is there some upside of like not dying or, and then in practice, I look at like some chart of wages and employment and it'll be like, yeah, people seem to be willing to trade off some chance of dying at their job for some higher income. And so maybe that does make it, even though it's really fraught, a little less ridiculous to be like, and that's, we're going to pick these sorts of numbers to value people's lives.

I don't know, this is too much of a rant. There's something there that's, resources are real, and it's sort of related to, like, instrumental convergence is a thing, and that's sort of, maybe it hasn't been formalized, but it matters for being an agent and acting in ways that make sense.

Daniel (1:06:23)

I agree with the rants.

Maybe it sort of connects to this question of proposed ways of acting that don't count as expected utility maximization, but involve not throwing away all your money. And to what extent are there such ways and how do we feel about them? I guess I broadly agree, but.

Maybe one thing to say is, like, when people are looking at expected utility theory, they're typically looking at a very, like, a very sort of bare bones set of options. Like, you know, you have A, B and C, and like B is like A but you got paid five cents, and these sorts of considerations of like, but you can use the five cents to do something else, tend not to come up. Or, yeah, I don't know.

Maybe Elliot has more here.

Divia (1:07:33)

I found the quote. It's interesting. I actually didn't realize Eliezer was going to use the word coherence theorems in the quote. I'm going to read it. The question: but okay, but as you admit, some people, even most people, would rather not put financial prices on things at all in their friendships, they'd rather just do things for favors without a blah, blah, blah. He says, I was speaking mostly tongue in cheek, but in fact, there are coherence theorems saying that you either have to have consistent quantitative trade-offs between the things you want or your strategies can be rearranged to get you strictly more of everything you want.

I think that truly understanding these theorems is not compatible with being horrified at the prospect of pricing one thing in terms of another thing. I think there is a true bit of mathematical enlightenment you get and see into the structure of choice-making, and then you are less horrified by the thought of pricing things in money.

And so I read that and I'm like, okay, maybe, I don't know about these coherence theorems, but I think he's right about a thing that is true that I predict exists and makes the people less horrified about it.

Elliott Thornley (1:08:30)

Yeah, I don't know. I think it's complicated. I think insofar as your squeamishness about money is forcing you to have cyclic preferences, then if you think carefully about that, you'll probably want to change those preferences in some way and so resolve the cyclicity. There's another kind of coherency thing that Eliezer talks about in either this post or another one, which is about like,

putting inconsistent prices on things. And I agree that the consequence of that is getting at most as much of something as you want and strictly less of something else than you want. And so that's a reason like not to put inconsistent prices on things. But I think again, it's sort of like extrapolating too far from these cases where the coherence-ish arguments work to cases where they don't in fact work, from like

acyclicity and inconsistent prices to all the other von Neumann-Morgenstern axioms and full-blown expected utility maximization.

Divia (1:09:38)

Okay, so you're saying you would put inconsistent prices in sort of roughly the same category as having those cyclical preferences, as in like, yeah, if you notice that, normatively you kind of think people should change them.

Elliott Thornley (1:09:52)

Yeah, I think so. Exactly. Because, yeah, the distribution you in fact get is going to be dominated by some other available distribution. And in fact, Carl Shulman and I make an argument like this in this Global Catastrophes paper that we published a couple years ago, where it's like, you know, the US government is willing to pay $10 million to save a life in expectation when it comes to road traffic regulations. And so they should also be willing to

pay that much in expectation to prevent people from dying in global catastrophes. Exactly because if you pay $10 million to save someone on the road, but don't pay $5 million to save someone from dying of a global catastrophe, there's some other distribution of money spent such that you pay no more and save more people or save the same number of people and pay less.
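To make the dominance point concrete, here is a minimal sketch using the two per-life figures from the conversation; the budget numbers are invented for illustration, and the code is a toy of mine rather than anything from the paper Elliott mentions:

```python
# Toy illustration: if road safety costs $10M per expected life saved and
# catastrophe prevention costs $5M per expected life saved, then reallocating
# money from the first to the second saves strictly more expected lives for
# the same total spend, so the original allocation is dominated.

ROAD_COST_PER_LIFE = 10_000_000
CATASTROPHE_COST_PER_LIFE = 5_000_000

def expected_lives_saved(road_budget, catastrophe_budget):
    return (road_budget / ROAD_COST_PER_LIFE
            + catastrophe_budget / CATASTROPHE_COST_PER_LIFE)

status_quo = expected_lives_saved(road_budget=100_000_000, catastrophe_budget=0)
reallocated = expected_lives_saved(road_budget=90_000_000, catastrophe_budget=10_000_000)

print(status_quo)   # 10.0 expected lives saved for $100M
print(reallocated)  # 11.0 expected lives saved for the same $100M
```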

Divia (1:10:50)

And a global, and when you say global catastrophe, you mean like something that affects everyone. You're not necessarily, because I'm like, I think there's some argument that the US government wants to pay more to protect citizens, but you're like, no, no, these are, these are US citizens in both cases that you're comparing.

Elliott Thornley (1:11:03)

yeah, US citizens in both cases.

Divia (1:11:06)

Cool. Okay. All right. So I do want to segue a little into, Elliott, your proposals for alignment. Can you lay them out, both in general and in terms of how you think this agent could work and not be doing the stuff that it really shouldn't do, but also not be obeying all those axioms, and therefore you could shut it off or something? Can you explain how that would go?

Elliott Thornley (1:11:36)

Yeah, so I call it the incomplete preferences proposal, and incomplete preferences is really the key thing. In a nutshell, the idea is you train your agent to lack a preference between every pair of different length trajectories, where different length trajectories are sort of like, you can think of them as different length lives of the agent. So, you know, if in one possible life the agent lives 10 minutes and in another possible life they live 20 minutes, then they lack a preference between

Daniel (1:11:47)

you

Elliott Thornley (1:12:06)

those two.

Divia (1:12:06)

assuming there aren't any other major differences between these things or even if there are.

Elliott Thornley (1:12:12)

yeah, no matter what. you know, if it's 10 minutes versus 20 minutes, the agent lacks a preference no matter what. And the reason why... yeah.

Divia (1:12:20)

So even it, and this, I should maybe just let you go with that. This is a maybe surprising thing. So like the one where it is on for 10 minutes and then it's off, versus the one where it's on for 11 minutes and gets all of its wildest dreams fulfilled and then gets shut off a minute later, it's supposed to lack a preference.

Elliott Thornley (1:12:38)

Yeah, that's right. And I agree it is surprising, but okay. So the idea is that if you train this agent that lacks a preference between every pair of different length trajectories, then it's never going to like pay costs to shift probability mass between different trajectory lengths, different possible lengths of life. And this is sort of exactly the thing that you want to keep the agent shutdownable, to ensure that it never resists shutdown. It's sort of never going to be willing to pay any kind of cost to resist shutdown.

won't spend some seconds thinking about it, won't spend some joules trying to make it... Yeah.

Divia (1:13:12)

This is so.

And you think it would be tractable to train an agent that way and it wouldn't create any sort of situation where it kind of notices that and it's like, no, that's incoherent so I better not. You think it could work and it would not be flagged as incoherent by the agent.

Elliott Thornley (1:13:31)

Yeah, so this is the hope at least. So, you know, I pushed back on these coherence arguments to think that incomplete preferences can be reflectively coherent. You know, this agent could have these incomplete preferences and think, I'm okay, I'm not going to pursue a dominated strategy exactly because I violate this decision tree separability axiom that we talked about earlier. And I can like make choices that prevent me from pursuing dominated strategies, even though my preferences are incomplete.

Divia (1:14:02)

Okay, and then, but now I'm trying to compare that to, I don't know, what I'm thinking of as Daniel's proposal for like, okay, but there's some intuitive sense that you should want to be like a powerful type of agent

Elliott Thornley (1:14:15)

Mm

Daniel (1:14:15)

Yeah.

Divia (1:14:16)

or something. It seems to violate that one, right?

Elliott Thornley (1:14:20)

Well, yeah, I don't know. It's kind of an open question, I think.

Yeah, so the agent sort of lacks a preference between every pair of different length trajectories. But if you think about it, like, powerful humans are often going to have incomplete preferences, and that seems to indicate at least to some extent that incomplete preferences don't preclude this power seeking. Yeah.

Daniel (1:14:48)

Can I give you my concern about this proposal and you can respond. So if I sort of anthropomorphize this, right? This sounds like me being like totally indifferent about how long my life is, right? And like a short life that's like really bad or something. I'm indifferent between that and a long life that's like really awesome. And I take control of every continent or something. And my worry about that is...

Elliott Thornley (1:14:53)

There we go.

Daniel (1:15:18)

Okay, firstly, that sounds crazy to me, which maybe this is something I've got to get over. But like, that's, that's just my intuitive reaction. If I'm like, why, what's bad about that? Well, one problem is,

Like there might be threats to the agent's longevity that aren't just like humans shutting it down. For instance, there are threats like, you know, there's just like some random problem and you know, the, there's an interruption to the power supply to its data center or like, you know, some, like we're in a world with like a few AIs and there's this nasty AI trying to shut off our nice AI and the nice AI doesn't like try to stop it or whatever. And I'm like,

Well, if your agent is vulnerable to those things, that seems like it would be bad because it can't do as much good stuff that you wanted out of an agent. So I don't know. That's my first pass concern. I'm wondering, what do think of that?

Elliott Thornley (1:16:18)

Yeah, yeah, on that concern, I think, you know, you train your agent to lack a preference between different length trajectories, where importantly, like, different length trajectories are trajectories in which the shutdown button is pressed after different lengths of time, say, and the shutdown button never being pressed is just one more possible trajectory length. And if those are your agent's preferences, then the agent could still prefer not to be like incapacitated in some way

not mediated by the shutdown button, because it's like, you know, the agent has a choice between either being destroyed by the evil AI or not being destroyed by the evil AI. And suppose that the shutdown button is never going to get pressed. And no matter what, those are like same length trajectories in the relevant sense. If the good AI gets more of what it wants in the trajectory where it doesn't. yeah.

Divia (1:17:12)

So I don't think I follow that. You're not saying, one thing that you could have been saying is that it only sort of counts if the human presses the shutoff button for the thing you're talking about. But I don't think that's what you're saying. And then I don't understand it.

Elliott Thornley (1:17:28)

Yeah, so you want your different length trajectories, like you want your length of trajectory, to be decided in a particular way. So maybe like one possible implementation is you have like some physical button somewhere that transmits some signal that tells the AI it's time to shut down, or something like that. And it's sort of the time at which that signal is transmitted that determines the trajectory length for the AI.

Divia (1:17:58)

But couldn't the evil AI then just be like, okay, my new plan is to press that button, and so now it can't stop me?

Elliott Thornley (1:18:04)

Yeah, so this is good. This is exactly right. But I feel like this is a problem not to be solved by some clever shutdownability proposal, but to be solved in like the usual way that we solve things. It's like, my shutdown button, now I've got to protect it from the evil AI.

Divia (1:18:19)

Guard it and put it in a very safe location, and that's the plan for that, or something like that.

Elliott Thornley (1:18:30)

Yeah.

Daniel (1:18:30)

So when you say, my shutdown button, I've got to protect it from the evil AI, who is the first person voice there? Is that supposed to be like what the human designer of the AI system is thinking, or is that supposed to be what the AI is thinking?

Elliott Thornley (1:18:43)

yeah, sorry, human designer. So the...

Daniel (1:18:46)

Okay.

Divia (1:18:47)

I mean, the human's allowed to ask the AI, how do I protect the shutdown button, presumably.

Elliott Thornley (1:18:52)

Yeah, I think so. Yeah. Yeah.

Daniel (1:18:54)

Yes, although I worry that in your proposal, the AI is going to be indifferent between telling the truth to the human and telling a lie. Because if the AI tells the truth to a human about how to protect the shutdown button, then probably that will mean that the shutdown button will be pressed at a different time than it would otherwise.

Divia (1:19:05)

No, but...

It changes the expected value of the trajectory and therefore it can't even have its usual preference for truth in that scenario. Yeah, I see.

Daniel (1:19:25)

If I understand correctly, but I might not.

Elliott Thornley (1:19:28)

Yeah, so Daniel, you're right. The AI doesn't have a reason to tell the truth purely from like wanting to extend its life. So it's going to depend on the AI's preferences between lying and telling the truth.

Divia (1:19:40)

But it affirmatively doesn't have a preference about anything it could do that would affect its potential lifespan, right?

Elliott Thornley (1:19:48)

Well, okay. So we've just talked about the AI's lack of preference so far. It lacks a preference between every pair of different length trajectories. The agent can have preferences between same length trajectories. And this is the key thing that incomplete preferences let you do. You know, you can have an agent that lacks a preference between every pair of different length trajectories but that has preferences between same length trajectories. And it could well be that, you know, you train your AI to prefer to tell you the truth in every same length trajectory. And that

Daniel (1:20:17)

You're right.

Elliott Thornley (1:20:18)

would get it to tell you the truth.

Divia (1:20:18)

But maybe this is dumb, I think I'm still missing something. I'm like, isn't it kind of true that literally everything the AI does will affect the probabilities, will affect its expected trajectory?

Elliott Thornley (1:20:32)

Yeah, I think that's right. So in particular, so we've been talking about preference relation between trajectories, which are like getting particular outcomes for certain. In deployment, the agent is going to always be choosing between lotteries plausibly. There's going to be some uncertainty about the consequences of each action. And what the incomplete preferences sort of lead on to naturally is this principle that I call time step dominance.

Divia (1:20:34)

then how do I, sorry.

Mm

Elliott Thornley (1:21:01)

And time step dominance basically says when the agent is choosing between lotteries, the agent is not going to be willing to pay costs to shift probability mass between different trajectory lengths. And this is the key.

Divia (1:21:14)

Can you give a more concrete example of what that means to shift probability mass between trajectory lengths?

Elliott Thornley (1:21:20)

Yeah, your classic example, or like a very simple example, is: by default, the agent is with 90% probability going to get shut down at time step one and 10% probability going to get shut down at time step two. And, you know, suppose it cares about paperclips and no matter what, it's going to get five paperclips conditional on shutdown at each time step. What it can do is

spend one paperclip to shift the probability mass such that now there's only a 10% chance it gets shut down at time step one and a 90% chance it gets shut down at time step two. In each case, it gets like four paperclips now conditional on each trajectory length. So this would be an example of like paying costs to shift probability mass between different trajectory lengths.
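To pin down that example, here is a minimal sketch of the comparison; the way lotteries are encoded (trajectory length mapped to a probability and a conditional paperclip count) and the helper function are my own toy constructions, not anything from Elliott's paper:

```python
# Each lottery maps a trajectory length (shutdown at time step 1 or 2) to
# (probability of that length, paperclips conditional on that length).

def timestep_dominates(a, b):
    """True if lottery `a` is at least as good as lottery `b` conditional on
    every trajectory length, and strictly better conditional on at least one.
    Toy check: assumes both lotteries range over the same lengths."""
    assert set(a) == set(b)
    at_least_as_good = all(a[t][1] >= b[t][1] for t in a)
    strictly_better = any(a[t][1] > b[t][1] for t in a)
    return at_least_as_good and strictly_better

# Default: 90% shut down at step 1, 10% at step 2, five paperclips either way.
default = {1: (0.9, 5), 2: (0.1, 5)}
# Alternative: spend a paperclip to shift probability mass toward step 2.
shifted = {1: (0.1, 4), 2: (0.9, 4)}

print(timestep_dominates(default, shifted))  # True, so the agent never pays to shift the probability mass
```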

Divia (1:22:10)

Okay, got it.

Daniel (1:22:12)

Can I see if I understand the case where I ask the AI how to protect the shutdown button? So maybe the thing that's going on is something like: for any given lifetime it could have until the shutdown button is pressed, the AI has a preference to tell me the truth rather than lie to me. Let's just say that it has that. Then when I ask it the question, how do I protect the shutdown button, it will like

basically tell me the truth. Like, that does change its expected lifespan, but, you know, there's some probability that it tells me the answer and I, like, fail to protect it, and it gets shut down at exactly the same time as it would have been. And if that happens, then it prefers to have told me the truth and gotten shut down early rather than to have lied to me and gotten shut down early. And so, if telling me the truth were, like,

super reliable and like definitely changing which times it would get shut down, then it might be indifferent. But because there's like some overlap in the probabilities, then it's going to want to tell me the truth. Is that, do I understand that correctly?

Elliott Thornley (1:23:24)

Yeah, that's exactly right. yeah, we're like hoping that in deployment, the agent is choosing between lotteries. And we're hoping that exactly because if the agent is choosing between different length trajectories, it's going to always lack a preference. Whereas in cases like the one you outlined, we want the agent to prefer telling the truth. And we expect the agent to tell the truth exactly because it sort of looks better conditional on each trajectory length. You know, if it gets shut down at time step one,

It's glad that it told the truth. If it gets shut down at time step two, it's glad that it told the truth, and so on. Or, like, that's too anthropomorphize-y. It prefers that it told the truth. Yeah.
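Reusing the toy helper from the paperclip sketch above, Daniel's truth-telling case can be read the same way; the utilities and probabilities here are invented, and the only assumption carried over from the conversation is that the AI does at least as well, conditional on each shutdown time, having told the truth:

```python
# Conditional on being shut down at either time step, the AI slightly prefers
# having told the truth, even though truth-telling also shifts the
# probabilities over when shutdown happens.

tell_truth = {1: (0.3, 10), 2: (0.7, 10)}  # (probability of that length, conditional utility)
lie        = {1: (0.6, 9),  2: (0.4, 9)}

print(timestep_dominates(tell_truth, lie))  # True, so the truth-telling option wins
# If truth-telling guaranteed a different shutdown time than lying, there would
# be no shared trajectory length to compare on (the toy check above wouldn't
# even apply), which is Daniel's point about the overlapping probabilities.
```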

Divia (1:24:06)

Okay, so I appreciate you talking about the proposal. I think given that we don't have a ton more time, I wanna talk a little bit about, and I can't tell if I'm mostly the one interested in this, but sort of like the sociological situation with the coherence theorems, which maybe we did a little bit before, but I think I wanna circle back to it. And maybe, I don't know, maybe Elliot, if we could start with you, like what is, I don't know, you wrote this, I don't know how much you have engaged on Less Wrong before you wrote this post. Is it?

Do you, like, read it, are you a frequent reader? Do you post other things?

Elliott Thornley (1:24:36)

Engaged? No, not very much. I think this was my first post. Maybe I'd posted like one or two comments before, I've, yeah, read a fair bit. And I've been...

Divia (1:24:43)

And had you been reading much?

Okay, so you're a regular Less Wrong reader, but this was maybe your first post. Like, I don't know, what was your impression after posting it of like sort of the state of Less Wrong's ability to have a decent discussion of epistemic rationality on Less Wrong of like sort of epistemic health of the AI safety space, etc.

Elliott Thornley (1:25:09)

Yeah, I haven't thought too much about this, but in general I hold Less Wrong in pretty high esteem. I think a lot of the posts are interesting. A lot of smart people saying interesting and convincing things. A lot of the comment sections and discussion seem very good. In the comment section of the coherence theorems post, I think I'm partly at fault because it was like a provocative title and

I define coherence theorems in a particular way, and I think people didn't like that definition, and so a lot of discussion centered around that definition, which... yeah.

Divia (1:25:47)

It reminds me a lot, Sarah Constantin, ages ago, wrote a post on her personal blog called EA Has a Lying Problem, which I think she put, I don't know, it's been a long time since I've revisited it. I think it also has the quality of, like, it generated a lot of interesting discussion. I think she made some substantive points. I think naturally a lot of the focus was on the title.

Elliott Thornley (1:25:57)

Mm

Mm-hmm. Yeah. Yeah, I guess the title is the only thing that you can guarantee everyone reads, I think, and that's part of it.

Divia (1:26:11)

where people are like, that's not what lying means. Anyway, yeah.

Right.

Daniel (1:26:20)

Yeah, although I do think, I mean, to come to your defense a little bit, I do think that one of the types of things that less wrong, I think is supposed to be good at is just handling stipulative definitions. Like if someone's like, hey, I want to use a word in this particular way in this case. And it's like within the range of things that people sometimes mean by this word or whatever, I'm like,

I do think that's the sort of thing that you're supposed to be okay with. Or, I don't know, you might be a bit annoyed if it's in the title and people might get a misimpression from the title, but I actually think, I don't know, I would hope that we could handle at least this particular instance of a stipulative definition, because I think it's fine enough for its context. Yeah.

Elliott Thornley (1:27:11)

Yeah, yeah. And I should say also that I didn't expect this. Like, I genuinely thought this was the Less Wrong definition, from my like various readings of Less Wrong posts and things like that. And I think it's worth saying as well that, like, in the decision theory literature there's no such thing as a coherence theorem, like, it's not a phrase that comes up. So it really does look to be like a Less Wrong coining. And so I thought that

Daniel (1:27:37)

Right.

Elliott Thornley (1:27:41)

If I just read it, I'll figure out the definition and use the right definition.

Divia (1:27:41)

What an eye

Yeah, and I wasn't actually, I thought of that thing from the cheerful price post without thinking it would have used the phrase coherence there, but then it did. I'm like, yeah, that does seem to be an example of Eliezer using it sort of kind of like how I think you were assuming people used it.

Elliott Thornley (1:28:01)

Yeah, maybe. I don't know. Sort of in preparation for this podcast, I went back and read the post, and yeah, now maybe I'm not as confident as I once was that this was the definition. I still think, like, on balance, a lot of posts use it in this way, but it's sort of more up in the air than...

Divia (1:28:22)

Yeah, this is sort of why I think I tried to add some weasel words also, because I'm like, maybe that's not what anyone really meant. But I think it's what a lot of people read. I think it's what I thought that he meant, reading it. And I think it's not just me. Because I think, Daniel, you were saying the same thing, of like, you were like, yeah, isn't that stuff like all proved and whatever? I think that was my point too. And I think, I guess it's the majority, I don't know, I'm always wrong when I say things like that, but I think a significant chunk of Less Wrong users also had the impression when people said things like,

Daniel (1:28:39)

Yeah. Er -

Divia (1:28:50)

coherence theorem, they'll be like, yeah, it's like, proved that you basically have to be that way or else something terrible happens, like you lose all your money.

Daniel (1:28:57)

I think maybe there's something slightly more subtle here, where there really are... So, I don't know, proofs generally involve, okay, I'm going to assume these premises and I'm going to get this conclusion, right? And the question is, I think Less Wrong types are maybe more happy than they perhaps should be to accept, like, yeah, p implies q, when actually it's p plus some additional assumption p prime that implies q, or...

whatever, which, like, one may or may not accept. Like, if I make some assumptions and I feel like one is, like, basically solid to get q, I might just mentally forget the other assumption. And then there's this question of, like, just how solid are your other assumptions, right? Because even with, like, the money pump arguments in the money pump arguments book or whatever, those have this assumption that, you know, you can just take a state of the world and make it definitely better or definitely worse, right? And that's a substantive assumption. And if

If money pump arguments were proved just up to, like, you have to assume the existence of money or whatever, then I'd be like, come on, don't nitpick about that. If money pump arguments were proved up to, you have to assume, like, a pretty controversial claim or whatever, then that does matter. So, or, it almost reminds me of like the way physicists reason. So, like, I studied physics in undergrad to some degree, and I also studied math in undergrad to some degree, and like,

physicists and mathematicians have a bit of a disagreement about just how rigorous you're supposed to be. Physicists are just very loose with, yeah, every function is twice differentiable and everything, every operation you kind of want to do that works in nice cases, we're just going to say that it works. And mathematicians are not like this. And to me, it feels a little bit similar, except...

Also, you drop the thing where you remember exactly which assumptions you needed.

Divia (1:30:57)

So you're saying basically the sort of Less Wrong style of talking about AI stuff is more like the physics level of rigor, but you think physicists do a better job of being like, if you ask them, they'll be like, yeah, yeah, we remember what we were assuming.

Daniel (1:31:03)

It's more like the physics level, yeah.

or it'll be in a textbook somewhere that they can look up. And also like physics, a cool thing about physics as a discipline is it's like more successful than Less Wrong as a forum, as much as I love Less Wrong. Sorry.

Divia (1:31:22)

I mean, it's very empirical, right? Or, not very, but it's a lot more empirical than, like, Less Wrong theorizing.

Daniel (1:31:28)

Yeah, yeah, like, I think they've got more, I don't know, somehow there's a longer intellectual tradition of, like, people being careful and knowing which things you can skimp on or not. I don't know, that's the analogy that comes to my mind.

Divia (1:31:43)

I see. Yeah, that there's like a real sort of history that you can trust about good heuristics for when to be rigorous and when not to be, and that you don't necessarily have that same trust in, like, Less Wrong culture's sense of where you need to be rigorous.

Daniel (1:31:58)

Yeah, or I have a bit less trust. Honestly, I still like Less Wrong.

Divia (1:32:02)

That makes sense to me. The amount you would have in physics would be quite high. So yeah, I don't know.

Daniel (1:32:06)

Yeah.

Yeah, no, no, it should be significantly less trust, actually. I mean, I don't want to say that I totally distrust the Less Wrong thing. Like, I don't know, often...

Like, yeah, how do I think about money pump arguments? I still think like the case where you violate one of the VNM axioms, you can't just violate one. You have to like violate a couple, I think. So if you violate completeness.

Divia (1:32:38)

Wait, sorry, Elliot, is that, I thought you said if you just violated completeness, then those are like a house of cards. You don't get any of

Elliott Thornley (1:32:45)

yeah, that's right, but I don't know, Daniel, if that was what you were saying.

Divia (1:32:48)

Maybe you're saying.

Daniel (1:32:48)

yeah, yeah. What I'm saying is that you can't just get rid of one without getting rid of the others, right? It's a case where if you violate completeness, you can't still keep the other three. You also have to, like, you have to also violate independence. And I guess a lot of people don't mind violating independence, but...

Divia (1:32:56)

We'll see.

Elliott Thornley (1:33:14)

wait, so agents could satisfy just transitivity, independence, and continuity.

Daniel (1:33:20)

They can?

Elliott Thornley (1:33:21)

Yeah, yeah, yeah.

Daniel (1:33:24)

I really thought they couldn't. Hang on, I thought I had a proof of this, but maybe I'm wrong.

Elliott Thornley (1:33:31)

yeah, I don't think so. So in particular, there are these representation theorems for agents that just violate completeness. And it's like these multi-expected-utility representations.
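For a rough feel of what that looks like, here is a toy sketch of mine (not the formal representation theorem): an agent that carries several utility functions and strictly prefers one option to another only when all of them agree, which leaves some pairs of options incomparable:

```python
def multi_utility_prefers(a, b, utility_fns):
    """Strict preference for `a` over `b` under a unanimity rule: every utility
    function ranks `a` at least as high, and at least one ranks it strictly higher."""
    return (all(u(a) >= u(b) for u in utility_fns)
            and any(u(a) > u(b) for u in utility_fns))

# Two made-up utility functions that care about different things.
utils = [lambda x: x["money"], lambda x: x["free_time"]]

a = {"money": 10, "free_time": 5}
b = {"money": 8,  "free_time": 3}
c = {"money": 12, "free_time": 1}

print(multi_utility_prefers(a, b, utils))  # True: a beats b on both counts
print(multi_utility_prefers(a, c, utils))  # False: the functions disagree...
print(multi_utility_prefers(c, a, utils))  # ...in both directions, so a and c are incomparable
```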

Daniel (1:33:46)

interesting. Okay, maybe that's another thing that I'm wrong about. Okay. Huh.

Elliott Thornley (1:33:55)

Yeah, so one thing to think is, like, if completeness implied independence or vice versa, then you'd only need like three VNM axioms rather than four.

Daniel (1:34:04)

sorry, I mean independence, sorry. I mean like if you have three of them, then you can use a money pump argument to get the fourth is what I meant. Sorry, that.

Elliott Thornley (1:34:14)

Okay, yeah, yeah, that's right. But the order is important. So from completeness, you can get transitivity, but you can't, like, from transitivity, independence, and continuity get completeness, I don't think.

Daniel (1:34:19)

yeah.

I have seen an argument that purports to do that.

Divia (1:34:30)

even with a money pump?

Elliott Thornley (1:34:32)

Okay, interesting. I'd be interested to check that out.

Daniel (1:34:35)

can send you the... Actually, it doesn't purport to do that. It just purports to show completeness. But like, if you actually think about it, it assumes independence. Which is fun. This is, so, roughly the argument. So this is on Less Wrong. It's by John Wentworth and David Lorell. And I can send you, it is called, why, maybe it's called, it's called Why Not Subagents, by Wentworth and Lorell.

Elliott Thornley (1:34:42)

Okay, nice. What's the argument?

Daniel (1:35:05)

And basically the rough argument is, suppose that you have no preference between A and B, but you prefer like A to C and, or it's like you have no preference, but you have some preferences the other way. like roughly the, yeah, you have no preference between A and C, but you prefer A to B and you prefer B to C. And you imagine that you're going to,

be in a world where sometimes you're going to have to choose between A and C, sometimes you're going to have to choose between A and B, sometimes you're going to have to choose between B and C. If you don't have a preference between A and C, then you might accidentally choose A like... You might end up picking one over the other and that might end up getting you like B rather than A when you prefer A, or the other way around. I forget exactly which one it is. But roughly the argument is if you just decide to add a...

If you randomly decide to add a preference between A and C either way, instead of being indifferent between them, then you decrease the probability that you, like, do multi-step trade downs. But because you're like randomly adding it, in order for the argument, like the argument says something like,

Divia (1:36:20)

Sorry, can I read a maybe relevant sentence? Okay, or two sentences actually. So it seems like part of what's going on is that...

Daniel (1:36:24)

Yeah, sure.

Divia (1:36:34)

Wentworth, I assume, was saying that you could have a market of subagents, utility maximizing traders, and it's inexploitable but incomplete. And Nate was like, no, they would use contracts, that agents with incomplete preferences will tend to pre-commit slash self-modify in ways which complete their preferences.

Daniel (1:36:53)

Yeah, that's what they're trying to demonstrate. Yeah.

Divia (1:36:57)

Yeah, okay.

Elliott Thornley (1:36:58)

Yeah, so I read this post, I have a comment underneath it where I tried to diagnose where the disagreement is. And I think it's been a while, but they assume that their agent is choosing with this like myopic veto rule.

Divia (1:37:10)

yeah, I see your comment now.

Yep, that's what you said. According to which the agent turns down a trade if the offered option is ranked lower than its current option according to one or more of the agent's utility functions. That would lead it to pursue dominated strategies in single-sweetening money pumps. But the myopic veto rule isn't the only possible rule for decision making with incomplete preferences.
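For a concrete feel of the trade-down worry and the kind of sequence-aware fix the comment gestures at, here is a toy sketch; the option names, the single-souring setup, and both choice rules below are simplified stand-ins of mine, not the exact rules from the post or the comment:

```python
# Setup: the agent strictly prefers A to a slightly soured version A-, and has
# a preference gap between A and B and between A- and B (neither is preferred).

STRICT = {("A", "A-")}  # A is strictly preferred to A-; everything else is a gap

def strictly_prefers(x, y):
    return (x, y) in STRICT

def myopic_accepts(current, offer):
    # Accept any single trade the agent doesn't strictly disprefer.
    return not strictly_prefers(current, offer)

def run_myopic(start, offers):
    holding = start
    for offer in offers:
        if myopic_accepts(holding, offer):
            holding = offer
    return holding

def run_sequence_aware(start, offers):
    # Simplified version of "don't complete a sequence of trades that ends
    # somewhere strictly dispreferred to another available end point" -- here
    # just checked against the starting option.
    holding = start
    for offer in offers:
        if myopic_accepts(holding, offer) and not strictly_prefers(start, offer):
            holding = offer
    return holding

offers = ["B", "A-"]  # trade A for B, then B for A-
print(run_myopic("A", offers))          # A-: walked down to a strictly worse outcome
print(run_sequence_aware("A", offers))  # B: refuses the trade that completes the dominated sequence
```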

Daniel (1:37:15)

yeah.

Right.

Elliott Thornley (1:37:34)

Yeah, so I tried to push back on.

Divia (1:37:36)

Don't make a sequence of trades if there's another available sequence such that... So anyways, I continue to be very interested in this. However, I think this is really running up against, and like getting past, the time that we said we were probably gonna keep going until. So I would say, let's do any final words and call this for now. And if we wanna reconvene or anything like that, or continue this argument elsewhere on the internet, that sounds awesome to me.

Daniel (1:37:47)

Right, right.

Elliott Thornley (1:38:01)

yeah, yeah, sounds good to me. Yeah.

Divia (1:38:03)

Okay. Yeah, Daniel, any closing statements from you?

Daniel (1:38:06)

Closing statements. I think that maybe one thing to say is that, like, it's actually worth it, I don't know, if you have some time, just digging into what these money pump arguments are and what the arguments for expected utility maximization are. I actually found it kind of just an interesting exercise to be like, okay, what are the arguments for this, and which things that I believe actually rely on this? And I ended up thinking that expected utility maximization

I still kind of like it as a model. I'm kind of unconvinced by proposed alternatives, but it's sort of, I don't know, just the notion of expected utility maximization. I feel like it sort of dissolves for me in a kind of interesting way. So I don't know, it's interesting to actually just think about how you might motivate these and what you can actually prove.

Divia (1:39:02)

All right, Elliot, any final thoughts?

Elliott Thornley (1:39:05)

Yeah, I feel like I don't have much more to say than what we said. I agree with Daniel that it's a very interesting area. I encourage people to have a look at the Money Pump Arguments book and have a think about this kind of thing. I think it's especially interesting to think about the arguments in the context of advanced artificial agents and what we expect from them, as opposed to the context in which Johan is writing, where it's about like

rational requirements. And I feel like there's a lot of work to be done on thinking about what gets more compelling and what gets less compelling when you sort of do this mental shift in context.

Daniel (1:39:46)

I agree with that. One final thing I want to add: we're referring to this book called Money Pump Arguments by this guy, Johan Gustafsson. The listener might be like, I'm not going to read a book. But this book is a total of 81 pages before the endnotes, and there's a PDF of it online. As books go, it's pretty short and readable. Well, it's pretty short.

Divia (1:40:02)

Nice.

Cool. All right, I appreciate the plug. I'm gonna try to link things in the show notes, but ultimately I can't promise to do that. But thanks everyone for coming. I really appreciate it. I definitely think I learned a bunch of things and I hope some of our listeners did too.

Elliott Thornley (1:40:08)

Yeah, yeah.

All right, yeah, thanks for having me.

Daniel (1:40:24)

Great chatting.
