Suppose that we use the universal prior for sequence prediction, without regard for computational complexity. I think that the result is going to be *really weird*, and that most people don’t appreciate quite how weird it will be.

I’m not sure whether this matters at all. I do think it’s an interesting question, and that there is meaningful philosophical progress to be made by thinking about these topics. I’m not sure where that progress matters either, but it’s also interesting and there is some reasonable chance that it will turn out to be useful in a hard-to-anticipate way.

(Warning: this post is quite weird, and not very clearly written. It’s basically a more rigorous version of this post from 4 years ago.)

## The setup

#### What are we predicting and how natural is it?

Suppose that it’s the year 2020 and that we build a camera for our AI to use, collect a sequence of bits from the camera, and then condition the universal prior on that sequence. Moreover, suppose that we are going to use those predictions to make economically significant decisions.

We aren’t predicting an especially natural sequence from the perspective of fundamental physics: to generate the sequence you really have to understand how the camera works, how it is embedded in the physical universe, how it is moving through space, etc.

On top of that, there are lots of “spots” in the universe, and we are picking out a very precise spot. Even if the sensor were perfectly physically natural, it would still be quite complicated to pick out *which* physically natural thing it was. Even picking out Earth from amongst planets is kind of complicated, and picking out this particular sensor is way more complicated.

So the complexity of a “natural” description of our sequence is actually reasonably high. Much smaller than the complexity of existing compression algorithms, but high enough that there is room for improvement.

#### Consequentialism

Specifying a consequentialist probably requires very very few bits. (Here I mean “consequentialist” in the sense of “agent with preferences,” not in the sense that a philosopher might be a consequentialist.)

Suppose I specify a huge simple lawful universe (like our own), and run it for a very long time. It seems quite likely that consequentialist life will appear somewhere in it, and (if the universe is hospitable) that it will gradually expand its influence. So at late enough times, most of the universe will be controlled by consequentialists.

We can concisely specify a procedure for reading a string out of this universe, e.g. somehow we pick out a sequence of spacetime locations and an encoding, make it clear that it is special, and then record bits through that channel. For example, in a cellular automaton, this might literally be a particular cell sampled at a particular frequency.

All of this takes only a handful of bits. Exactly how many depends on exactly what computational model we are using. But as an example, I expect that Turing machines with only 2-4 states can probably implement rich physical universes that are hospitable to life. I think that cellular automata or pointer machines have similarly simple “rich physical universes.”

Specifying how to read out the bits, and signaling the mechanism to the universe’s consequentialist inhabitants, apparently requires a little bit more complexity. We’ll return to this topic in a future section, but in the end I think it’s basically a non-issue.

## What do the consequentialists do?

Reasoning about consequentialist civilizations is challenging, but we have one big advantage: we can study one from the inside.

It’s very hard to predict exactly what our civilization will do. But it’s much easier to lower bound the distribution over possible outcomes. For anything we can think of, that our civilization has a plausible motive to do, it seems fair to say that there is a non-negligible probability that we will do it.

Recall that the natural measure here is bits. So if the consequentialist civilization implements a strategy with probability 1/1000, that only adds about 10 bits of description complexity, which is significant but not a huge deal. In fact I think that the weird strategies discussed here are quite a bit more likely than that, but this is going to come down to complex big-picture disagreements. We should all be able to agree on “not totally crazy” though.
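The conversion between probabilities and bits is just a base-2 logarithm. Here is a minimal sketch of that arithmetic; the function name `bits_of_penalty` is made up for illustration, and the larger odds are included only to show how the cost scales.

```python
import math


def bits_of_penalty(probability: float) -> float:
    """Extra description length, in bits, charged for an event of this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


if __name__ == "__main__":
    # 1/1000 is the figure used in the text; the other two are for scale.
    for p in (1 / 1000, 1 / 1_000_000, 1 / 1_000_000_000):
        print(f"probability {p:.0e} costs about {bits_of_penalty(p):.1f} bits")
```

Every factor of 2 in probability is one bit, so a strategy adopted with probability 1/1000 is penalized by a little under 10 bits.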

One thing the consequentialists might do is to *try to control the universal prior*. If they discover that their universe has simple physics (as ours probably does), then they will be aware that their behavior is directly reflected in the universal prior. Controlling the universal prior could have many possible advantages for a consequentialist civilization—for example, if someone uses the universal prior to make decisions, then a civilization which controls the universal prior can control those decisions.

Moreover, from the perspective of many consequentialists, exercising such control might be very cheap. And before they had to do it they would have plenty of time to simulate many other civilizations and understand exactly how much is up for grabs.

#### Finding the output channel

Suppose that we reach the point where we have a complete understanding of physics and of the initial conditions of our universe. I think this is a likely outcome for human physics over the next billion years, and certainly it seems like it should happen for a non-negligible fraction of all civilizations that emerge in simple universes.

At this point we have considered a wide range of languages for formal specification (Turing machines, pointer machines, cellular automata…) and in each of them we understand how to most concisely specify our universe. We may find that in some of these descriptions our universe is quite simple and in others it is ungainly, and so we naturally focus our attention on the languages in which it is simple.

Now we can go a step further. We can say: suppose that someone wanted to extract some bits from our universe. How would they do it? We can then find the simplest mechanisms that they might use to extract bits (or more precisely, the simplicity distribution over all ways of extracting bits).

For example, in the context of a cellular automaton that starts with a single non-zero cell, it would be quite simple to read off bits from the location which was initially non-zero. Most cases will be much more complicated, but once we actually have a description of our universe in hand, we can get an accurate estimate for this distribution.
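To make the “single cell sampled at a particular frequency” idea concrete, here is a toy sketch using an elementary (one-dimensional, two-state) cellular automaton. The choice of Rule 30, the function names, and the sampling period are all assumptions for illustration; nothing here claims that such a tiny automaton hosts life.

```python
def step(cells: list[int], rule: int) -> list[int]:
    """Advance an elementary cellular automaton one step (cells beyond the ends are 0)."""
    padded = [0] + cells + [0]
    return [
        # The three-cell neighbourhood, read as a 3-bit number, indexes into the rule.
        (rule >> (padded[i - 1] << 2 | padded[i] << 1 | padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


def read_channel(rule: int, steps: int, period: int) -> list[int]:
    """Start from a single non-zero cell and record that cell's value every `period` steps."""
    center = steps  # wide enough that the boundary never affects the result
    cells = [0] * (2 * steps + 1)
    cells[center] = 1
    bits = []
    for t in range(1, steps + 1):
        cells = step(cells, rule)
        if t % period == 0:
            bits.append(cells[center])
    return bits


if __name__ == "__main__":
    print(read_channel(rule=30, steps=16, period=2))
```

The readout procedure is only a few lines on top of the dynamics, which is the sense in which this particular channel is simple. The sketch assumes an even-numbered rule, so that an all-zero background stays zero.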

So to the extent that we are interested in influencing the universal prior, we have a clear approach: consider these simplest ways of extracting bits, and then send a message using those as channels. We can send across many channels at once, or if two channels conflict we can sample the channel to use with probability proportional to the probability of that channel; we can focus our efforts on the channels we can best control, *etc.*
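As a rough sketch of “sample the channel with probability proportional to the probability of that channel”: if each candidate channel has a description length in bits, its weight is 2 to the minus that length, normalized over the candidates. The channel names and complexities below are invented for illustration.

```python
import random


def channel_probabilities(complexity_bits: dict[str, float]) -> dict[str, float]:
    """Turn description lengths (in bits) into normalized weights proportional to 2^-bits."""
    raw = {name: 2.0 ** -bits for name, bits in complexity_bits.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def sample_channel(probabilities: dict[str, float], rng: random.Random) -> str:
    """Pick one channel at random, proportionally to its weight."""
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical channels with made-up complexities, purely for illustration.
EXAMPLE_COMPLEXITIES = {
    "origin_cell": 10.0,
    "first_stable_structure": 11.0,
    "brightest_region": 13.0,
}

if __name__ == "__main__":
    probs = channel_probabilities(EXAMPLE_COMPLEXITIES)
    rng = random.Random(0)
    print(probs)
    print(sample_channel(probs, rng))
```

A channel that is one bit simpler gets twice the weight, so in practice almost all of the effort goes to the handful of simplest channels.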

This doesn’t require the person who set up the simulation to send any special signals, and indeed doing so is pretty unproductive. You are better off just leaving the civilization to their guess.

This also doesn’t necessarily require us to sacrifice very much of our control over our physical universe—we just have to influence the few regions of the universe that are “simplest.” (Of course, these regions of the universe might be ascribed special pseudo-spiritual significance.)

#### If we controlled the universal prior…

If the consequentialists controlling the universal prior are anything like me, they will be interested in having a causal influence over other simple universes (e.g. to fill them with rich and interesting experiences, or with whatever else they value). As a general rule, I expect that the consequentialists with the greatest influence over simple universes are those who want to have more influence over simple universes, in the same way that the life which reproduces tends to want to reproduce.

So the first step is getting our foot in the door—having control over the parts of the universal prior that are being used to make important decisions.

This means looking across the universes we care about, and searching for spots within those universes where someone is using the universal prior to make important decisions. In particular, we want to find places where someone is using a version of the universal prior that puts a lot of mass on the particular universe that *we* are living in, because those are the places where we have the most leverage.

Then the strategy is to implement a distribution over all of those spots, weighted by something like their importance to us (times the fraction of mass they give to the particular universe we are in and the particular channel we are using). That is, we pick one of those spots at random and then read off our subjective distribution over the sequence of bits that will be observed at that spot (which is likely to involve running actual simulations).
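Picking a spot at random and then reading off a prediction for it is the same as predicting with a weighted mixture over spots. A minimal sketch, assuming each spot comes with an importance, a fraction of prior mass on our universe and channel, and a subjective probability for the next bit (the `Spot` type and all field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Spot:
    name: str
    importance: float         # how much we care about decisions made here
    mass_on_us: float         # fraction of that prior's mass on our universe and channel
    p_next_bit_is_one: float  # our subjective prediction for the next bit at this spot


def spot_weights(spots: list[Spot]) -> list[float]:
    """Normalized weights proportional to importance times mass."""
    raw = [s.importance * s.mass_on_us for s in spots]
    total = sum(raw)
    return [r / total for r in raw]


def mixture_prediction(spots: list[Spot]) -> float:
    """Probability that the next bit is 1 under the weighted mixture over spots."""
    return sum(w * s.p_next_bit_is_one for w, s in zip(spot_weights(spots), spots))


if __name__ == "__main__":
    spots = [
        Spot("spot A", importance=3.0, mass_on_us=0.1, p_next_bit_is_one=0.8),
        Spot("spot B", importance=1.0, mass_on_us=0.1, p_next_bit_is_one=0.4),
    ]
    print(spot_weights(spots), mixture_prediction(spots))
```

In a real version the per-spot prediction would itself come from simulation and would be conditioned on the bits observed so far; the sketch only shows the weighting.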

#### The treacherous turn

The second step is to actually have some influence over that universe. I suspect that some fraction of the consequentialists just try to “play it straight” and simply produce accurate predictions indefinitely. But it seems that most consequentialists would have some agenda, and would at some point distort the predictions in order to serve that agenda, spending some of their measure in the universal prior in order to extract some influence.

## How well do the consequentialists fare?

I think it’s clear that the dynamic I’ve described produces one component of the universal prior. It’s not *a priori* clear how large a component it is. I believe that it is probably the largest part, perhaps by far, if we actually use the universal prior to make a meaningful decision.

To see this, suppose that we are using the universal prior with a particular language X; write *p* for this distribution. Consider all of the places in the multiverse where some civilization uses *p* to do something important. Let *q* be the distribution over all sequences that get fed into systems that are using *p* to make an important decision. (Really this is weighted by importance and the whole thing is kind of complicated, but for simplicity I’ll talk in simplified terms.)

I’ve argued that *p* assigns significant probability to sequences controlled by consequentialists living in simple universes according to X, who are (with significant probability) trying to simulate *q*. Let’s call this part of the mixture the consequentialist part.

On average, across all places where *p* is used to make an important decision, *q* is the “correct” predictive distribution—it’s not possible for any other distribution to get higher log score than *q*, on average. So in the mixture *p*, the (log) mass of the consequentialist part is only going to increase as we condition on more data (on average, over all places where *p* is being used to make important decisions).
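The claim that a correct component only gains mass on average can be illustrated with a toy two-component Bayesian mixture. This is not the universal prior: the components are plain Bernoulli predictors, and the prior weight of one in a million and the bias of 0.7 are assumed numbers.

```python
import math
import random


def log_posterior_weight(
    prior_weight: float, p_component: float, p_other: float, bits: list[int]
) -> float:
    """Natural log of a component's posterior weight in a two-part Bernoulli mixture.

    `p_component` and `p_other` are the probabilities each part assigns to a 1.
    """
    if not 0.0 < prior_weight < 1.0:
        raise ValueError("prior_weight must be strictly between 0 and 1")
    log_a = math.log(prior_weight)
    log_b = math.log1p(-prior_weight)
    for b in bits:
        log_a += math.log(p_component if b else 1.0 - p_component)
        log_b += math.log(p_other if b else 1.0 - p_other)
    # Normalize in log space to avoid underflow on long sequences.
    m = max(log_a, log_b)
    return log_a - (m + math.log(math.exp(log_a - m) + math.exp(log_b - m)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Data drawn from the "correct" component, which predicts a 1 with probability 0.7.
    data = [1 if rng.random() < 0.7 else 0 for _ in range(1000)]
    for n in (0, 100, 300, 1000):
        weight = math.exp(log_posterior_weight(1e-6, 0.7, 0.5, data[:n]))
        print(f"after {n:4d} bits: posterior weight {weight:.3g}")
```

On data drawn from the well-calibrated component, its log weight drifts upward at a rate set by the KL divergence between the two predictors, so even a tiny initial weight is eventually overcome.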

I’ve argued that the initial mass of the consequentialist part is in fact very high. It depends on the encoding, but once we condition on *p* doing anything interesting, it seems plausible that the weight of the consequentialist part is in excess of 1/million or 1/billion. That is an absurdly high number.

#### The anthropic update is very powerful

To see how crazily high 1/million or 1/billion is, consider the distribution *q*′ over all sequences that get fed into systems that are using any variant of the universal prior to make an important decision. This is the optimal distribution for predicting the result of important applications of the universal prior.

*q* arises as a component of *q*′ in a very natural way: sometimes the particular universal prior being used is *p*. What is the initial mass of *q* in this context? It’s basically the frequency of language X amongst all different encodings which are sometimes used in the universal prior. It’s hard to know exactly what that number is, but 1/million or 1/billion seems pretty optimistic!

So the mass of the consequentialist part is probably much larger than the mass of *q* inherited from *q*′, even if the initial mass of *q*′ itself were nearly 1.

Moreover, a rich sequence of observations will suffice to pin down the encoding X. So *q* is going to be totally disjoint from the other parts of the mixture *q*′.

I think of this as an anthropic “update.” Consequentialists who live in a probable universe according to *p* can eventually figure out which priors make their universe more or less probable. They can then use this data to figure out *which* universal prior they are part of: one which assigns high probability to their universe. (Again, really this is subtle and quantitative, but the conclusions are the same as in the simplified setting.)

This update **alone** is likely enough to totally pay for the total expense of specifying the consequentialists.

So the consequentialists are doing pretty well before we even think about the part where they restrict attention to sequences that are fed into the universal prior. A priori, deciding that they want to influence the universal prior seems like it is most of the work they are doing.

Overall, the anthropic update seems to be extremely powerful, and it seems like the relevant parts of the universal prior need to somehow incorporate that “update” before they actually see any data.

#### The competition

I’ve argued that the consequentialists have pretty high mass. It could be that some other component of the mixture has even higher mass.

There isn’t actually much *room* for competition—if it only takes a few tens of bits to specify the consequentialist part of the mixture, then any competitor needs to be at least that simple.

Any competitor is also going to have to make the anthropic update; the mass of *q* within *q*′ is small enough that you simply can’t realistically compete with the consequentialists without making the full anthropic update.

Making the anthropic update basically requires encoding the prior *p* inside a particular component of the prior. Typically the complexity of specifying the prior *p* within *p* is going to be way larger than the difficulty of specifying consequentialism.

Obviously there are some universal priors in which this is easier (see the section on “naturalized induction” below). But if we just chose a simple computational model, it isn’t easy to specify the model within itself. (Even ‘simple’ meta-interpreters are way more complicated than the simplest universes that can be written in the same language.)

So I can’t rule out the possibility of other competitors, but I certainly can’t imagine what they would look like, and for most priors *p* I suspect that this isn’t possible.

## Takeaways

#### The universal prior is really weird

I would stay away from it unless you understand what you are getting.

A prior that focuses on fast computations will probably be less obscenely weird. To the extent that machine learning approximates anything like the universal prior, it does incorporate this kind of runtime constraint. (Though I think there is a lot of room for weirdness here.)

Fortunately, it’s hard to build things in the real world that actually depend on the universal prior, so we have limited ability to shoot ourselves in the foot.

But if you start building an AI that actually *uses* the universal prior, and is able to reason about it abstractly and intelligently, you should probably be aware that some day some super weird stuff might happen.

#### Naturalized induction

In some sense this argument suggests that the universal prior is “wrong.” Obviously it is still universal, and so it is within a constant multiplicative factor of any other prior. But it seems like we could define a much nicer prior, which wasn’t dominated by this kind of pathological/skeptical hypothesis.
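As a sketch of what “within a constant multiplicative factor” means, here is the standard dominance property, stated under the usual assumption that the competing prior is a lower-semicomputable semimeasure. The symbols $M$ (the universal prior), $P$ (any competing prior) and $K(P)$ (the length in bits of the shortest program for $P$ in the chosen language) are notation introduced only for this illustration.

$$
M(x) \;\ge\; 2^{-K(P)}\, P(x) \quad \text{for all finite sequences } x,
\qquad \text{equivalently} \qquad
-\log_2 M(x) \;\le\; -\log_2 P(x) + K(P),
$$

up to the usual additive constant in $K$. This bounds the *cumulative* log loss of $M$ relative to $P$ by about $K(P)$ bits. It does not by itself constrain any single conditional prediction, which is why a prior can be universal and still behave badly at the moments that matter.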

In order to do that, we want to make the “anthropic update” as part of the prior itself, so that this isn’t advantaging the consequentialists. The resulting model would still be universal, but could have much better-behaved conditional probabilities. Ideally it could be benign, unlike the universal prior.

Intuitively, we want a distribution P that is something like “The universal prior over sequences, conditioned on the fact that the sequence is being fed to an inductor using prior P.”

I believe this problem was introduced by Eliezer at MIRI (probably back when it was the Singularity Institute); they now talk about it as one component of “naturalized induction.”

Things unfortunately get more complicated when we start thinking about influence—we don’t just want to condition on the fact that the sequence is being fed to a universal inductor, we also want to condition on the fact that it is being used to make an important decision. (Otherwise the consequentialists will still be able to assign higher probability than us to the sequences that underlie important decisions, e.g. decisions early in history or at critical moments.) Once we need to think about influence, I no longer feel quite as optimistic about the feasibility of the project.

The most obvious way to avoid this problem is to use a broad mixture over universes to define our preferences and then to use a decision procedure like UDT that doesn’t have to explicitly condition on observations, totally throwing out the universal prior over sequences.

#### Hail mary

Nick Bostrom has proposed that a desperate civilization which was unable to precisely formalize its goals might throw a “hail mary,” building an AI that does the same kind of thing that other civilizations choose to do.

If we believe the argument in this post, then throwing such a hail mary may be easier than it looks. For example, you could define a utility function by simply conditioning the universal prior on a bunch of data, then seeing what it predicts will come next (and perhaps conditioning on the result being a well-formed utility function).

It’s not clear what kinds of costs are imposed on the universe if this kind of thing is being done regularly (since it introduces an arms race between different consequentialists who might try to gain control of the utility function). My best guess is that it’s not a huge deal, but I could imagine going either way.

(Disclaimer: I don’t expect that humanity will ever do anything like this. This is all in the “interesting speculation” regime.)

## Conclusion

I believe that the universal prior is probably dominated by consequentialists, and that the extent of this phenomenon is not widely recognized. As a result, the universal prior is malign, which could be a problem for AI designs which reason abstractly about the universal prior.

## Comments

An interesting post. I had a hard time following some of the terminology:

“So the mass of the consequentialist part is probably much larger than the mass of q inherited from q‘, even if the initial mass of q‘ is itself were nearly 1.”

When you say “the mass of q inherited from q'”, what do you mean? Isn’t q a sub-distribution of q’, and so all of q would be ‘inherited’ from q’? And what does “if the initial mass of q’ in itself were nearly 1” mean? Shouldn’t the mass of a probability distribution “in itself” always be one?

Later you say “this update alone is enough to pay for the total expense of specifying the consequentialists”. Who is paying the ‘expense’ in this sentence — the consequentialists, the AI using the universal prior, or some other actor? And in what sense does the update ‘pay for’ it — is it that the consequentialists will be justified in using some of their resources to hack the prior?

By “mass of q inherited from q’ ” I mean something like KL(q, universal prior) + KL(q’, q). Overall we have KL(q’, universal prior) >= KL(q, universal prior) + KL(q’, q), so I kind of think of these as “paths” that are contributing to the mass of q.

“In itself” was a typo, I basically mean “even if KL(q, q’) is very small.”

By “the expense” I meant # of bits required.

On further thought, I think I understand the argument. But I’m not sure if it goes through — that is, if we would expect the consequentialists to have a strong influence over most uses of the universal prior.

You say that the consequentialists would have an advantage over many methods of prediction because they can condition on the fact that they are “inside” a particular prior p. But the distribution q is still quite large — it covers all uses of the prior over all “universes”. It seems that within any particular universe, this would be dominated by prediction methods that are particular to that universe — e.g. they correctly implement physics and locate the observer within the universe. It seems that this would incur a cost beyond just specifying some simple CA, but that this cost would be outweighed by the fact that the consequentialists are outputting a mixture over all possible universes where p is used to make predictions. The fraction of that mixture which goes to any particular universe seems plausibly smaller than the cost of specifying the “correct” physics relative to the CA. Also, note that the correct physics also get a partial “anthropic update”, because if the builders of the AI have decided to use a language X for the prior, it is probably because that language is pretty good at describing physical reality. e.g. human programming languages are probably better for implementing our physics than a “random” language.

Thus, it seems plausible that while the consequentialists would control the leading share of the prior over all possible universes using it, in any particular universe they will be a minority vote.

The picture is more like “uses of the universal prior will have a subtle bias towards alien consequentialists”, not “alien consequentialists will quickly hijack any AI using the universal prior”.

> The fraction of that mixture which goes to any particular universe seems plausibly smaller than the cost of specifying the “correct” physics relative to the CA

I don’t think this is possible on average. You are saying that the consequentialists assign a lower probability to the universe than a uniformly random prior over physics (with P(physics) = exp(-complexity))—that’s exactly what it means for the fraction of the consequentialist mixture to be smaller than the cost of specifying physics. But if that is so, the consequentialists could just use a uniform distribution, so why don’t they do that?

Now the consequentialists can probably do *better* than uniformly random physics, because they can e.g. restrict to the kinds of physics that give rise to intelligent life. (In fact I suspect that this probably mostly pays for the cost of specifying consequentialists—if it is hard to specify physics that can support life, then that increases the complexity of specifying consequentialists, but it also increases the advantage that the consequentialists can obtain by restricting their attention to interesting physics.) They can also do better by doing more philosophy to find a better universal prior. But I don’t see how the consequentialists could possibly do worse than a uniform distribution over physics.

(Of course this is all before the anthropic update within a particular universe, which seems very large.)

You’re right.

Hmm, I don’t quite understand part of the argument. You say that on average in p, uses of some universal prior to make an important decision are distributed according to q’. So it seems like if the consequentialists wanted to influence p, they would predict according to q’. At this point it seems like the relevant quantity is something like KL(q’ || p). I don’t see how KL(q || q’) (i.e. the cost of the encoding X) fits into the picture.

If consequentialists want to influence p, I think they should predict according to (roughly) q and not q’. Suppose that I use a universal prior which puts negligible mass on some consequentialist civilization. Then they don’t have any real motive to predict well for me, since they are going to be stuck with negligible mass anyway. Better to use their probability somewhere they have a chance.

I’m not sure under what approximation it is correct to predict according to q. But it’s definitely closer than q’.

I have a more detailed objection, but I would need to write it carefully, and I don’t want to expend the effort. For now, I’ll just ask: Seeing as this argument applies to any kind of Solomonoff induction, shouldn’t it also apply to your own internal inductive reasoning? That is, why do you think that the “treacherous turn” which you worry will confuse inductors is not actually a likely outcome?

1. This argument should have some implications for our beliefs; from that perspective it is essentially just a more careful restatement of the simulation argument for UDASSA rather than the counting measure.

2. There are many universal priors. Epistemically, there is only a problem when we perform induction with respect to the “wrong” one. In some sense this is an argument that the choice of prior is much more of a load-bearing assumption than you would initially suspect, and that an arbitrary choice of prior won’t “wash out” in the data but will instead lead to pathological results.

Wouldn’t anyone actually able to do the calculation of this prior also be able to tell if it had been manipulated, and adjust for it?

If this were actually your prior, you wouldn’t want to adjust it, and this wouldn’t be “manipulation”; it would be a valid argument about the nature of reality.

OK, but if the problem is that following this prior in making decisions has bad results due to manipulation, do we really want to correct for it in the prior, or would it be better to correct for it in our decisionmaking based on the prior? E.g. valuing making the correct decisions for copies of ourselves that are not simulations over making correct decisions for copies that are simulations.

Seems to me we should probably try to keep the prior as accurate as we can, but make decisions in such a way as to correct for bad effects of manipulation of the prior.

I just remembered about:

https://wiki.lesswrong.com/wiki/Acausal_trade

So in this case the “manipulation” of others is actually useful to us. That’s another reason to not change priors just because they’re manipulated, but instead make sure we’re making appropriate decisions based on them taking the manipulation into account.

Actually we should program into an AI’s prior that we at least exist independently of it. We know that, but it doesn’t know that for sure unless we program that in.

Thinking about this further, I missed a crucial part of your argument: namely, that the aliens from other universes trying to manipulate us only try to manipulate a very small subset of agents in our universe, namely, those that influence a large portion of the universe’s resources and rely on a *sufficiently precise* version of Solomonoff induction that it would notice their actions. This answers my first objection: Even if you believe you can have a large influence on the future of the universe (or if the bulk of the variation of the utilities of your actions comes from the chance that they will have a large influence) and even if you try to apply Solomonoff induction, you cannot predict the aliens sufficiently well to form a specific “treacherous turn” hypothesis that’s more likely than the conventional model, and so the aliens have no incentive to try to trick you.

By the way, it seems to me that there’s a model the universal prior would usually put a strictly greater weight on than being a simulation in another universe: being a simulation in the same universe or an extremely similar one (e.g. with a different seed if the universe is a pseudorandom sample of a stochastic model). After all, if the intended output channel is as complex as you make it out to be, then this universe should contain lower-complexity influenceable output channels. If a universal prior is used to make significant decisions in this universe, then it will become more likely by far that the output channels for this universe will be used to manipulate the inductor than they will in any other universe, which should more than compensate for the decreased likelihood of this universe over the simplest civilized universe.