“Expected utility maximization” is a generally accepted decision procedure, at least up to computational limitations. But it is worth remembering that as of today, no one seems to have produced a formalization of EU maximization that leads to sane behavior. I’ll briefly review some of the properties we might like our decision theories to have, and observe that for the most part we can’t write code that satisfies any of them in much generality, even if we could deal with the classical AI problems of inference and optimization.
Existing formal decision theories may be described as “dualist”: they consider the rest of the world on one side of a divider and the agent on the other, interacting through some formally specified interface. The agent’s algorithm is absent from the world model, affecting reality only by its outputs. In reality, the agent is part of the world, governed by the same laws and interacting with the rest of the world through a complex and jagged interface. It has been remarked that AIXI cannot recognize itself in a mirror or refrain from using the computer it is running on for scrap metal.
Game theory provides a simple example of one concrete failure. Consider an iterated game of rock-paper-scissors in which the winner takes $1 from the loser, but either player may opt out for a small penalty. After losing enough money, a human will start opting out, realizing that her opponent is exploiting some regularity in her thinking which she can’t eliminate. A traditional Bayesian consequentialist dualist is incapable of reasoning in this way: independent of its own decision, it believes that the opponent will play some fixed distribution over the possible moves. Regardless of the agent’s logical uncertainty, there is guaranteed to be some option which it believes has non-negative expected value, and which it therefore prefers to opting out.
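This last point can be made concrete with a small sketch (the opponent distributions below are purely illustrative): against any fixed distribution over rock, paper, and scissors, the three pure moves’ expected values sum to zero, so the best of them is always at least zero — never worse than paying the opt-out penalty.

```python
# Payoff to our agent for a pure move against one opponent move:
# win +1, lose -1, tie 0.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def expected_value(move, opp_dist):
    """Expected payoff of a pure move against a fixed opponent distribution."""
    ev = 0.0
    for opp_move, p in opp_dist.items():
        if BEATS[move] == opp_move:
            ev += p          # we beat this opponent move
        elif BEATS[opp_move] == move:
            ev -= p          # this opponent move beats us
    return ev

# For ANY fixed distribution, the three expected values sum to zero,
# so the best pure move always has EV >= 0 -- the dualist agent never
# prefers opting out at a penalty.
for dist in [{"rock": 0.6, "paper": 0.3, "scissors": 0.1},
             {"rock": 1/3, "paper": 1/3, "scissors": 1/3},
             {"rock": 0.0, "paper": 1.0, "scissors": 0.0}]:
    best = max(expected_value(m, dist) for m in BEATS)
    assert best >= 0.0
```

The human’s advantage over this agent is precisely that she can entertain the hypothesis “my opponent’s play depends on my own reasoning,” which the fixed-distribution model rules out.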
Even if we had a compact physical Theory of Everything, an agent is still left with an important form of uncertainty: where is it located within that universe? For example, if I were to tell you right now that you inhabit a game of life board with some compactly specified initial conditions and your goal is to make as many of the squares alive as possible, what formalism could represent your uncertainty about the relationship between your sense experience and your knowledge of underlying reality or your goals? What about information from introspection, or from mere knowledge of your own existence? Traditional priors do not seem suited to this task, although our understanding of the situation is not yet good enough to say anything with confidence.
Although this problem apparently must be solved by any successful formal decision theory, it may be most productive to think about it independently of decision theory. I am inclined to list it here anyway, in part because Ambient Decision Theory seems to be the closest humans have come to getting a handle on the problem.
Although we consider an agent’s output to be its primary means of interacting with the world, its computation may also affect the world through other channels.
For example, if an agent is playing rock-paper-scissors against an adversary who has a complete model of the agent but limited computation, the agent should choose its move in a way that the adversary will be unable to predict. If an agent is attempting to acausally cooperate with a computationally weaker opponent, the agent needs to ensure that the weaker agent is still incentivized to cooperate (for example, the more powerful agent may want to make its decision in a simple way that can be understood by the weaker agent). If an agent learns that certain types of operations consume more power or dissipate more heat, it will want to change its use of those operations strategically.
Wei Dai has proposed dealing with these issues by first running a less computationally expensive algorithm (using a computation-oblivious decision theory) to determine what more expensive algorithm should be used to make a particular object level decision. Overall, this problem is the only one on the list which appears to be satisfactorily resolved, and further progress seems likely to clarify the difficulties of recursively self-modifying AGI but not to help with decision theory or FAI in particular.
Existing formal decision theories seem to exhibit reflective inconsistency: an agent using one of these decision procedures would choose to replace itself by a suitable agent running a different decision procedure. To the extent that an AI is motivated and able to fundamentally modify its own decision procedure and goals (or create new agents with different decision procedures or goals), we have at best a very limited understanding of its behavior. Aside from philosophical interest, this provides a strong motivation for seeking decision theories which will not be immediately replaced by the agents using them.
This principle is well-illustrated by a slightly modified version of Parfit’s hitchhiker. Suppose that, unbeknownst to you, your life is currently in danger. A potential benefactor has the opportunity to save your life at great expense, but fortunately expects to acquire evidence which will convince you that he did in fact save your life. He plans to save your life, approach you with this evidence, and ask you to pay him $100. For better or for worse, this benefactor can reliably predict whether you will pay, and only intends to help you if he expects to receive $100. Many decision procedures fail to hand over the $100 after having been saved, because cooperation no longer leads causally to any benefits. But an agent using most of these decision procedures would rather be the sort of person who pays the $100 and gets saved, and would therefore immediately abandon its original decision procedure and adopt a new one.
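A toy model of the hitchhiker makes the comparison between the two kinds of agent explicit. The numbers below are assumed for illustration: the rescue is worth $1000 to you, the payment is $100, and the benefactor’s prediction of your policy is perfect.

```python
# Assumed, illustrative numbers: rescue worth $1000, payment $100;
# the benefactor predicts your policy perfectly and only saves payers.
VALUE_OF_RESCUE = 1000
PAYMENT = 100

def outcome(policy):
    """Total value to the agent, given its policy for paying once saved."""
    will_pay = policy()          # the benefactor's (perfect) prediction
    if not will_pay:
        return 0                 # predicted non-payer: never rescued
    return VALUE_OF_RESCUE - PAYMENT   # rescued, then pays as predicted

always_pay = lambda: True
never_pay = lambda: False

assert outcome(always_pay) == 900  # the kind of agent that pays gets saved
assert outcome(never_pay) == 0     # the kind that would refuse does not
```

The agent that refuses to pay after being saved is reasoning correctly about causal consequences at that moment; it simply would have preferred, in advance, to be the other kind of agent — which is exactly the reflective inconsistency at issue.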
Consider two agents aware of each other’s decision theories (or of each other’s complete descriptions) who are faced with the following cooperation problem: each may independently pay $1 in order to give the other $2. Without the ability to enter binding contracts, agents using traditional decision procedures will generally decline to pay for the other’s benefit, despite the fact that both agents end up a dollar richer if both pay. However, when agents can reliably model each other they may be able to reach a Pareto efficient outcome.
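The structure of this game can be written down in a few lines. Causally, refusing to pay strictly dominates paying, whatever the other agent does; yet mutual payment Pareto-dominates mutual refusal:

```python
# Each agent may pay $1 to give the other $2.
def payoff(i_pay, other_pays):
    """Net dollars to one player, as a function of the two choices."""
    return (2 if other_pays else 0) - (1 if i_pay else 0)

# Causally, not paying strictly dominates paying...
for other in (True, False):
    assert payoff(False, other) > payoff(True, other)

# ...yet mutual payment Pareto-dominates mutual refusal:
assert payoff(True, True) == 1     # each player nets +$1
assert payoff(False, False) == 0
```

A pair of agents who can verify each other’s decision procedures could in principle condition their payment on the other’s, escaping the dominance argument — but writing down a formal theory that does this is the open problem.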
We would like to find a formal and general decision theory which at least occasionally leads two agents using that theory to cooperate (without requiring the agents to be identical), but no such theory is known. The difficulty of this problem is suggested by the difficulty of cooperative game theory without transferable utility. However, existing work on updateless decision theory seems to come tantalizingly (to me at least) close to coordinating acausal cooperation in some situations, and at present no theoretical obstacles are known.
Acausal cooperation and reflective consistency appear to be closely linked: for example, we can view Parfit’s hitchhiker as a cooperation problem between an agent and the benefactor’s model of that agent. One distinction between the two settings is that many open problems related to reflective consistency can be stated in a particularly “fair” model, in which payoff depends only on the input-output behavior of an agent, whereas acausal coordination seems to require that the agents can see each other’s source code. Accordingly, somewhat different intuitions bear on whether acausal cooperation and reflective consistency are possible.