It would be nice to have a working formalization of TDT, but first I am just going to shoot for a working formalization of CDT in a mathematical universe. The difficulty in this problem may be described as locating yourself within the universe (to understand not just a description of the universe but also how your action controls it). To see why this might not be completely straightforward, see “AIXI and Existential Despair.”

One approach to understanding myself within the world is to take a reductionist view of the world, together with the understanding that I myself am embedded somewhere in this world, so that my uncertainty about my own behavior may couple to my uncertainty about the universe in a particular way. I’ll model this as an agent who knows his own description and has a mathematically defined utility function (which may depend, for example, on the execution of certain programs which are logically related to the agent’s own output). This is a standard formalism. Note that the agent need not have explicit representations of the programs involved–it may simply reason about mathematical structures which it suspects may have this relationship with each other.

I’ll assume that the agent has a math-intuition module, in Wei Dai’s terminology. That is, a module which at any time can assign a “probability” to each mathematical statement. We’d like to build agents which work well as long as their math intuition module works well.

The decision rule I would like to use is the following: for each possible action X, estimate the expected utility if I (counterfactually) choose X. Then choose the action with the largest expected utility. The difficulty is in defining the counterfactuals appropriately.

One way to escape this problem is to use random decision making (a similar approach is necessary to salvage decision markets). That is, with probability p make a decision randomly, and then ask about the probable outcomes conditioned on yourself choosing X at random. This has the virtue that it makes counterfactual definitions nearly trivial: if our coins are really random, then we can talk directly about the possible worlds in which they came up differently without having to perform any (as yet somewhat mysterious) “counterfactual surgery.”
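As a toy illustration (my own sketch, not part of the original proposal), an agent with access to genuine randomness can estimate conditional expected utilities from its randomized episodes alone. The names `explore_then_exploit`, `utility`, and `history` are hypothetical:

```python
import random

def explore_then_exploit(utility, actions, p=0.05, history=None):
    """With probability p act at random (and record the outcome);
    otherwise pick the action whose randomized episodes looked best."""
    history = history if history is not None else []
    if random.random() < p:
        a = random.choice(actions)          # a genuinely random choice
        history.append((a, utility(a)))     # outcome of a random episode
        return a, history
    def avg(a):                             # estimate E[U | chose a at random]
        us = [u for (b, u) in history if b == a]
        return sum(us) / len(us) if us else 0.0
    return max(actions, key=avg), history
```

On the exploration steps the action really is random, so conditioning on “I chose this at random” raises no counterfactual puzzles; the exploit branch just reuses those estimates.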

Unfortunately it is surprisingly hard to write down a formal agent who cares about things like “that coin I just flipped” (I believe that the approach I outlined to specifying counterfactuals may be made to work, but it is at least tricky). Instead we can consider abstract pseudorandomness, for example by asking cryptographic questions whose answers we know we could compute but whose answers we are currently uncertain about. This can be formalized by using the math intuition module, and it seems like it may work well if our intuition module has sufficient self-confidence regarding its beliefs about the sort of cryptographic challenges we are trying to use as coins (that is, using such reflective beliefs the math intuition module can treat a statement to which it assigns a 50% probability very nearly as if it were a “logical coin” which the universe hasn’t yet flipped–this sort of reasoning isn’t valid in general, see “Beliefs which are Always Wrong”).
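To make the “logical coin” idea concrete, here is a minimal sketch (my construction; the post does not specify one) in which the low bit of a hash stands in for the cryptographic challenges above: it is a fixed mathematical fact, but until we actually compute it, a well-calibrated math intuition module should assign it probability very near 1/2. The name `logical_coin` is hypothetical:

```python
import hashlib

def logical_coin(tag: str) -> bool:
    """Deterministic, but unknown until we actually compute the hash."""
    return hashlib.sha256(tag.encode()).digest()[0] & 1 == 1

# Across many distinct tags the coins look fair, as calibration demands:
flips = [logical_coin(f"coin-{i}") for i in range(10_000)]
frequency = sum(flips) / len(flips)   # close to 1/2
```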

Here is an informal sketch of a decision procedure along these lines:

1) Using the math intuition module, find a statement X to which the math intuition module assigns a probability of about 2^-20 (say) whose truth can be determined in a reasonable amount of time, and an assertion Y to which the math intuition module assigns a probability of roughly 1/2. This can be accomplished by taking a bunch of cryptographic problems and taking a conjunction of many uncertain assertions about their solutions: a conjunction of 20 independent assertions, each assigned probability about 1/2, is assigned probability about 2^-20. (There are classes of cryptographic problems such that the agent can learn the answer to any given question in a reasonable amount of time, but under mild assumptions it is guaranteed to have significant uncertainty about almost every particular problem until it has spent a modest amount of time thinking about it.)

2) Ask the math intuition module about your expected utility conditional on X AND Y, and about your expected utility conditional on X AND NOT Y.

3) Evaluate X and Y.

4) If X AND NOT Y, output decision 0. If X AND Y, output decision 1. If NOT X, then output decision 0 or 1 according to whether the math intuition module’s estimate of your utility is higher when conditioning on X AND NOT Y or on X AND Y.
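The four steps above can be sketched as follows. This is a toy rendering under my own assumptions: `bit` supplies hash-based logical coins in place of real cryptographic challenges, and `estimate` is a stub standing in for the math intuition module’s conditional expected-utility judgments:

```python
import hashlib

def bit(tag: str) -> bool:
    """A "logical coin": a fixed mathematical fact with ~50% prior credence."""
    return hashlib.sha256(tag.encode()).digest()[0] & 1 == 1

def decide(estimate, nonce: str) -> int:
    """One run of the informal four-step procedure."""
    # Step 1: X is a conjunction of 20 coin statements (credence ~2^-20),
    # Y a single coin statement (credence ~1/2).
    X = all(bit(f"{nonce}-x-{i}") for i in range(20))
    Y = bit(f"{nonce}-y")
    # Step 2: query conditional expected utilities.
    u_if_1 = estimate("X AND Y")      # estimated utility if the random action is 1
    u_if_0 = estimate("X AND NOT Y")  # estimated utility if the random action is 0
    # Steps 3 and 4: evaluate the coins and act.
    if X:
        return 1 if Y else 0          # rare branch: act on the logical coin
    return 1 if u_if_1 >= u_if_0 else 0   # usual branch: best conditional estimate

# A module that expects action 1 to do better yields output 1 except on the
# rare nonces where the exploration branch fires and coin Y comes up false.
choice = decide(lambda c: 1.0 if c == "X AND Y" else 0.0, "run-0")
```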

If X and Y don’t couple to your utility function except through your decision, then this approach seems to produce essentially correct (consequentialist, not timeless) decisions.

In Newcomb-like situations, this algorithm will one-box if the simulation is exact–because it is using logical “coins”, it will play randomly if and only if its simulation plays randomly–but it will not necessarily one-box against approximate simulations. Moreover, this effect manifests even when the predictor’s error is very low; for example, if the simulator isn’t able to guess what X the agent will choose, then the simulator can simply predict two-box and be correct with probability 1 – P(X).
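The closing claim is just arithmetic. Assuming (my assumption, for concreteness) that the agent deviates from the CDT-recommended two-box only when the exploration event X occurs, with the P(X) = 2^-20 from step 1:

```python
# P(X) = 2^-20 as in step 1; the agent one-boxes only on the exploration
# branch, so an unconditional "two-box" prediction errs at most that often.
p_X = 2 ** -20
always_predict_two_box_accuracy = 1 - p_X   # wrong at most when X holds
```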

Note that the point of using a math intuition module and reasoning about apparent “acausal” control was not to succeed at a limited class of Newcomb-like problems: the point was to make sensible decisions at all. Similarly, the purpose of randomness was to allow any consideration of counterfactuals at all.

Note that if X and Y couple with the agent’s utility through some mechanism other than the agent’s action, the agent may fail to behave reasonably. The main way that such a coupling could occur is if a mugger trying to control the agent’s behavior made decisions on the basis of X or Y, after learning that they would be used for the agent’s decision. But note that, once we have already introduced a mugger, normal CDT will fail in the same way.