It would be nice to have a working formalization of TDT, but first I am just going to shoot for a working formalization of CDT in a mathematical universe. The difficulty in this problem is locating yourself within the universe: understanding not just a description of the universe, but also how your action controls it. To see why this might not be completely straightforward, see “AIXI and Existential Despair.”
Risk Arbitrage
People have different risk profiles, and different beliefs about the future. But it seems to me like these differences should probably get washed out in markets, so that as a society we pursue investments if and only if they have good returns using some particular beliefs (call them the market’s beliefs) and with respect to some particular risk profile (call it the market’s risk profile).
As it turns out, if we idealize the world hard enough these two notions collapse, yielding a single probability distribution P with the following property: at the margin, every individual should make an investment if and only if it has positive expected value with respect to P. This probability distribution tends to be somewhat pessimistic: because risk-averse people care about wealth more in worlds where wealth is scarce, events like a complete market collapse receive higher probability under P than under the “real” probability distribution over possible futures.
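As a toy illustration (my own, with made-up numbers, not from the text): under standard asset-pricing idealizations, with log utility the market-implied probability of a state is proportional to its real probability times the marginal utility of wealth in that state, so states where wealth is scarce get up-weighted.

```python
# Toy sketch of a "market" probability distribution P. Assumption:
# a representative agent with log utility u(w) = log(w), so marginal
# utility is u'(w) = 1/w, and P_i is proportional to real_prob_i * u'(w_i).

def market_probs(real_probs, wealth):
    # weight each state by real probability times marginal utility of wealth
    weights = [p / w for p, w in zip(real_probs, wealth)]
    total = sum(weights)
    return [x / total for x in weights]

# two possible futures: normal growth (wealth 2.0) vs. market collapse (wealth 0.1)
real = [0.95, 0.05]
wealth = [2.0, 0.1]
P = market_probs(real, wealth)
# the collapse state's probability rises from 5% under the real
# distribution to about 51% under P
```

The point of the numbers is only to show the direction of the distortion: the collapse state is rare, but marginal wealth there is so valuable that P weights it heavily.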
Specifying (non-decision-theoretic) Counterfactuals
Here is a simple trick for specifying the future inputs of a computer in the physical world: run the computer for a long time, and then ask for the simplest description of the resulting sequence of inputs. That description is a good predictor of future inputs, provided we live in a suitable universe.
(This is vulnerable to all of the same attacks defined in “Hazards,” and if we really want to get access to the universe as a whole, rather than just to a simulation of a single brain, it will be much harder to get around these problems.)
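To make the trick concrete, here is a sketch under a deliberately tiny assumption of mine: the description language consists only of finite patterns repeated forever, so the “simplest description” is just the shortest repeating pattern consistent with the observations (a stand-in for the shortest program).

```python
# Toy model of "simplest description predicts future inputs".
# Assumption (mine, for illustration): descriptions are finite patterns
# repeated forever, so simplicity = pattern length.

def simplest_description(observed):
    # shortest prefix whose infinite repetition reproduces the observations
    # (always returns, since the full sequence trivially works at k = len)
    for k in range(1, len(observed) + 1):
        pattern = observed[:k]
        if all(observed[i] == pattern[i % k] for i in range(len(observed))):
            return pattern

def predict(observed, horizon):
    # continue the simplest description past the observed data
    pattern = simplest_description(observed)
    n = len(observed)
    return [pattern[(n + i) % len(pattern)] for i in range(horizon)]

obs = [0, 1, 1, 0, 1, 1, 0, 1, 1]
predict(obs, 3)  # continues the period-3 pattern: [0, 1, 1]
```

In the real proposal the description language is a universal one, which is exactly what opens the door to the attacks mentioned above.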
Now suppose we have a single bit X on a computer, and we would like to talk about the counterfactual world in which X’s value was flipped. How can we do this? Or perhaps we would like to consider an entire ensemble of possible counterfactuals in which we were given one of exponentially many possible messages m1, m2, ….
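One naive reading of the counterfactual, sketched here with a toy linear CA of my own choosing (a rule-90-style update; none of this is from the text): treat the universe as a deterministic program of its initial state, and define the counterfactual world as the same program run on the state with X flipped.

```python
# Minimal sketch: the "universe" is a deterministic function of its
# initial state, so the counterfactual in which bit X was flipped is
# just the same program run on the modified initial state.

def step(state):
    # rule-90-style update on a ring: each cell becomes the XOR of its neighbors
    n = len(state)
    return [state[(i - 1) % n] ^ state[(i + 1) % n] for i in range(n)]

def run(state, steps):
    for _ in range(steps):
        state = step(state)
    return state

def counterfactual(state, x, steps):
    flipped = list(state)
    flipped[x] ^= 1          # intervene on the single bit X
    return run(flipped, steps)

world = [0, 0, 0, 1, 0, 0, 0, 0]
factual = run(world, 3)
alt = counterfactual(world, 0, 3)  # the two histories diverge
```

This only works because we intervened on the initial state; the hard case in the text is a bit set partway through the computation, where “flip X and rerun” is no longer well defined without a story about how X came to be set.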
Hazards for Formal Specifications
I have described a candidate scheme for mathematically pinpointing the human decision process, by conditioning the universal prior on agreement with the human’s observed behavior. I would like to point out three dangers with this approach, which seem to apply quite generally to attempts to mathematically specify value (and have analogs for other aspects of agents’ behavior):
Short Explanations of Observations in Physical Worlds
[This post contributes nothing new.]
Consider the sequence of bits observed by a camera situated within the physical universe (which we can imagine as a CA for concreteness). If we draw a program at random from the universal prior (i.e., uniformly under a universal prefix-free encoding) and condition on agreement with an observed prefix of this sequence, what does the posterior over programs look like?
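As a toy model (a hand-picked ensemble of four “programs” standing in for the universal prior; all names, lengths, and streams are my own inventions): give each program prior weight 2^-length, keep only the programs agreeing with the observed prefix, and renormalize.

```python
# Toy stand-in for conditioning the universal prior on an observed
# bit sequence. Assumption: a tiny hand-picked ensemble of "programs",
# each with a made-up description length and output stream.

from fractions import Fraction

# (name, description length in bits, output stream as a function of position)
programs = [
    ("zeros",     3, lambda i: 0),
    ("ones",      3, lambda i: 1),
    ("alternate", 5, lambda i: i % 2),
    ("camera",   40, lambda i: (i * i + i) // 2 % 2),  # stand-in for a rich world-model
]

def posterior(observed):
    # keep programs agreeing with the observed prefix, weighted by 2**(-length)
    agree = [(name, Fraction(1, 2 ** length))
             for name, length, f in programs
             if all(f(i) == b for i, b in enumerate(observed))]
    z = sum(w for _, w in agree)
    return {name: w / z for name, w in agree}

p = posterior([0, 1])  # both "alternate" and "camera" fit, but the
                       # short program dominates the posterior
```

The worry in the text is visible even here: the shortest agreeing program dominates, and it need not be the one that actually describes a camera in a physical world.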
Cellular Automata
In the interest of concreteness, I am going to talk about cellular automata (CA) a lot here. They serve as a convenient toy example for talking about computation, and particularly about structures embedded in computations (it is easy to think about how such structures exert control over their environment, although this is just as philosophically problematic as acausal control in general). CA have no relevant mystical properties. You could substitute any other sufficiently complicated program, but CA have the virtue of matching our intuition about physics in several ways (similar notions of space and time, of regular physical law, and so on). Whenever the intuition from CAs seems to get in the way of thinking about what is going on in generality I will abandon them.
Specifying Humans Formally (Using an Oracle for Physics)
Although I don’t yet have any idea how to build an AI which pursues a goal I give it, I am optimistic that one day humans might. However, writing down any understandable goal at all, much less one which humans approve of, looks like it might be quite hard even with a seed AI in hand. The issue is complicated by our complete ignorance of the hypothetical AI’s ontology, and of the mechanisms by which its creators might have access to that ontology.
I do have some fixed points: I believe that any AI will probably at a minimum be able to reason about logical and mathematical truth, and I believe that many natural goals will want to use the subexpression “a human’s decision process” (for example appealing to a human’s decision process to make some judgment about input sequences).
This leads to a natural goal: if all I can talk to the AI about is math, how do I tell it “here is a human”?
Here is an attempt, though I will later give some reasons it may break down (for now, the failure modes are left as an exercise for the counterfactually inquisitive reader).