Epistemic Chicken

Consider a fixed goal-seeking agent A, who is told its own code and that its objective function is U = { T if A(<A>,<U>) halts after T steps, 0 otherwise }. Alternatively, consider a pair of agents A, B, running similar AIs, who are each told their own code as well as their own utility function U = { -1 if you don’t halt, 0 if you halt but your opponent halts after at least as many steps, +1 otherwise }. What would you do as A, in either situation? (That is, what happens if A is an appropriate wrapper around an emulation of your brain, giving it access to arbitrarily powerful computational aids?)
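
To make the two payoff rules concrete, here is a minimal sketch that only scores outcomes once the halting times are known. The function names are hypothetical, and representing "never halts" as None is an assumption made for illustration; deciding whether a program halts is of course not something this code can do.

```python
from typing import Optional

def solo_utility(steps: Optional[int]) -> int:
    """Single-agent objective: T if the agent halts after T steps, 0 otherwise.

    `steps` is None when the agent never halts (an illustrative convention).
    """
    return 0 if steps is None else steps

def chicken_utility(my_steps: Optional[int], opp_steps: Optional[int]) -> int:
    """Two-agent objective, scored from one agent's point of view."""
    if my_steps is None:
        return -1  # you don't halt
    if opp_steps is not None and opp_steps >= my_steps:
        return 0   # you halt, but your opponent halts after at least as many steps
    return 1       # you halt strictly later, or your opponent never halts
```

Under this reading a tie scores 0 for both agents, and an agent whose opponent never halts scores +1 as long as it halts itself.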

Abstract Randomness and Formal CDT

It would be nice to have a working formalization of TDT, but first I am just going to shoot for a working formalization of CDT in a mathematical universe. The difficulty in this problem may be described as locating yourself within the universe (to understand not just a description of the universe but also how your action controls it). To see why this might not be completely straightforward, see “AIXI and Existential Despair.”

Risk Arbitrage

People have different risk profiles, and different beliefs about the future. But it seems to me like these differences should probably get washed out in markets, so that as a society we pursue investments if and only if they have good returns using some particular beliefs (call them the market’s beliefs) and with respect to some particular risk profile (call it the market’s risk profile).

As it turns out, if we idealize the world hard enough these two notions collapse, yielding a single probability distribution P which has the following property: on the margins, every individual should make an investment if and only if it has a positive expected value with respect to P. This probability distribution tends to be somewhat pessimistic: because people care about wealth more in worlds where wealth is scarce (being risk averse), events like a complete market collapse receive higher probability under P than under the “real” probability distribution over possible futures.
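
The excerpt does not give a formula for P, so the following is only a sketch of one standard textbook construction: reweight each state's "real" probability by the marginal utility of wealth in that state, then renormalize. The two states, their probabilities, the wealth levels, and the choice of log utility (marginal utility 1/w) are all illustrative assumptions.

```python
def market_distribution(real_probs, wealth, marginal_utility):
    """Reweight real probabilities by marginal utility of wealth, then normalize."""
    weights = {s: real_probs[s] * marginal_utility(wealth[s]) for s in real_probs}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

# Hypothetical two-state world: a rare collapse in which wealth is scarce.
real_probs = {"normal": 0.99, "collapse": 0.01}
wealth = {"normal": 100.0, "collapse": 10.0}

# Log utility is an assumption; its marginal utility is 1 / wealth.
P = market_distribution(real_probs, wealth, lambda w: 1.0 / w)

# The collapse state is weighted more heavily under P than under real_probs.
print(P)
```

Because marginal utility is higher where wealth is scarce, the collapse state ends up with more weight under P than under the real distribution, which is the pessimism described above.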

Specifying (non-decision-theoretic) Counterfactuals

Here is a simple trick for specifying the future inputs of a computer in the physical world: run the computer for a long time, and then ask for the simplest description of the resulting sequence of inputs. The resulting description is a good predictor for future inputs, provided we live in a suitable universe.

(This is vulnerable to all of the same attacks described in “Hazards,” and if we really want to get access to the universe as a whole, rather than just to a simulation of a single brain, it will be much harder to get around these problems.)
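
As a toy illustration of "simplest description as predictor," the sketch below swaps the space of all programs for a tiny hypothesis class that is an assumption of mine: "repeat a fixed bit pattern forever." It finds the shortest pattern consistent with the observed inputs and uses it to extrapolate. The real proposal ranges over arbitrary descriptions and is not computable this way; the function names are hypothetical.

```python
def shortest_repeating_pattern(observed: str) -> str:
    """Return the shortest pattern whose repetition reproduces `observed`."""
    if not observed:
        raise ValueError("need at least one observed bit")
    for length in range(1, len(observed) + 1):
        pattern = observed[:length]
        if all(observed[i] == pattern[i % length] for i in range(len(observed))):
            return pattern
    return observed  # not reached: the full string always matches itself

def predict_next(observed: str, count: int) -> str:
    """Extrapolate `count` future bits using the shortest consistent pattern."""
    pattern = shortest_repeating_pattern(observed)
    start = len(observed)
    return "".join(pattern[(start + k) % len(pattern)] for k in range(count))
```

For example, after observing "0110110" the shortest consistent pattern is "011", so the next three predicted bits are "110".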

Now suppose we have a single bit X on a computer, and we would like to talk about the counterfactual world in which X’s value was flipped. How can we do this? Or perhaps we would like to consider an entire ensemble of possible counterfactuals in which we were given one of exponentially many possible messages m1, m2, ….

Hazards for Formal Specifications

I have described a candidate scheme for mathematically pinpointing the human decision process, by conditioning the universal prior on agreement with the human’s observed behavior. I would like to point out three dangers with this approach, which seem to apply quite generally to attempts to mathematically specify value (and have analogs for other aspects of agents’ behavior):

Short Explanations of Observations in Physical Worlds

[This post contributes nothing new.]

Consider the sequence of bits observed by a camera situated within the physical universe (which we can imagine as a CA for concreteness). If we draw a program at random (i.e., flipping fair coins after fixing a universal prefix-free encoding) and condition on its output agreeing with the observed prefix of this sequence, what does the posterior (over programs) look like?
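
Written out, the setup looks like this. The notation is mine rather than the post's: x is the observed prefix, U is the fixed universal prefix machine, |p| is the length of program p in bits, and the bracket is 1 when the condition holds and 0 otherwise. Drawing a program by fair coin flips gives p prior weight 2^{-|p|}, so conditioning on agreement with x gives:

```latex
\Pr(p \mid x)
  = \frac{2^{-|p|}\,\bigl[\,U(p)\ \text{begins with}\ x\,\bigr]}
         {\sum_{q} 2^{-|q|}\,\bigl[\,U(q)\ \text{begins with}\ x\,\bigr]}
```

This only restates the question formally; it says nothing yet about which programs carry most of the posterior mass.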

Cellular Automata

In the interest of concreteness, I am going to talk about cellular automata (CA) a lot here. They serve as a convenient toy example for talking about computation, and particularly about structures embedded in computations (it is easy to think about how such structures exert control over their environment, although this is just as philosophically problematic as acausal control in general). CA have no relevant mystical properties. You could substitute any other sufficiently complicated program, but CA have the virtue of matching our intuition about physics in several ways (similar notions of space and time, of regular physical law, and so on). Whenever the intuition from CA seems to get in the way of thinking about what is going on in general, I will abandon them.
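
For readers who have not seen one, here is a minimal sketch of a one-dimensional, two-state CA with a local update rule and wrap-around boundaries. The post does not single out any particular automaton; the choice of Wolfram's Rule 110 as the default, and the function name, are assumptions made only for illustration.

```python
from typing import List

def step(cells: List[int], rule: int = 110) -> List[int]:
    """Apply one synchronous update of an elementary CA on a ring of cells."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighboring bits select one bit of the 8-bit rule number.
        index = (left << 2) | (center << 1) | right
        out.append((rule >> index) & 1)
    return out

# Evolve a single live cell for a few generations.
row = [0, 0, 0, 1, 0]
for _ in range(3):
    row = step(row)
```

Each cell's next state depends only on itself and its two neighbors, which is the sense in which a CA has a notion of space, time, and regular local law.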
