Indirect normative theories are a general approach to formally specifying what we value, without having to actually solve any hard problems. In this post I give a very formal proposal that could be implemented with (roughly) existing technology. It's not something I would ever recommend doing, but I think it works as a general argument for the feasibility of this class of approaches.
The usual universal prior is really weird, and it would probably be bad if we ever actually used it to make important decisions.
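For concreteness, a minimal statement of the object in question (relative to some universal prefix machine $U$, which is itself a somewhat arbitrary choice):

$$M(x) \;=\; \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|},$$

where the sum runs over programs $p$ whose output begins with the string $x$.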
Some interesting, optimistic ideas about AI safety.
What happens if you can’t understand your values at all? We don’t have a clear enough account of logical uncertainty to give a great answer, but I think this confusing post makes the right first observation.
A discussion of counterfactual oversight and reliability.
A formal specification for (a narrow class of) counterfactuals, and a similar specification for implementing causal decision theory.
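For orientation, the generic interventionist form of causal decision theory (not necessarily the specification given in the post) ranks actions by the expected utility of their causal consequences:

$$a^* \;=\; \arg\max_a \sum_o P(o \mid \mathrm{do}(a))\, U(o),$$

where $\mathrm{do}(a)$ denotes intervening to take action $a$ rather than merely conditioning on it; the hard part is saying what the counterfactual distribution $P(\,\cdot \mid \mathrm{do}(a))$ should be.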
Why think about these things at all?
There are some hypotheses to which we assign very low probability. But if you assign a simple hypothesis a very low probability, you will never come to believe it, no matter what evidence you see.
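As a rough illustration of the arithmetic (not necessarily the argument of the linked post), Bayes' rule in odds form says

$$\frac{P(H \mid E)}{P(\neg H \mid E)} \;=\; \frac{P(E \mid H)}{P(E \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)},$$

so if your prior in $H$ is, say, $10^{-30}$, then reaching even odds requires evidence with a combined likelihood ratio of about $10^{30}$, roughly 100 bits, favoring $H$ over everything else. If some alternative hypothesis predicts your observations nearly as well as $H$ does, the available likelihood ratio is bounded and that threshold is never reached.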
The speed prior doesn’t have anything to say about the many-worlds interpretation of quantum mechanics (MWI) except by straightforward question-begging. By similar arguments, it’s not clear that it has anything to say about any physical questions.
The amount of meaningful computation we can do in the universe may be much larger than you would think. I was more pleased with this post before I learned that its main observation is an old result in complexity theory, but it’s still an interesting point.
Formally speaking, a physical implementation of AIXI, if such a thing were possible, would have problems. It looks like natural, realistic approximations would have problems too.
Prediction market prices don’t tell you the probabilities of events. This is probably obvious to anyone in finance.
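One standard way to see this (the usual finance point, not necessarily the post's exact argument): in a simple one-period model with no discounting, the price of a contract paying \$1 in certain states weights each state by the marginal value of money in that state,

$$\text{price} \;=\; \frac{\sum_s p_s\, m_s\, x_s}{\sum_s p_s\, m_s},$$

where $p_s$ is the real probability of state $s$, $m_s$ the marginal value of a dollar in state $s$, and $x_s \in \{0,1\}$ the contract's payoff. For example, with two equally likely states and a dollar worth twice as much in a crash ($m_{\text{crash}} = 2$, $m_{\text{boom}} = 1$), a contract paying out in the crash trades at $\frac{0.5 \cdot 2}{0.5 \cdot 2 + 0.5 \cdot 1} = 2/3$, even though its probability is $1/2$.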