I have written briefly about how one might pin down the human decision process (the thing itself, not some idealization thereof) or a counterfactual world. If we (probably foolishly) wanted to give an AI formal instructions using these ideas, we would still need to include some edict like “Now take this decision process, embed it in this abstract world (where we believe it will be able to create a flourishing utopia, or whatever) in this way, and make the universe look like that.” We may have gotten some leverage on the first parts, which involve precisely defining certain concepts for an AI (though right now the difficulties there loom pretty large), but it isn’t yet clear how we could precisely tell the AI to do something with those definitions. Here is a stab at this other problem.
Rather than directly asking an AI to simulate a particular universe, we will ask it to find the value on a particular physical input channel in that universe, and then exert control from within the universe to ensure that calculating this value requires simulating the universe (or at least capturing whatever moral value we hope would come from a simulation of that universe).