# “Proof” of Friendliness

The humans are about to launch their best effort at a friendly singularity. Of course, they are careful and wise and have exceeded all reasonable expectations for caution and rigor.

Before building FAI you built an oracle AI to help you. With its help, you found a mathematical definition of U, the utility of humanity’s extrapolated volition (or whatever). You were all pretty pleased with yourselves, but you didn’t stop there: you found a theory of everything, located humanity within it, and wrote down the predicate F(X) = “The humans run the program described by X.”

To top it off, with the help of your oracle AI you found the code for a “best possible AI”, call it FAI, and a proof of the theorem:

There exists a constant Best such that U ≤ Best, but F(FAI) implies U =  Best.”

Each of these steps you did with incredible care. You have proved beyond reasonable doubt that U and F represent what you want them to.

You present your argument to the people of the world. Some people object to your reasoning, but it is airtight: if they choose to stop you from running FAI, they will still receive U ≤ Best, so why bother?

Now satisfied and with the scheduled moment arrived, you finally run FAI. Promptly the oracle AI destroys civilization and spends the rest of its days trying to become as confident as possible that Tic-Tac-Toe is really a draw (like you asked it to, once upon a time).

Just a lighthearted illustration that decision theory isn’t only hard for AI.

(Disclaimer: this narrative claims to represent reality only insofar as it is mathematically plausible.)

Edit: I think the moral was unclear. The point is: in fact F(FAI), and so in fact U = Best so U ≤ Best. Everything was as claimed and proven. But this doesn’t change the fact that you would rather not use this procedure.