Consider the “Truth game”, played by an agent A:
A outputs a sequence of mathematical assertions S1, S2, …
For each statement S, A receives exp(-|S|) utilons.
If A makes any false statements (regardless of how many it makes) it receives -infinity utilons (or just a constant larger than the largest possible reward).
We can view A’s output in this game as an operationalization of “facts about which A is certain.” I believe that some intuitive judgments about “belief” are probably poorly grounded, and that thinking about a more concrete situation can patch some of these issues.
Naturally, it seems like A’s output should obey certain axioms. For example:
- If A outputs X and Y, then A outputs “X and Y.”
- If A outputs X and “X implies Y”, then A outputs Y. (“Modus Ponens”)
- If A outputs X, then A outputs “X or Y.”
And so on. For example, we could let A output some axioms in first order logic and then all of their logical consequences: then the set of statements output by A are precisely the theorems for a certain first order theory.
Humans have a strong intuitive feeling that they can “go beyond” any particular first order theory, in a sense formalized by Goedel’s incompleteness results. In particular, humans can feel pretty confident about ‘Anything that humans feel pretty confident about is true,’ which is impossible if their confidence were only justified by provability in some formal system.
But let’s consider agent A playing the Truth game. Should A output a statement of the form “Everything A outputs is true”? Well, if the game is played in some reasonable language (say, first order statements about the integers) then A probably can’t articulate this sentence. But it has some good-enough analogs, like “If A outputs a statement S, A doesn’t output (not S)” which are fair game. Should A output these sentences?
If A ever outputs S and (not S), then it doesn’t matter what A does–it is getting -infinity utilons anyway. So A might as well assert that A never outputs both S and not(S). Let this statement be Con(A).
Another natural class of statements for A to output are the results of finite computations: if phi is a delta_0 formula (ie, if we can determine whether phi(x) is true using a deterministic computation) then any true statement of the form “There is some x such that phi(x)” should get output by A eventually–after all, eventually A might as well try every possible candidate x, and if any of them work A can promptly output “There is some x such that phi(x).” Call this statement Exhaustive(A). It seems clear that A might as well output Exhaustive(A).
Finally, it seems like A should be able to output “X implies Y” when there is a simple (say, strict finitist) proof that X implies Y. I’m not going to dwell on this because it doesn’t seem like it is either controversial nor actually problematic. Call this statement Prover(A).
Unfortunately, we are about to get into some trouble.
Consider the statement G = “A never outputs G.” It is easy for A to reason as follows:
If A outputs G, then by simulating A, A will eventually output ‘A outputs G.’ Thus A will have output G and (not G). Thus not Con(A).
So if Prover(A), then A outputs “If A outputs G, then not Con(A)” which is the same statement as “If not G, then not Con(A)” which is in turn the same as “Con(A) implies G.” We’ve already argued that A might as well output “Con(A)”. So if we accept modus ponens we are saying that A might as well output “G”, in which case A will certainly receive infinite negative utility.
Which one of these legs should be dropped? Should A fail to output Con(A), should A fail to output Exhaustive(A), or should we abandon Prover(A) or modus ponens? If you were in A’s position you almost certainly wouldn’t output G. Which would you drop?
I would drop modus ponens, and I suspect that this is the path towards a satisfying theoretical resolution of the problem. When we view mathematical truth as fixed in the background, with A simply trying to discover it, modus ponens is easily justified (and decision-theoretic considerations make no difference). But A is itself made of math, and this seems like an extremely confused perspective, which we might seriously clarify mathematical logic by modifying (or we might not). A TDT agent in A’s shoes seems likely to output Con(A) and “Con(A) implies G” without outputting G, realizing that although G is true outputting it still isn’t advisable.
I will discuss this more in posts to come, but I should say that the real problem to me at this point seems to be: if you don’t have modus ponens, how do you do any reasoning at all?
(Also, let me say in passing that this analysis will carry over quite directly to the case of agents manipulating subjective uncertainties.)