Consider the “Truth game”, played by an agent A:

A outputs a sequence of mathematical assertions S1, S2, …

For each statement S, A receives exp(-|S|) utilons.

If A makes any false statements (regardless of how many it makes) it receives -infinity utilons (or just a constant larger than the largest possible reward).

We can view A’s output in this game as an operationalization of “facts about which A is certain.” I believe that some intuitive judgments about “belief” are probably poorly grounded, and that thinking about a more concrete situation can patch some of these issues.

Naturally, it seems like A’s output should obey certain axioms. For example:

- If A outputs X and Y, then A outputs “X and Y.”
- If A outputs X and “X implies Y”, then A outputs Y. (“Modus Ponens”)
- If A outputs X, then A outputs “X or Y.”

And so on. For example, we could let A output some axioms in first order logic and then all of their logical consequences: then the set of statements output by A are precisely the theorems for a certain first order theory.

Humans have a strong intuitive feeling that they can “go beyond” any particular first order theory, in a sense formalized by Goedel’s incompleteness results. In particular, humans can feel pretty confident about ‘Anything that humans feel pretty confident about is true,’ which is impossible if their confidence were only justified by provability in some formal system.

But let’s consider agent A playing the Truth game. Should A output a statement of the form “Everything A outputs is true”? Well, if the game is played in some reasonable language (say, first order statements about the integers) then A probably can’t articulate this sentence. But it has some good-enough analogs, like “If A outputs a statement S, A doesn’t output (not S)” which are fair game. Should A output these sentences?

If A ever outputs S and (not S), then it doesn’t matter what A does–it is getting -infinity utilons anyway. So A might as well assert that A never outputs both S and not(S). Let this statement be Con(A).

Another natural class of statements for A to output are the results of finite computations: if phi is a delta_0 formula (ie, if we can determine whether phi(x) is true using a deterministic computation) then any true statement of the form “There is some x such that phi(x)” should get output by A eventually–after all, eventually A might as well try every possible candidate x, and if any of them work A can promptly output “There is some x such that phi(x).” Call this statement Exhaustive(A). It seems clear that A might as well output Exhaustive(A).

Finally, it seems like A should be able to output “X implies Y” when there is a simple (say, strict finitist) proof that X implies Y. I’m not going to dwell on this because it doesn’t seem like it is either controversial nor actually problematic. Call this statement Prover(A).

Unfortunately, we are about to get into some trouble.

Consider the statement G = “A never outputs G.” It is easy for A to reason as follows:

If A outputs G, then by simulating A, A will eventually output ‘A outputs G.’ Thus A will have output G and (not G). Thus not Con(A).

So if Prover(A), then A outputs “If A outputs G, then not Con(A)” which is the same statement as “If not G, then not Con(A)” which is in turn the same as “Con(A) implies G.” We’ve already argued that A might as well output “Con(A)”. So if we accept modus ponens we are saying that A might as well output “G”, in which case A will certainly receive infinite negative utility.

Which one of these legs should be dropped? Should A fail to output Con(A), should A fail to output Exhaustive(A), or should we abandon Prover(A) or modus ponens? If you were in A’s position you almost certainly wouldn’t output G. Which would you drop?

I would drop modus ponens, and I suspect that this is the path towards a satisfying theoretical resolution of the problem. When we view mathematical truth as fixed in the background, with A simply trying to discover it, modus ponens is easily justified (and decision-theoretic considerations make no difference). But A is itself made of math, and this seems like an extremely confused perspective, which we might seriously clarify mathematical logic by modifying (or we might not). A TDT agent in A’s shoes seems likely to output Con(A) and “Con(A) implies G” without outputting G, realizing that although G is true outputting it still isn’t advisable.

I will discuss this more in posts to come, but I should say that the real problem to me at this point seems to be: if you don’t have modus ponens, how do you do any reasoning at all?

(Also, let me say in passing that this analysis will carry over quite directly to the case of agents manipulating subjective uncertainties.)

Thanks for starting a discussion about this. Great explication!

I appreciate you clearly building up the logical paradox and then working through the implied “bullet biting” someone would need to do to make things consistent again. Discarding modus ponens feels quite radical but I’m open to considering it and looking forward to your next post(s) on the subject.

One thought: Can you think of a less contrived statements in the form of G where the statement isn’t self-referential but the reasoning about it still leads to the same dilemma? The current construction feels like it’s relying on a bug in levels of description. In other words, this feels like something that could be solved by the same kind of symbol renaming that most compilers use to avoid collisions in different scope levels — even when they aren’t declared scope levels — so the compiler has to infer them. But this could be bigger problem. How would re-scoping fail as a general tactic to deal with this?

Sorry to offer such a naive solution. It feels like I’m possibly missing an essential part of your argument because it’s unlikely that you wouldn’t have considered a re-scoping scheme already. Still, I’m curious what your thoughts are on this line of reasoning anyway.

The issue is that expressive languages do allow an implicit sort of self-reference by quining; the “type-safeness” necessary to eliminate the possibility also eliminates the computational expressiveness of the language (see e.g. the typed lambda calculus).

A standard example in English is the sentence:

“Is not output by the agent when preceded by its quotation” is not output by the agent when preceded by its quotation.

There are examples all over mathematics. For example, there are functions f and g defined as compositions of +, *, cos, exp, such that f(x) = g(x) if and only if the agent never outputs “f(x) = g(x)”.

I describe the sentence as G = ” … G … ” only for ease of expression.