Consider the “Truth game”, played by an agent A:
A outputs a sequence of mathematical assertions S1, S2, …
For each statement S, A receives exp(-|S|) utilons.
If A makes any false statements (regardless of how many it makes) it receives -infinity utilons (or just a constant larger than the largest possible reward).