Clarification of AI Reflection Problem

(Cross-posted from Less Wrong)

Consider an agent A, aware of its own embedding in some lawful universe, able to reason about itself and use that reasoning to inform action.  By interacting with the world, A is able to modify itself or construct new agents, and using these abilities effectively is likely to be an important component of AGI.  Our current understanding appears to be inadequate for guiding such an agent’s behavior, for (at least) the following reason:

If A does not believe “A’s beliefs reflect reality,” then A will lose interest in creating further copies of itself, improving its own reasoning, or performing natural self-modifications. Indeed, if A’s beliefs don’t reflect reality, then creating more copies of A or spending more time thinking may do more harm than good. But if A does believe “A’s beliefs reflect reality,” then A immediately runs into Gödelian problems: for example, does A become convinced of the sentence Q = “A does not believe Q”? If it does, then Q is false and A believes a falsehood; if it does not, then Q is true, yet an A that trusts its own reasoning seems able to follow exactly this argument and conclude Q after all. We need to find a way for A to have some confidence in its own behavior without running into these fundamental difficulties with reflection.
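To see the obstruction a bit more formally, here is a rough sketch; the notation is mine (write $\mathrm{Bel}(\ulcorner \varphi \urcorner)$ for “A believes $\varphi$”), and it assumes A’s beliefs are closed under ordinary logical reasoning. By the diagonal lemma there is a sentence $Q$ with
\[ Q \;\leftrightarrow\; \neg\,\mathrm{Bel}(\ulcorner Q \urcorner). \]
The instance of “A’s beliefs reflect reality” relevant to $Q$ is the soundness statement
\[ \mathrm{Bel}(\ulcorner Q \urcorner) \;\rightarrow\; Q, \]
which, unfolding $Q$, is equivalent to $\mathrm{Bel}(\ulcorner Q \urcorner) \rightarrow \neg\,\mathrm{Bel}(\ulcorner Q \urcorner)$, hence to $\neg\,\mathrm{Bel}(\ulcorner Q \urcorner)$, hence to $Q$ itself. So if A accepts even this one instance of its own soundness, A believes $Q$; but then $\mathrm{Bel}(\ulcorner Q \urcorner)$ holds, $Q$ is false, and A’s beliefs fail to reflect reality after all. This is the usual Gödelian diagonalization, applied to belief rather than provability.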

This problem has been discussed occasionally at Less Wrong, but I would like to clarify it and lay out some examples before starting in on a resolution.
