Will machine intelligences communicate with humans by directly exposing or reporting properties of their internal state, or will they tend to communicate by strategically choosing utterances that they think will have the intended effect on the listener? In this post I try to lay out the distinction more clearly and describe some differences.
(Disclaimer: I expect this is a distinction that has been explicated elsewhere, but I’m not aware of it. Pointers are appreciated. I’m mostly writing this post because it is a distinction I want to make use of in upcoming posts.)
If I ask someone “is there a book on the floor?” I expect them to answer the question by translating the words into a proposition in their internal language of thought, and then to translate their beliefs about that proposition back into words.
By contrast, I don’t typically expect them to listen to my question, use it to infer something about my mental state, backwards chain from their life goals, and then find an utterance which will lead me to respond in the way that best suits their ends. Clearly some amount of goal-oriented reasoning is present: for example, a good communicator tries to understand how their listener is thinking and chooses responses that will be understood or that will engender goodwill. But I think of it as a process which can modulate the much easier “straightforward” procedure, rather than as the main dynamic.
I think most researchers expect machine intelligence to work in the same way: to translate a natural language utterance into a proposition in an internal language of thought, to assess that proposition, and to translate its beliefs back into language.
By contrast, I think most researchers don’t expect a question-answering system to listen to my question, use it to infer something about my mental state, backwards chain from some ultimate goal (such as user satisfaction), and try to find the utterance which will best satisfy that goal. I think that most researchers would agree that some amount of this goal-oriented reasoning is necessary to really match human-level question-answering performance. But again, it is seen as a process which modulates the “straightforward” procedure, rather than as the main dynamic.
Let’s call the first of these two procedures “straightforward communication” and the second “goal-oriented communication.” Of course there is also a wide range of behaviors that are intermediate between these two, but I think there are two fundamentally different forces at work which can lead to useful communication.
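As a toy caricature of the straightforward procedure (every name and representation here is an illustrative assumption, not a proposal), the response is computed by translating the question into an internal proposition, checking the agent’s beliefs about it, and rendering the result back into words, with no modeling of the listener at all:

```python
# A caricature of a straightforward communicator: parse the utterance into
# an internal proposition, report the agent's belief about it, and never
# reason about the effect of the answer on the listener.

def parse(utterance):
    # Toy "language of thought" translation; a real system would need a
    # genuine semantic parser. Here we only understand one question.
    if utterance == "is there a book on the floor?":
        return ("on", "book", "floor")
    raise ValueError("unparseable utterance")

def render(holds):
    # Translate the resulting belief directly back into language.
    return "yes" if holds else "no"

class StraightforwardSpeaker:
    def __init__(self, beliefs):
        self.beliefs = beliefs  # set of propositions the agent takes to be true

    def answer(self, utterance):
        proposition = parse(utterance)
        return render(proposition in self.beliefs)

speaker = StraightforwardSpeaker({("on", "book", "floor")})
print(speaker.answer("is there a book on the floor?"))  # prints "yes"
```

Note that the pipeline is fixed: nothing in it depends on who is listening or on what the answer will cause them to do.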
A similar distinction is at work on the listening side. If I hear what you say, I can respond in two different ways: I can either directly translate your utterance into a logical form (or whatever) and manipulate that logical form, or I can treat your utterance as evidence and try to figure out what characteristics of your mental state would have led you to make that utterance. For example, you would say “I saw Bob this morning” if you have the memory of having seen Bob, which is most likely if you did in fact see Bob; and I can perform these inferences regardless of how different my internal model of the world is from yours.
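The second kind of listening is just Bayesian inference over the speaker’s state. A minimal sketch (the states, utterances, and probabilities are all made up for illustration):

```python
# Toy model of a listener treating an utterance as evidence:
# P(state | utterance) is proportional to P(utterance | state) * P(state).

def posterior(utterance, prior, likelihood):
    # likelihood[state][utterance] = probability that the speaker produces
    # `utterance` when their mental state is `state`.
    unnorm = {s: prior[s] * likelihood[s].get(utterance, 0.0) for s in prior}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

prior = {"saw_bob": 0.5, "did_not_see_bob": 0.5}
likelihood = {
    "saw_bob": {"I saw Bob this morning": 0.9, "(silence)": 0.1},
    "did_not_see_bob": {"I saw Bob this morning": 0.05, "(silence)": 0.95},
}

beliefs = posterior("I saw Bob this morning", prior, likelihood)
# beliefs["saw_bob"] == 0.45 / 0.475, roughly 0.947
```

The listener only needs a model of how your mental states produce utterances; it does not need to share your internal representation.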
Here are some differences between straightforward and goal-oriented communication:

- Successful goal-oriented communication requires some common values between the speaker and the listener. If the speaker doesn’t want the listener to understand them, then there is no hope for goal-oriented communication.
- For straightforward communication, a “near-miss” is most likely to lead to unclarity or a complete failure to communicate. For goal-oriented communication, a “near-miss” in which the speaker’s goals diverge from those of the listener can lead to more perverse failures: the speaker may deliberately mislead or manipulate the listener, tell them what they want to hear, and so on, without any observable indicators of failure. Moreover, a goal-oriented communicator which works well in some contexts may fail in other contexts, since its decision to communicate accurately is contingent on the belief that accurate communication is useful.
- Successful straightforward communication requires the speaker and the listener to have a sufficiently similar internal representation, or to have an explicit procedure for converting between them. Even if the speaker could predict that language is being used in a different way or that the listener won’t understand a sentence, by default this wouldn’t lead them to change their behavior, since they are simply relaying their thoughts rather than strategically choosing language to achieve a goal (such as understanding).
- Designing a goal-oriented communicator is conceptually straightforward, at least if we set aside the severe difficulties posed by resource limitations. Indeed, communication follows naturally from the desire to coordinate with or manipulate other individuals, and a “smart enough” goal-oriented agent of almost any kind would develop communication if it were useful for their goals. But it is not immediately clear how you would design a straightforward communicator at all.
- Goal-oriented communication naturally adjusts to different listeners given a strong enough underlying reasoning process, while straightforward communication does not adjust at all without some explicit additional provisions.
- Goal-oriented communication naturally replicates many features of human communication:
  - tending to use unambiguous language in unfamiliar situations, for surprising information, or in the presence of noise
  - using language (e.g. adjectives) in a context-dependent way
  - leveraging the assumption that the speaker is trying to be informative
  - employing poetic, elegant, or inspiring language, and so on.
- Straightforward communication might be used to communicate goals to an agent, while for goal-oriented communication this appears to beg the question: the agent must already have goals before it can strategically pursue them through communication.
“Communication” here may mean something broader than natural language. For example, I may want to inspect the internal representations being used by a program. I might do this by having some straightforward procedure for translating those internal representations into something that I can understand. But this probably requires a “good enough” alignment between the internal representation of the program and the way that I think about the world. Alternatively, I might try to build an agent which strategically structures this information in a way which makes it understandable by me.
For a variety of reasons, particularly the desire to build systems that fail gracefully, I am interested in understanding the feasibility of straightforward communication. That said, the list of differences above mostly suggests to me that goal-oriented communication is a better default for powerful systems. It seems to me that humans are goal-oriented communicators, though we use our common language and cognitive architecture as a (significant) computational expedient, which in practice leads us to often communicate straightforwardly (and even to systematically delude ourselves rather than communicate strategically). Today’s question-answering systems are mostly straightforward communicators.
There are very simple formal models of goal-oriented communicators (e.g. any formal model of prediction and planning). At the moment, I’m not aware of good formal models for learning a “transparent” representation that could potentially be understood by humans, even neglecting resource limitations. There are a number of plausible approaches to getting traction on this, but none of them yet seem particularly satisfying to me. I don’t know how much more work would be required to find satisfying solutions.
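For concreteness, here is roughly the simplest formal model of a goal-oriented communicator I can write down (every detail below, including the belief-update rule and the candidate utterances, is an illustrative assumption): the speaker simulates how each candidate utterance would change the listener’s belief, scores the predicted outcome against its goal, and says whatever scores highest.

```python
# One-step planning over utterances: choose the utterance whose predicted
# effect on the listener best serves the speaker's goal.

def listener_update(belief, utterance, evidence_strength=0.8):
    # Crude model of the listener: an assertion pulls their belief
    # (probability that a book is present) toward the asserted value.
    target = 1.0 if utterance == "there is a book" else 0.0
    return belief + evidence_strength * (target - belief)

def choose_utterance(candidates, listener_belief, goal):
    # `goal` scores the listener's resulting belief; pick the best utterance.
    return max(candidates, key=lambda u: goal(listener_update(listener_belief, u)))

candidates = ["there is a book", "there is no book"]

# An honest goal: bring the listener's belief close to the truth (book present).
honest = choose_utterance(candidates, 0.5, goal=lambda b: -abs(b - 1.0))

# A misaligned goal yields manipulation with no change to the machinery at all.
deceptive = choose_utterance(candidates, 0.5, goal=lambda b: -abs(b - 0.0))
```

The same planning loop produces honest or deceptive speech depending only on the goal, which is the point made above about “near-misses” in goal-oriented communication.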
If we think that straightforward communication would be a useful tool for building robustly useful AI, this seems like a worthwhile problem to work on: (1) it might directly improve our ability to build useful AI, and it may be better to work on this problem well in advance of the arrival of such AI; (2) it might ultimately prove useful for overcoming computational limitations, if we think that human behavior is a good model for tractable intelligence; (3) it might improve our ability to reason about and discuss future AI.