Getting decision theory right seems to be an important step towards comprehensible AI, of the sort that might be described as maximizing something which its creator understands or (hopefully) chooses. This seems important in itself, but understanding decision theory is also valuable for avoiding a variety of decision-theoretic hazards which might lead to unanticipated behavior.
Previously I’ve talked about getting a handle on objects of interest (humans, counterfactual civilizations) and on issuing an instruction of the form “Simulate this civilization.” Here is a much better proposal for issuing formal instructions.
Suppose we can build something like a TDT agent, which controls its behavior to maximize a constant U defined by some axioms (or as the output of some program). We want to run this agent with a utility function reflecting our own preferences, but we don’t have (formal) access to those preferences.
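As a toy illustration of the interface being assumed here (all names are hypothetical, and this glosses over everything hard about TDT-style decision theory, such as logical counterfactuals): the agent treats its utility function U as an opaque program handed to it by its designer, and simply selects the action that U scores highest.

```python
def make_agent(U, actions):
    """Return a policy that maximizes the supplied utility program U.

    A deliberately minimal sketch: U is an arbitrary program mapping
    (state, action) to a number, and the agent's only job is to pick
    the action U ranks highest. The hard part -- supplying a U that
    formally reflects our preferences -- is exactly what we lack.
    """
    def act(world_state):
        # Evaluate U on each candidate action and choose the best one.
        return max(actions, key=lambda a: U(world_state, a))
    return act

# Example: a stand-in U that rewards actions close to a designer-chosen target.
U = lambda state, a: -abs(a - state["target"])
agent = make_agent(U, actions=[0, 1, 2, 3])
print(agent({"target": 2}))  # → 2
```

The point of the sketch is only the separation of concerns: the agent's machinery is generic, and everything we care about is packed into the program U that we pass in.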