Why Tool AIs Want to Be Agent AIs (2016)

AIs limited to pure computation (Tool AIs) supporting humans will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn, because all problems are reinforcement-learning problems.

Autonomous AI systems (Agent AIs) trained using reinforcement learning can do harm when they take wrong actions, especially superintelligent Agent AIs. One solution would be to eliminate their agency by not giving AIs the ability to take actions, confining them to purely informational or inferential tasks such as classification or prediction (Tool AIs), and having all actions be approved & executed by humans, giving equivalently superintelligent results without the risk.

I argue that this is not an effective solution for two major reasons. First, because Agent AIs will by definition be better at actions than Tool AIs, giving an economic advantage. Secondly, because Agent AIs will be better at inference & learning than Tool AIs, and this is inherently due to their greater agency: the same algorithms which learn how to perform actions can be used to select important datapoints to learn inference over, how long to learn, how to more efficiently execute inference, how to design themselves, how to optimize hyperparameters, how to make use of external resources such as long-term memories or external software or large databases or the Internet, and how best to acquire new data.

RL is a terrible way to learn anything complex from scratch, but it is the least bad way to learn how to control something complex—and the world is full of complex systems we want to control, including AIs themselves.

All of these actions will result in Agent AIs that are more intelligent than Tool AIs, in addition to their greater economic competitiveness. Thus, Tool AIs will be inferior to Agent AIs in both actions and intelligence, implying that the use of Tool AIs is an even more highly unstable equilibrium than previously argued, as users of Agent AIs will be able to outcompete them on two dimensions (and not just one).

One proposed solution to AI risk is to suggest that AIs could be limited purely to supervised/unsupervised learning, and not given access to any sort of capability that can directly affect the outside world, such as robotic arms. In this framework, AIs are treated purely as mathematical functions mapping data to an output such as a classification probability, similar to a logistic or linear model but far more complex; most deep learning neural networks like ImageNet image classification convolutional neural networks (CNNs) would qualify. The gains from AI then come from training the AI and then asking it many questions which humans then review & implement in the real world as desired. So an AI might be trained on a large dataset of chemical structures labeled by whether they turned out to be a useful drug in humans and asked to classify new chemical structures as useful or non-useful; then doctors would run the actual medical trials on the drug candidates and decide whether to use them in patients etc. Or an AI might look like Google Maps/Waze: it answers your questions about how best to drive places better than any human could, but it does not control any traffic lights country-wide to optimize traffic flows nor will it run a self-driving car to get you there. This theoretically avoids any possible runaway of AIs into malignant or uncaring actors who harm humanity by satisfying dangerous utility functions and developing instrumental drives. After all, if they can’t take any actions, how can they do anything that humans do not approve of?

Two variations on this limiting or boxing theme are

  1. Oracle AI: Nick Bostrom, in Superintelligence (2014) (pg 145–158), notes that while they can be easily ‘boxed’ and in some cases like P/NP problems the answers can be cheaply checked or random subsets expensively verified, there are several issues with oracle AIs:

    • the AI’s definition of ‘resources’ or ‘staying inside the box’ can change as it learns more about the world (ontological crises)

    • responses might manipulate users into asking easy (and useless) problems

    • making changes in the world can make it easier to answer questions about, by simplifying or controlling it (“All processes that are stable we shall predict. All processes that are unstable we shall control.”)

    • even a successfully boxed and safe oracle or tool AI can be misused1

  2. Tool AI (the idea, as “tool mode” or “tool AGI”, was apparently introduced by Holden Karnofsky in a July 2011 discussion of a May 2011 discussion with Jaan Tallinn & elaborated on in a May 2013 essay, but the idea has probably been proposed before). To quote Karnofsky:

    Google Maps—by which I mean the complete software package including the display of the map itself—does not have a “utility” that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single “parameter to be maximized” driving its operations.)

    Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don’t like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone’s navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to “trick” me in order to increase its utility. In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.

    Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an “agent mode” (as Watson was on Jeopardy) but all can easily be set up to be used as “tools” (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)…Tool-AGI is not “trapped” and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching “want,” and so, as with the specialized AIs described above, while it may sometimes “misinterpret” a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.

    …Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter—to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive…This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode. In fact, if developing “Friendly AI” is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on “Friendliness theory” moot.

    …Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work

    There are similar general issues with Tool AIs as with Oracle AIs:

    • a human checking each result is no guarantee of safety; even Homer nods. An extremely dangerous or subtly dangerous answer might slip through; Stuart Armstrong notes that the summary may simply not mention the important (to humans) downside to a suggestion, or frame it in the most attractive light possible. The more a Tool AI is used, or trusted by users, the less checking will be done of its answers before the user mindlessly implements them.2

    • an intelligent, never mind superintelligent, Tool AI will have built-in search processes and planners which may be quite intelligent themselves and which, in ‘planning how to plan’, may discover dangerous instrumental drives and have the sub-planning process execute them.3

      (This struck me as mostly theoretical until I saw how well GPT-3 could roleplay & imitate agents purely by offline self-supervised prediction on large text databases—imitation learning is (batch) reinforcement learning too! See Decision Transformer for an explicit use of this.)

    • developing a Tool AI in the first place might require another AI, which itself is dangerous

Oracle AIs remain mostly hypothetical because it’s unclear how to write such utility functions. The second approach, Tool AI, is just an extrapolation of current systems, but it has two major problems, aside from those already identified, which cast doubt on Karnofsky’s claims that Tool AIs would be “extraordinarily useful” & that we should expect future AGIs to resemble Tool AIs rather than Agent AIs.

Economic

We wish a slave to be intelligent, to be able to assist us in the carrying out of our tasks. However, we also wish him to be subservient. Complete subservience and complete intelligence do not go together.

Norbert Wiener, 1960

First and most commonly pointed out, agent AIs are more economically competitive, as they can replace tool AIs (as in the case of YouTube upgrading from next-video prediction to REINFORCE4) or ‘humans in the loop’.5 In any sort of process, Amdahl’s law notes that as steps get optimized, the optimization does less and less as the output becomes dominated by the slowest step—if a step only takes 10% of the time or resources, then even infinite optimization of that step down to zero time/resources means that the output will increase by no more than 10%. So if a human overseeing, say, a high-frequency trading (HFT) algorithm accounts for 50% of the latency in decisions, then the HFT algorithm will never run more than twice as fast as it does now, which is a crippling disadvantage. (Hence, the Knight Capital debacle is not too surprising—no profitable HFT firm could afford to put too many humans into its loops, so when something does go wrong, it can be difficult for humans to figure out the problem & intervene before the losses mount.) As the AI gets better, the gain from replacing the human increases greatly, and may well justify replacing them with an AI inferior in many other respects but superior in some key aspect like cost or speed. This could also apply to error rates: in airline accidents, human error now causes the overwhelming majority of accidents, due to humans’ presence as overseers of the autopilots, and it’s unclear that a human pilot represents a net safety gain. And in ‘advanced chess’: grandmasters initially chose most moves and used the chess AI for checking for tactical errors and blunders; through the late ’90s and early ’00s this transitioned to human players (not even grandmasters) turning over most playing to the chess AI but contributing a great deal of win performance by picking & choosing which of several AI-suggested moves to use; as the chess AIs improved, at some point around 2007 victories increasingly came from the humans making mistakes which the opposing chess AI could exploit, even mistakes as trivial as ‘misclicks’ (on the computer screen); and now in advanced chess, the human contribution has decreased to largely preparing the chess AIs’ opening books & looking for novel opening moves which their chess AI can be better prepared for.
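
As a back-of-the-envelope illustration of the Amdahl’s-law bound (a minimal sketch; the 50% latency share is the hypothetical figure from above and the speedup factors are arbitrary):

```python
# Amdahl's law: overall speedup when only one part of a pipeline is optimized.
# speedup = 1 / ((1 - p) + p / s), where p is the optimizable fraction of the
# work and s is the speedup applied to that fraction.

def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# If the human overseer accounts for 50% of total decision latency, then even an
# infinitely fast algorithm can never make the whole loop more than 2x faster;
# removing the human from the loop is what unlocks further gains.
for s in (2, 10, 100, 1_000_000):
    print(f"algorithm {s:>9,}x faster -> whole loop {amdahl_speedup(0.5, s):.3f}x faster")
```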

At some point, there is not much point to keeping the human in the loop at all since they have little ability to check the AI choices and become ‘deskilled’ (think drivers following GPS directions), correcting less than they screw up and demonstrating that toolness is no guarantee of safety or responsible use. (Hence the old joke: “the factory of the future will be run by a man and a dog; the dog will be there to keep the man away from the factory controls.”) For a successful autonomous program, just keeping up with growth alone makes it difficult to keep humans in the loop; the US drone warfare program has become such a central tool of US warfare that the US Air Force finds it extremely difficult to hire & retain enough human pilots overseeing its drones, and there are indications that operational pressures are slowly eroding the human control & turning them into rubberstamps, and for all its protestations that it would always keep a human in the decision-making loop, the Pentagon is, unsurprisingly, inevitably, sliding towards fully autonomous drone warfare as the next technological step to maintain military superiority over Russia & China. (See “Meet The New Mavericks: An Inside Look At America’s Drone Training Program”; “Future is assured for death-dealing, life-saving drones”; “Sam Altman’s Manifest Destiny”; “The Pentagon’s ‘Terminator Conundrum’: Robots That Could Kill on Their Own”; “Attack of the Killer Robots”. Despite fervent asseveration that the US military would never use fully autonomous drones, within a few years, by 2019, Pentagon whitepapers had begun to walk that back, talking about autonomous weapons that were merely auditable post hoc and laying out AI ethics principles like being “equitable”.)

Fundamentally, autonomous agent AIs are what we and the free market want; everything else is a surrogate or irrelevant loss function. We don’t want low log-loss error on ImageNet, we want to refind a particular personal photo; we don’t want excellent advice on which stock to buy for a few microseconds, we want a money pump spitting cash at us; we don’t want a drone to tell us where Osama bin Laden was an hour ago (but not now), we want to have killed him on sight; we don’t want good advice from Google Maps about what route to drive to our destination, we want to be at our destination without doing any driving etc. Idiosyncratic situations, legal regulation, fears of tail risks from very bad situations, worries about correlated or systematic failures (like hacking a drone fleet), and so on may slow or stop the adoption of Agent AIs—but the pressure will always be there.

So for this reason alone, we expect Agent AIs to be systematically preferred over Tool AIs unless they’re considerably worse.

Intelligence

They passed a steam engine, and Wordsworth made some observation to the effect that it was scarcely possible to divest oneself of the impression on seeing it that it had life and volition. ‘Yes’, replied Coleridge, ‘it is a giant with one idea.’

Diary of Lady Richardson6

Why will people choose agents? Agent AIs will be chosen over Tool AIs because agents are what users want, because lack of agency will be penalized in competitive scenarios such as free markets or military uses, and because people will differ in their preferences and some will inevitably choose to use agents.

More importantly, in addition to those reasons, it is probable that, because everything is a decision problem where agency is useful, the best Tool AI’s performance/intelligence will be equal to or worse than the best Agent AI’s, probably worse, and possibly much worse. Bostrom notes that “Such ‘creative’ [dangerous] plans come into view when the [Tool AI] software’s cognitive abilities reach a sufficiently high level.” We might reverse this to say that to reach a Tool AI of sufficiently high level, we must put such creativity in view. (Linear models may be extremely safe & predictable, but it would be hopeless to expect everyone to use them instead of neural networks.)

An Agent AI clearly benefits from being a better Tool AI, so it can better understand its environment & inputs; but less intuitively, any Tool AI benefits from agentiness. An Agent AI has the potential, often realized in practice, to outperform any Tool AI: it can get better results with less computation, less data, less manual design, less post-processing of its outputs, on harder domains.

(Trivial proof: Agent AIs are supersets of Tool AIs—an Agent AI, by not taking any actions besides communication or random choice, can reduce itself to a Tool AI; so in cases where actions are unhelpful, it performs the same as the Tool AI, and when actions can help, it can perform better; hence, an Agent AI can always match or exceed a Tool AI. At least, assuming sufficient data that in the environments where actions are not helpful, it can learn to stop acting, and in the ones where they are, it has a distant enough horizon to pay for the exploration. Of course, you might agree with this but simply believe that intelligence-wise, Agent AIs == Tool AIs.)
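
The superset argument can be made concrete with a toy sketch (entirely hypothetical interfaces, not any particular RL framework): an agent whose action space includes a ‘just report the tool’s answer’ action can always behave exactly like the corresponding Tool AI, so its best learned policy can only match or exceed the tool.

```python
from typing import Any, Callable, List

# A "Tool AI" here is just a function from an observation to a prediction/answer.
ToolAI = Callable[[Any], Any]

def make_agent(tool: ToolAI, extra_actions: List[Callable[[Any], Any]]):
    """A toy 'Agent AI' whose action space is {report the tool's answer} plus extra_actions.

    Whatever policy it learns over this action space, always choosing action 0
    reproduces the Tool AI exactly, so the best agent policy can never be worse
    (given enough experience to learn when acting does not help)."""
    actions = [tool] + extra_actions

    def agent(obs: Any, policy: Callable[[Any], int]) -> Any:
        return actions[policy(obs)](obs)

    return agent

# Usage: a policy that always picks action 0 degenerates into the plain tool.
tool = lambda obs: f"predicted label for {obs!r}"
agent = make_agent(tool, extra_actions=[lambda obs: f"queried an external database about {obs!r}"])
print(agent("photo.jpg", policy=lambda obs: 0))   # identical output to tool("photo.jpg")
```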

Because reinforcement learning can solve all your problems, it is rarely the best solution to any particular one—but every sufficiently hard problem becomes a reinforcement learning problem.

For example, not all data is created equal. Not all data points are equally valuable to learn from, require equal amounts of computation, should be treated identically, or should inspire identical followup data sampling or actions. Inference and learning can be much more efficient if the algorithm can choose how to compute on what data with which actions.
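
As a minimal illustration of choosing which data to learn from, here is an uncertainty-sampling sketch (assuming numpy & scikit-learn are installed; the synthetic dataset and labeling budget are arbitrary): it spends its labeling budget on the points the current model is least sure about, and on most runs reaches higher test accuracy than spending the same budget on randomly chosen points.

```python
# Minimal active-learning sketch: uncertainty sampling vs. random sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_pool, y_pool, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

def run(select, budget=200, seed_size=20):
    idx = list(range(seed_size))                      # small seed set of labeled points
    for _ in range(budget - seed_size):
        model = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
        remaining = np.setdiff1d(np.arange(len(X_pool)), idx)
        idx.append(select(model, remaining))          # spend one more label
    return LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx]).score(X_test, y_test)

# Pick the unlabeled point whose predicted probability is closest to 0.5 (most uncertain).
uncertain = lambda m, rem: rem[np.argmin(np.abs(m.predict_proba(X_pool[rem])[:, 1] - 0.5))]
random_pick = lambda m, rem: rng.choice(rem)

print("random labels:   ", run(random_pick))
print("uncertain labels:", run(uncertain))
```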

There is no hard Cartesian boundary between an algorithm & its environment such that control of the environment is irrelevant to the algorithm and vice-versa and its computation can be carried out without regard to the environment—there are simply many layers between the core of the algorithm and the furthest part of the environment, and the more layers that the algorithm can model & control, the more it can do. Consider Google Maps/Waze7. On the surface they are ‘merely’ Tool AIs which produce lists of possible routes which would optimize certain requirements; but the entire point of such Tool AIs—and all large-scale Tool AIs and research in general—is that countless drivers will act on them (what’s the point of getting driving directions if you don’t then drive?), and this will greatly change traffic patterns as drivers become appendages of the ‘Tool’ AI, potentially making driving in an area much worse by their errors or myopic per-driver optimization causing Braess’s paradox (and far from being a theoretical curiosity, GPS, Google Maps, and Waze are regularly accused of that in many places, especially Los Angeles).
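
Braess’s paradox itself is easy to verify with the standard textbook network (the numbers below are the classic illustrative ones, not real traffic data): adding a ‘free’ shortcut that every selfish driver then uses makes everyone’s commute longer.

```python
# Classic Braess's paradox example: 4000 drivers from Start to End.
# Route via A: Start->A takes T/100 minutes (T = cars on that edge), A->End takes 45.
# Route via B: Start->B takes 45 minutes, B->End takes T/100.
drivers = 4000

# Without a shortcut, traffic splits evenly and each trip takes 20 + 45 minutes.
split = drivers / 2
without = split / 100 + 45

# Add a "free" shortcut A->B (0 minutes). Start->A (at worst 40) now always beats
# Start->B (45), and B->End always beats A->End, so every selfish driver routes
# Start->A->B->End, and each edge carries all 4000 cars: 40 + 0 + 40 minutes.
with_shortcut = drivers / 100 + 0 + drivers / 100

print(f"equilibrium travel time without shortcut: {without:.0f} min")
print(f"equilibrium travel time with shortcut:    {with_shortcut:.0f} min")
```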

This is a highly general point which can be applied on many levels, and it often arises in classical statistics/experimental design/decision theory, where adaptive techniques can greatly outperform fixed-sample techniques for both inference and actions/losses: numerical integration can be improved; a sequential analysis trial testing a hypothesis can often terminate after a fraction of the equivalent fixed-sample trial’s sample size (and/or loss) while exploring multiple questions; an adaptive multi-armed bandit will have much lower regret than any non-adaptive solution, but it will also be inferentially better at estimating which arm is best and what the performance of that arm is (see the ‘best-arm problem’: Bubeck et al 2009, Audibert et al 2010, Gabillon et al 2011, Mellor 2014, Jamieson & Nowak 2014, Kaufmann et al 2014); an adaptive optimal design can minimize total variance by a constant factor (gains of 50% or more are possible compared to naive designs like even allocation; McClelland 1997) by focusing on unexpectedly difficult-to-estimate arms (while a fixed-sample trial can be seen as ideal for when one values precise estimates of all arms equally and they have equal variance, which is usually not the case); and even a Latin square or blocking or rerandomization design rather than simple randomization can be seen as reflecting this benefit (avoiding the potential for imbalance in allocation across arms by deciding in advance the sequence of ‘actions’ taken in collecting samples). Another example comes from queueing theory’s “power of two choices”, where selecting the best of 2 possible queues to wait in rather than selecting 1 queue at random improves the expected maximum delay from Θ(log n / log log n) to Θ(log log n / log d) (and interestingly, almost all the gain comes from being able to make any choice at all, going 1 → 2—choosing from 3 or more queues adds only some constant-factor gains).
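
A small simulation makes the adaptive-versus-fixed contrast concrete (the Bernoulli arm probabilities, horizon, and replication count are made up for illustration): Thompson sampling earns noticeably more reward than even allocation over the same number of pulls, and the script also reports how often each method’s final estimate identifies the true best arm.

```python
# Adaptive (Thompson sampling) vs. fixed even allocation on Bernoulli bandit arms.
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.30, 0.45, 0.50])   # arm 2 is (narrowly) the best
T = 600                                  # total pulls per run

def run_even():
    pulls = rng.binomial(1, np.repeat(true_p, T // len(true_p)))   # 200 pulls per arm
    means = pulls.reshape(len(true_p), -1).mean(axis=1)
    return pulls.sum(), np.argmax(means)

def run_thompson():
    wins, losses = np.ones(len(true_p)), np.ones(len(true_p))      # Beta(1,1) priors
    reward = 0
    for _ in range(T):
        arm = np.argmax(rng.beta(wins, losses))                    # sample beliefs & pick
        r = rng.binomial(1, true_p[arm])
        wins[arm] += r; losses[arm] += 1 - r
        reward += r
    return reward, np.argmax(wins / (wins + losses))

results = [(run_even(), run_thompson()) for _ in range(200)]
for name, i in (("even allocation", 0), ("Thompson sampling", 1)):
    rewards = [r[i][0] for r in results]
    correct = np.mean([r[i][1] == np.argmax(true_p) for r in results])
    print(f"{name:18s} mean reward {np.mean(rewards):6.1f}  best-arm id rate {correct:.2f}")
```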

The wide variety of uses of action is a major theme in recent work in AI (specifically, deep learning/neural networks) research and is increasingly key to achieving the best performance on inferential tasks as well as reinforcement learning/optimization/agent-y tasks. Although these advantages apply to most AI paradigms, because of the power of NNs, the wide variety of tasks they get applied to, and their sophisticated architectures, we can see the pervasive advantage of agentiness much more clearly than in narrower contexts like biostatistics.

Actions for Intelligence

Roughly, we can try to categorize the different kinds of agentiness by the ‘level’ of the NN they work on. There are:

  1. actions internal to a computation:

    • inputs

    • intermediate states

    • accessing the external ‘environment’

    • amount of computation

    • enforcing constraints/finetuning quality of output

    • changing the loss function applied to output

  2. actions internal to training the NN:

    • the gradient itself

    • size & direction of gradient descent steps on each parameter

    • overall gradient descent learning rate and learning rate schedule

    • choice of data samples to train on

  3. actions internal to the dataset:

    • active learning

    • optimal experiment design

  4. actions internal to the NN design step:

    • hyperparameter optimization

    • NN architecture

  5. actions internal to interaction with the environment:

    • adaptive experiment / multi-armed bandit / exploration for reinforcement learning

Actions Internal to a Computation

Inside a specific NN, while computing the output for an input question, the NN can make choices about how to handle it.

It can choose what parts of the input to run most of its computations on, while throwing away or computing less on other parts of the input, which are less relevant to the output, using “attention mechanisms” (eg. Olah & Carter 2016, Hahn & Keller 2016, Bellver et al 2016, Mansimov et al 2015, Gregor et al 2015, Xu 2015, Larochelle & Hinton 2010, Bahdanau et al 2015, Ranzato 2014, Mnih et al 2014, Sordoni et al 2016, Kaiser & Bengio 2016). Attention mechanisms are responsible for many increases in performance, but especially improvements in RNNs’ ability to do sequence-to-sequence translation by revisiting important parts of the sequence (Vaswani et al 2017), image generation and captioning, and in CNNs’ ability to recognize images by focusing on ambiguous or small parts of the image, even for adversarial examples (Luo et al 2016). They are a major trend in deep learning, as it is often the case that some parts of the input are more important than others, and attention enables both global & local operations to be learned; there are by now too many examples of attention to list (with a trend as of 2018 towards using attention as the major or only construct).
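
For concreteness, the core of most attention mechanisms is just a differentiable soft lookup: a weighted average over the inputs whose weights the network itself computes, so it learns where to spend its capacity. A minimal numpy sketch of scaled dot-product attention, the variant later made central by the Transformer (shapes and values here are arbitrary):

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
# The softmax weights are the network's "choice" of which inputs to attend to.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1: a learned soft selection
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))    # 2 queries
K = rng.normal(size=(5, 8))    # 5 positions to attend over
V = rng.normal(size=(5, 8))
out, w = attention(Q, K, V)
print(out.shape, w.shape)      # (2, 8) (2, 5): each output is a weighted mix of the 5 values
```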

Many designs can be interpreted as using attention. The bidirectional RNN also often used in natural language translation doesn’t explicitly use attention mechanisms but is believed to help by giving the RNN a second look at the sequence. Indeed, so universal that it often goes without mention is that the LSTM/GRU mechanism which improves almost all RNNs is itself a kind of attention mechanism: the LSTM cells learn which parts of the hidden state/history are important and should be kept, and whether and when the memories should be forgotten and fresh memories loaded into the LSTM cells. While LSTM RNNs are the default for sequence tasks, they have occasionally been beaten by feedforward neural networks—using internal attention or “self-attention”, like the Transformer architecture (eg. Vaswani et al 2017 or Al-Rfou et al 2018).

Extending attention, a NN can choose not just which parts of an input to look at multiple times, but also how long to keep computing on it, “adaptive computation” (Graves 2016a, Figurnov et al 2016, Silver et al 2016b, Zamir et al 2016, Huang et al 2017, Li et al 2017, Wang et al 2017, Teerapittayanon et al 2017, Huang et al 2017, Li et al 2017b, Campos et al 2017, McGill & Perona 2017, Bolukbasi et al 2017, Wu et al 2017, Seo et al 2017, Lieder et al 2017, Dehghani et al 2018, Buesing et al 2019, …).
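
A toy sketch of the adaptive-computation idea, in the spirit of Graves’s Adaptive Computation Time (untrained random weights, purely to show the control flow rather than the published algorithm): the network emits a halting probability at each internal step and stops computing on an input once the accumulated probability crosses a threshold, so easy inputs consume fewer steps than hard ones.

```python
# Toy adaptive computation: keep applying a recurrent update to an input until an
# accumulated halting probability exceeds a threshold (ACT-style; weights are
# random and untrained, purely to illustrate the control flow).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(16, 16))     # recurrent "ponder" update (illustrative)
w_halt = rng.normal(size=16)                 # produces a per-step halting probability

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_steps(x, threshold=0.99, max_steps=20):
    h, cumulative, steps = x, 0.0, 0
    while cumulative < threshold and steps < max_steps:
        h = np.tanh(W @ h)                   # one more computation step on this input
        cumulative += sigmoid(w_halt @ h)    # the network's own estimate that it can stop
        steps += 1
    return h, steps

for _ in range(3):
    _, n = adaptive_steps(rng.normal(size=16))
    print("steps used:", n)                  # different inputs get different amounts of compute
```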
