School of Computing. Dublin City University.
Proc. 7th Int. Conf. on Simulation of Adaptive Behavior (SAB-02).
The World-Wide-Mind (WWM) was introduced in [Humphrys, 2001]. For a short introduction see [Humphrys, 2001a]. Briefly, this is a scheme for putting animat "minds" online (as WWM "servers") so that large complex minds may be constructed from many remote components. The aim is to address the scaling up of animat research, or how to construct minds more complex than could be written by one author (or one research group).
The first part of this paper describes how a number of existing animat architectures could be implemented as WWM servers. Any unified mind can easily map to a single WWM server. So most of the discussion here is on action selection (or behavior or goal selection), where each module could be a different WWM server (written by a different author).
The second part of this paper describes the first implementation of WWM servers and clients, and explains in particular how to write a WWM server. Most animats researchers are programmers but not network programmers. Almost all protocols for remote services (CORBA, SOAP, etc.) assume the programmer is a networks specialist. This paper rejects these solutions, and shows how any animats researcher can put their animat "mind" or "world" online as a server by simply converting it into a command-line program that reads standard input and writes to standard output.
"AI" refers here to all artificial modelling of life, animals and humans. In the sense in which we use it, classic symbolic AI, sub-symbolic AI, Animats, Agents and ALife are all subfields of "AI".
It is generally agreed that the AI problem is much harder than researchers used to think (though it is not clearly understood why). Early optimism has given way to a sober respect for the magnitude of the problem, and a number of approaches have evolved.
It may be time to ask questions about how the animats and evolutionary approaches scale up. Both seem to share an implicit assumption that one lab can do it all. As a result, the complexity of the minds produced is limited to the complexity that can be grasped by a single research team (or even a single individual). Perhaps the Cog project [Brooks, 1997, Brooks et al., 1998] is beginning to hit the limits of what a single coherent team can understand.
This is, of course, the problem of building whole minds out of specialist components, which standard AI never solved, and we suggest why. We argue that this problem can only be solved by using a public, open system on the Internet as the infrastructure on which to build the mind. We do not suggest abolishing the animats approach, but rather modifying it: first build simple whole creatures out of components written by multiple authors, and then scale up to building complex whole creatures out of components written by multiple authors.
The "not enough authors" approach is the one that has not yet been tried. Many researchers have emphasised the vast and heterogeneous nature of the mind, notably Minsky [Minsky, 1986, Minsky, 1991]. In the Animats world it is at least accepted that complex minds will have "Action Selection" among competing sub-minds. So far, the case has been made for heterogeneous minds, but no one has shown how to build truly heterogeneous minds.
A user "runs" a Mind server in a World server, using some dedicated client software. The user is typically remote from both Mind and World, and starts the client by giving it the (remote) World URL and Mind URL. The client then (repeatedly) queries the World for state, passes this to the Mind to get a suggested action, sends this to the World for execution, and so on.
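The client's cycle can be sketched as a short loop. This is a local, hypothetical sketch: the `World` and `Mind` interfaces and their method names (`get_state`, `take_action`, `get_action`) are our own illustration, and in the real WWM each call would be a query to a remote World URL or Mind URL rather than a Python method call.

```python
from __future__ import annotations

from typing import Protocol


class World(Protocol):
    def get_state(self) -> str: ...
    def take_action(self, action: str) -> None: ...


class Mind(Protocol):
    def get_action(self, state: str) -> str: ...


def run(world: World, mind: Mind, steps: int) -> list[tuple[str, str]]:
    """Run a Mind in a World; return the (state, action) history."""
    history: list[tuple[str, str]] = []
    for _ in range(steps):
        x = world.get_state()      # query the World for its current state
        a = mind.get_action(x)     # pass the state to the Mind, get a suggested action
        world.take_action(a)       # send the action to the World for execution
        history.append((x, a))
    return history


# Hypothetical local stand-ins for a remote World server and a remote Mind server.
class CounterWorld:
    def __init__(self) -> None:
        self.n = 0

    def get_state(self) -> str:
        return str(self.n)

    def take_action(self, action: str) -> None:
        self.n += 1 if action == "inc" else -1


class ThresholdMind:
    def get_action(self, state: str) -> str:
        return "inc" if int(state) < 2 else "dec"
```

Calling `run(CounterWorld(), ThresholdMind(), 4)` returns `[("0", "inc"), ("1", "inc"), ("2", "dec"), ("1", "inc")]`. The loop itself never interprets states or actions, which is what lets one client drive any World and Mind pair.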
What may prevent complete chaos, however, is, first, that popular worlds will serve as benchmarks for testing: "islands" of compatible worlds and minds may develop around each popular basic problem. Second, we can define a client that will work with all World servers and Mind servers. Finally, it may be possible to write Minds that will run in all worlds, or at least in many quite different worlds. For example, one could write a generic Q-learning Mind server [Watkins, 1989]. When set to run in a new World server, it queries the world to learn that it has a finite number of states, numbered state 1 to state n, and a finite number of actions, action 1 to action m, and that the World server will occasionally generate a numeric reward after an action has been taken. The Q-learning Mind server can then attempt to learn a policy without knowing anything more about what the world represents or what the problem is.
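A world-agnostic Q-learner of this kind might look as follows. It is a sketch under stated assumptions: the World is assumed to have already reported n and m (how it does so is not specified here), states and actions are just the integers 1..n and 1..m, and a step with no reward is treated as a reward of 0. The learning rule is standard one-step Q-learning; the class and parameter names are hypothetical.

```python
import random
from typing import Optional


class GenericQMind:
    """Q-learning Mind that knows only: states 1..n, actions 1..m, numeric rewards."""

    def __init__(self, n_states: int, n_actions: int, alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.actions = range(1, n_actions + 1)
        # One Q-value per (state, action) pair, all starting at 0.
        self.q = {(x, a): 0.0 for x in range(1, n_states + 1) for a in self.actions}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def get_action(self, x: int) -> int:
        if self.rng.random() < self.epsilon:          # explore
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(x, a)])   # exploit

    def learn(self, x: int, a: int, reward: float, y: int) -> None:
        """One-step Q-learning update; pass reward=0 when the World gave none."""
        best_next = max(self.q[(y, b)] for b in self.actions)
        self.q[(x, a)] += self.alpha * (reward + self.gamma * best_next - self.q[(x, a)])
```

Nothing in the class refers to what a state or action means, so the same code could be pointed at any World server that describes itself in these terms.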
A Subsumption Architecture model [Brooks, 1986, Brooks, 1991] could be implemented as a hierarchy of MindM servers, each one building on the ones below it. Each one sends the current state x to the server below it, and then either uses that server's output or overrides it. As in Brooks' model, a set of lower layers will still work if the higher layers are removed. On the WWM, there may be many choices for (remote, 3rd party) higher layers to add to a given collection of lower layers.
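A minimal sketch of such a hierarchy, with each layer modelled as a local function standing in for a MindM server. The layer names `wander` and `avoid` and the string states are hypothetical; what the sketch shows is the control flow described above: each server first passes x down, then either keeps the lower layer's action or overrides it.

```python
from typing import Callable, Optional

# A layer sees the state x and the action proposed by the layers below it
# (None if there are none) and returns either that action or its own override.
Layer = Callable[[str, Optional[str]], Optional[str]]
Server = Callable[[str], Optional[str]]


def make_layer_server(layer: Layer, below: Optional[Server] = None) -> Server:
    """Wrap one layer as a 'server' that first queries the server below it."""
    def server(x: str) -> Optional[str]:
        proposed = below(x) if below is not None else None   # send x down
        return layer(x, proposed)                            # use it or override it
    return server


def wander(x: str, proposed: Optional[str]) -> Optional[str]:
    return "forward"                                  # lowest layer: always has an opinion


def avoid(x: str, proposed: Optional[str]) -> Optional[str]:
    return "turn" if "obstacle" in x else proposed    # override only near obstacles


level0 = make_layer_server(wander)
level1 = make_layer_server(avoid, below=level0)
```

Here `level1("obstacle ahead")` gives `"turn"`, while `level0` on its own still returns `"forward"`: removing the top layer leaves a working, if simpler, creature.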
An ordinary Reinforcement Learning (RL) agent, which receives rewards and punishments as it acts [Kaelbling et al., 1996], can clearly be implemented as a single Mind server. For example, a Q-learning agent [Watkins, 1989] builds up Q-values ("Quality"-values) of how good each action is in each state: Q(x,a). When learning, it can calculate a reward based on x, a and the new state y. So, as long as the client tells this Mind server which state y resulted from the previous action a, it can calculate rewards and learn.
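The point that the Mind server can compute its own reward, given only y, can be sketched as below. The `observe(y)` method and the reward function are hypothetical names for this illustration; the Mind remembers the (x, a) pair it last suggested and assumes that it was obeyed.

```python
import random
from typing import Callable, Hashable, Optional, Sequence, Tuple


class SelfRewardingQMind:
    """Q-learning Mind server that computes its own reward r(x, a, y)."""

    def __init__(self, actions: Sequence[Hashable],
                 reward_fn: Callable[[Hashable, Hashable, Hashable], float],
                 alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.actions = list(actions)
        self.reward_fn = reward_fn
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q: dict = {}                                        # Q(x, a); missing = 0
        self.last: Optional[Tuple[Hashable, Hashable]] = None    # (x, a) awaiting y
        self.rng = random.Random(seed)

    def value(self, x: Hashable, a: Hashable) -> float:
        return self.q.get((x, a), 0.0)

    def get_action(self, x: Hashable) -> Hashable:
        if self.rng.random() < self.epsilon:
            a = self.rng.choice(self.actions)
        else:
            a = max(self.actions, key=lambda b: self.value(x, b))
        self.last = (x, a)
        return a

    def observe(self, y: Hashable) -> None:
        """The client reports the state y that followed our last action."""
        if self.last is None:
            return
        x, a = self.last
        r = self.reward_fn(x, a, y)                  # reward computed inside the Mind
        best_next = max(self.value(y, b) for b in self.actions)
        old = self.value(x, a)
        self.q[(x, a)] = old + self.alpha * (r + self.gamma * best_next - old)
        self.last = None
```

The client never sees a reward here: the reward function lives entirely inside the Mind server, so two Mind servers in the same World may be learning from quite different reward functions.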
Hierarchical Q-Learning [Lin, 1993] is a way of driving multiple Q-learners with a master Q-learner. It can be implemented on the WWM as follows. The client talks to a single MindAS server, sending it x and receiving a. The MindAS server talks to a number of Mind servers. The MindAS server maintains a table of values Q(x,i), where i is which Mind server to pick in state x. Initially its choices are random, but using its own reward function, the MindAS server fills in values for Q(x,i). Having chosen i, it passes on the action suggested by Mind server i to the client. To save on the number of server queries (which is a more serious issue on the WWM than in a self-contained system), the MindAS server does not query any of the Mind servers until it has picked a Mind server i, and then it queries only that Mind server. There are a number of interesting possibilities.
Hierarchical Q-Learning is an ASo server.
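The Hierarchical Q-learning MindAS server can be sketched as below, with local objects standing in for remote Mind servers. The class and method names, the epsilon-greedy exploration and the learning parameters are our assumptions; the features taken from the text are the table Q(x,i), the MindAS server's own reward function, and the rule that only the chosen Mind server i is queried.

```python
import random
from typing import Callable, Hashable, List, Optional, Tuple


class HierarchicalQ:
    """MindAS server: learns Q(x, i), where i is which Mind server to obey in state x."""

    def __init__(self, minds: List, reward_fn: Callable[[Hashable, Hashable, Hashable], float],
                 alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.minds = minds                   # stand-ins for remote Mind servers
        self.reward_fn = reward_fn           # the MindAS server's own reward r(x, a, y)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q: dict = {}                    # Q(x, i); missing entries read as 0
        self.last: Optional[Tuple[Hashable, int, Hashable]] = None
        self.rng = random.Random(seed)

    def value(self, x: Hashable, i: int) -> float:
        return self.q.get((x, i), 0.0)

    def get_action(self, x: Hashable) -> Hashable:
        n = len(self.minds)
        if self.rng.random() < self.epsilon:
            i = self.rng.randrange(n)        # explore: pick a Mind server at random
        else:
            i = max(range(n), key=lambda j: self.value(x, j))
        a = self.minds[i].get_action(x)      # only the chosen Mind server is queried
        self.last = (x, i, a)
        return a

    def observe(self, y: Hashable) -> None:
        if self.last is None:
            return
        x, i, a = self.last
        r = self.reward_fn(x, a, y)
        best_next = max(self.value(y, j) for j in range(len(self.minds)))
        old = self.value(x, i)
        self.q[(x, i)] = old + self.alpha * (r + self.gamma * best_next - old)
        self.last = None


class CountingMind:
    """Stand-in Mind server that always suggests one action and counts queries."""

    def __init__(self, action: Hashable) -> None:
        self.action, self.calls = action, 0

    def get_action(self, x: Hashable) -> Hashable:
        self.calls += 1
        return self.action
```

The `CountingMind` stand-ins make the query-saving rule visible: after one step only the chosen Mind server has been called.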
We consider a number of schemes where Mind servers promote their actions with a weight W, or "W-value" [Humphrys, 1997]. Ideally the W-value will depend on the state x and will be higher or lower depending on how much the Mind server "cares" about winning the competition for this state. A static measure of the W-value is one in which the Mind server promotes its action with a value of W based on internal reasons, independent of the competition. Any such method (including, say, W=Q) can clearly be implemented as a Mindi server. A dynamic measure of W is one in which the value of W changes depending on whether the Mind server was obeyed, and on what happened if it was not obeyed. Clearly this is an ASs server that queries each Mind server once, lets through the action with the highest W, and then reports back afterwards to each Mind server whether or not it was obeyed. Each Mind server may then modify its W-value the next time round in this state.
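A highest-W competition with feedback might be sketched as follows. The `suggest`/`feedback` interface is hypothetical, and the `PersistentMind` rule (raise W by 1 each time it loses in a state) is a deliberately simple toy dynamic W, not W-learning; `StaticWMind` shows a static W that ignores the outcome.

```python
from typing import Dict, Hashable, List, Tuple


class StaticWMind:
    """Promotes its action with a fixed W, whatever happens in the competition."""

    def __init__(self, action: str, w: float) -> None:
        self.action, self.w = action, w

    def suggest(self, x: Hashable) -> Tuple[str, float]:
        return self.action, self.w

    def feedback(self, x: Hashable, obeyed: bool, action_taken: str) -> None:
        pass                                    # static W ignores the outcome


class PersistentMind:
    """Toy dynamic W (hypothetical rule): each defeat in state x raises W(x) by 1."""

    def __init__(self, action: str, w0: float) -> None:
        self.action, self.w0 = action, w0
        self.w: Dict[Hashable, float] = {}

    def suggest(self, x: Hashable) -> Tuple[str, float]:
        return self.action, self.w.get(x, self.w0)

    def feedback(self, x: Hashable, obeyed: bool, action_taken: str) -> None:
        if not obeyed:
            self.w[x] = self.w.get(x, self.w0) + 1.0


def highest_w(minds: List, x: Hashable) -> str:
    """ASs-style server: query each Mind once, obey the highest W, then report back."""
    suggestions = [m.suggest(x) for m in minds]
    winner = max(range(len(minds)), key=lambda i: suggestions[i][1])  # ties: first Mind
    action = suggestions[winner][0]
    for i, m in enumerate(minds):
        m.feedback(x, obeyed=(i == winner), action_taken=action)
    return action
```

With a static Mind at W = 5 and a persistent Mind starting at W = 3, four rounds in the same state give `eat, eat, eat, flee`: the persistent Mind's W rises each time it is ignored until it wins (ties here go to the first Mind in the list).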
W-learning [Humphrys, 1996] is a form of dynamic W where W is modified based on (i) whether we were obeyed or not, and (ii) what the new state y is as a result. This can clearly be implemented as an ASs server. In the pure form of W-learning the Minds do not even share the same suite of actions, and so, for example, cannot simply get together and negotiate to find the optimum action. The aim was simply to see whether competition could be resolved between Minds that had as little in common as possible. That work was unable to give convincing examples of where this might arise. With the WWM, we hope its usefulness is now clearer: this is the kind of model we need when parts of the mind have difficulty understanding each other (e.g. because they were written by different authors).
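One step of W-learning can be sketched as below. The update follows W-learning as we understand it from [Humphrys, 1996, Humphrys, 1997]: when Mind i is not obeyed in state x, it compares what it expected, Q_i(x,a_i), with what it actually got, r_i + γ max_b Q_i(y,b), and moves W_i(x) towards the difference. For brevity the sketch assumes the Q-values are already learned and fixed; the class and method names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


class WLearningMind:
    """W-learning sketch: W_i(x) is updated only in states where Mind i was not obeyed."""

    def __init__(self, actions: Sequence[Hashable], q: Dict[tuple, float],
                 reward_fn: Callable[[Hashable, Hashable, Hashable], float],
                 alpha: float = 0.5, gamma: float = 0.9) -> None:
        self.actions = list(actions)
        self.q = dict(q)                 # Q_i(x, a), assumed already learned; missing = 0
        self.reward_fn = reward_fn       # this Mind's own reward r_i(x, a, y)
        self.alpha, self.gamma = alpha, gamma
        self.w: Dict[Hashable, float] = {}    # W_i(x), starting at 0

    def value(self, x: Hashable, a: Hashable) -> float:
        return self.q.get((x, a), 0.0)

    def suggest(self, x: Hashable) -> Tuple[Hashable, float]:
        a_i = max(self.actions, key=lambda a: self.value(x, a))
        return a_i, self.w.get(x, 0.0)

    def feedback(self, x: Hashable, obeyed: bool, action_taken: Hashable, y: Hashable) -> None:
        if obeyed:
            return                       # W only changes when we lose the competition
        a_i, _ = self.suggest(x)
        expected = self.value(x, a_i)
        got = self.reward_fn(x, action_taken, y) + self.gamma * max(
            self.value(y, b) for b in self.actions)
        self.w[x] = (1 - self.alpha) * self.w.get(x, 0.0) + self.alpha * (expected - got)
```

Because W is only updated on defeat, it tracks how much this Mind loses when ignored: a Mind that loses little in a state ends up with a low W there and so rarely wins it.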
If Minds do share the same suite of actions, then we can make various global decisions. Say we have $n$ Mind servers, and Mind server $i$'s preferred action in state $x$ is $a_i$. Mind server $i$ can quantify "how good" action $a$ is in state $x$ by returning $Q_i(x,a)$, and "how bad" action $a$ is in state $x$ by returning $Q_i(x,a_i) - Q_i(x,a)$. Then we have 4 basic approaches [Humphrys, 1997].
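Given these two measures, global decisions can be sketched by combining them with max or sum across Mind servers. The sketch below computes four such rules; treating them as the four approaches of [Humphrys, 1997] is our assumption, and the rule names and table layout are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

QTable = Dict[tuple, float]   # one Mind server's Q_i[(x, a)]


def global_choice(tables: List[QTable], actions: Sequence[Hashable],
                  x: Hashable, rule: str) -> Hashable:
    """Pick one action for state x from n Mind servers' Q-values."""
    minds = range(len(tables))

    def good(i: int, a: Hashable) -> float:     # "how good": Q_i(x, a)
        return tables[i][(x, a)]

    def bad(i: int, a: Hashable) -> float:      # "how bad": Q_i(x, a_i) - Q_i(x, a)
        return max(good(i, b) for b in actions) - good(i, a)

    if rule == "max_best_good":     # obey the Mind whose preferred action has the highest Q
        return max(actions, key=lambda a: max(good(i, a) for i in minds))
    if rule == "min_worst_bad":     # minimise the largest loss suffered by any Mind
        return min(actions, key=lambda a: max(bad(i, a) for i in minds))
    if rule == "max_total_good":    # maximise the summed Q-values
        return max(actions, key=lambda a: sum(good(i, a) for i in minds))
    if rule == "min_total_bad":     # minimise the summed losses
        return min(actions, key=lambda a: sum(bad(i, a) for i in minds))
    raise ValueError(f"unknown rule: {rule}")
```

With Mind 1 valuing actions a, b, c at 10, 8, 0 and Mind 2 at 0, 6, 7, the first rule picks `a` (Mind 1's strong favourite) while the other three pick the compromise `b`. The two sum-based rules always agree, since the sum of Q_i(x,a_i) - Q_i(x,a) over i equals a constant minus the sum of Q_i(x,a) over i.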
Digney [Digney, 1996, Digney, 1998] defines Nested Q-learning, where each Mind in a collection is able to call on any of the others. Each Mind server has its own Q-values $Q_i(x,a)$ for ordinary actions $a$, and a set of Q-values $Q_i(x,k)$ for actions $k$, where action $k$ means "do whatever server $k$ wants to do" (as in Hierarchical Q-learning). In a WWM implementation, each Nested server has a list of Mind URLs, either hard-coded or passed to it at startup. So the Nested server looks like a MindAS server coordinating many Mind servers to make its decision. But of course it is not making the final decision. It is merely suggesting an action to the master MindAS server that coordinates the competition between the Nested servers themselves. When the master MindAS server is started up with a list of Mind servers, it passes the list to each of the servers.
Some of the Nested servers might actually be outside the Action Selection competition, and simply wait to be called by a server that is in the competition. [Humphrys, 1997] calls these "passive" servers. The same happens with hand-coded MindM servers, where some Mind servers may have to wait to be called by others. A server may be "passive" in one Society and at the same time "active" (i.e. in the Action Selection loop) in a different Society.
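Resolving a Nested Q-learning suggestion can be sketched as below, with local objects standing in for Nested servers. Options are either primitive actions or `("defer", k)`; `set_peers` plays the part of the master MindAS server handing each server the list at startup. The depth limit that stops deferral cycles, like all names here, is our own assumption rather than part of Digney's model.

```python
from typing import Dict, Hashable


class NestedMind:
    """Nested Q-learning sketch: options are primitive actions or ("defer", k)."""

    def __init__(self, name: str, q: Dict[tuple, float]) -> None:
        self.name = name
        self.q = q                                   # Q_i[(x, option)]
        self.peers: Dict[str, "NestedMind"] = {}

    def set_peers(self, peers: Dict[str, "NestedMind"]) -> None:
        self.peers = peers                           # list passed on by the master at startup

    def suggest(self, x: Hashable, depth: int = 3) -> Hashable:
        options = [o for (s, o) in self.q if s == x]
        if depth == 0:                               # hypothetical guard against cycles:
            options = [o for o in options if not isinstance(o, tuple)]  # primitives only
        best = max(options, key=lambda o: self.q[(x, o)])
        if isinstance(best, tuple):                  # ("defer", k): do what server k wants
            return self.peers[best[1]].suggest(x, depth - 1)
        return best
```

In use, an `explore` server that values deferring to `avoid` more than its own `forward` action ends up suggesting whatever `avoid` wants; two servers that defer to each other are cut off by the depth limit and fall back on a primitive action.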
Watkins [Watkins, 1989] defines a Feudal (or "slave") Q-learner as one that accepts commands of the form "Take me to state c". In Watkins' system, the command is part of the current state. Writing a transition as $(x,c), a \rightarrow (y,c)$, the slave receives rewards for transitions of the form $(*,c), a \rightarrow (c,c)$, that is, whenever it arrives at the commanded state $c$. So the master server drives the slave server by explicitly altering the state for it. We do not have to change our definition of the server above; it is just that the server driving it constructs the state it passes down, rather than simply passing on the state x from above.
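The feudal arrangement can be sketched as below: the slave is an ordinary Mind server whose state happens to be a pair (x, c), and the master builds that pair rather than passing x straight through. The reward function follows the transition rule above; the class names, room states and the master's `choose_command` policy are hypothetical.

```python
from typing import Callable, Hashable, List, Tuple

State = Tuple[Hashable, Hashable]   # (x, c): world state plus the command "take me to c"


def feudal_reward(s: State, a: Hashable, s_next: State) -> float:
    """Slave's reward: paid for any transition (*, c), a -> (c, c)."""
    _, c = s
    y, _ = s_next
    return 1.0 if y == c else 0.0


class RecordingSlave:
    """Stand-in slave Mind server: heads 'east' until it reaches the commanded room."""

    def __init__(self) -> None:
        self.seen: List[State] = []

    def get_action(self, state: State) -> str:
        self.seen.append(state)
        x, c = state
        return "stay" if x == c else "east"


class FeudalMaster:
    """Drives a slave by constructing the state it sees, not passing x straight through."""

    def __init__(self, slave: RecordingSlave,
                 choose_command: Callable[[Hashable], Hashable]) -> None:
        self.slave = slave
        self.choose_command = choose_command   # hypothetical master policy for picking c

    def get_action(self, x: Hashable) -> str:
        c = self.choose_command(x)
        return self.slave.get_action((x, c))
```

The slave's code contains nothing feudal: it is the master's choice of c, folded into the state, that turns it into a servant.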
The Nested and Feudal models are combined in [Humphrys, 1997, Fig. 18.4] showing the general form of a Society of Mind based on Reinforcement Learning. Indeed, the whole model of a complex, overlapping, competing, duplicated, sub-symbolic Society of Mind that we have developed in this paper is based on the generalised form of a Society of Mind based on Reinforcement Learning.
So far we have only defined a protocol for conflict resolution using numeric weights. Higher-bandwidth communication leads us into the field of Agents and its problems with defining agent communication languages (formerly symbolic AI knowledge-sharing protocols) that we discussed above.
We expect numeric weights to be easier to generate in sub-symbolic Minds than in symbolic Minds. This is because symbolic Minds often know what they want to do but not "how much" they want to do it. Sub-symbolic Minds, which prefer certain actions precisely because the numbers for those actions have risen higher than the numbers for other actions, may be able to say precisely "how much" they want to do something, and quantify how bad alternative actions would be.
It may be that in the symbolic domain we will make much more use of specialised MindM servers. This might be a popular alternative to having Minds generate weights to resolve competition. The drawback, of course, is that the MindM server needs a lot of intelligence: it needs to understand the goals of all the Mind servers. This relates to the "homunculus" problem, or the need for an intelligent headquarters. Another possibility is that the subsidiary Mind servers are symbolic, while the master MindAS server is sub-symbolic, e.g. a Hierarchical Q-learner.
"Telerobotics" is the ability to control a robot remotely. Telerobotic systems have in fact been used in animats [Wilson and Neal, 2000], though not on the network. Outside of the animats field there are in fact a number of "Internet telerobotics" robots that can be controlled remotely over the Internet. [Taylor and Dalton, 1997] discuss some of the issues:
For example, [Stein, 1998] allows remote control of the robot until the client gives it up, or until a timeout has passed. [Paulos and Canny, 1996] operate in a special type of problem space where each action represents the completion of an entire goal, and so actions of different clients can be interleaved. In the robotic tele-garden [Goldberg et al., 1996] users could submit discrete requests at any time, which were executed later by the robot according to its own scheduling algorithm. The robotic Ouija board [Goldberg et al., 2000] is a special type of problem where the actions of multiple clients can be aggregated. It seems that all of these schemes could be implemented under the model discussed here. The focus so far in Internet telerobotics has been on remote human-driven control rather than remote program-driven control, but this may change.
This is not, however, an option in real-time virtual worlds, such as ones where other users or agents are changing the environment. Here the system may share some of the features of real-time multi-player online games (see the survey in [Smed et al., 2001]). A large, nested Society of Mind may resemble a peer-to-peer game with low-bandwidth communication, which should scale well. A possible bottleneck is the top-level Mind server, depending on how it is designed. [Abdelkhalek et al., 2001] consider performance issues with centralised game servers.
A top-level Mind server is unavoidable because the diversity of suggested actions must be reduced at some point, and a decision made. This point is the potential bottleneck. In many of the Action Selection schemes above, the top-level mind is reduced more or less to a router rather than a processor in its own right, in an effort to decentralise the intelligence. We now see that such an approach may also be useful in distributing the network load.
In the real physical world, a robotic animat also needs to make decisions quickly. It may be that a system such as this will be used for prototyping - experimenting with different Mind server combinations out of the choices online. Once a combination is chosen, one attempts to get a local installation of all the Mind servers involved. Why we are trying to avoid local installation is considered below. If we reject local installation, we cannot avoid network delays.
Part of the problem, we argue, is models of mind in which the loss of a single server would be a serious issue. Instead of models of mind where hundreds of similar servers compete to do the same job, researchers have been assuming parsimonious minds, where each component does a particular task that is not done by any other. A better strategy is to keep adding "unnecessary" duplicated minds to your society. The master MindAS server asks all Mind servers to suggest actions, and times out if it does not receive an answer within a short time. So in a highly-duplicated model, if the action does not arrive from one Mind server, it will have arrived from another similar one. In a mind with enough duplication, the temporary network failure (or even permanent deletion) of servers may never even be noticed. Obviously, some servers will be essential, like the World server, for instance. The basic answer to coping with essential servers is that if a server is important to us, we will copy it (if it is free), buy it or rent it.
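The time-out behaviour can be sketched with Python's standard `concurrent.futures`. Each remote Mind server is modelled as a local callable (in practice each call would be an HTTP query to a Mind URL); `gather_suggestions` and the example Minds are hypothetical. Slow or failed servers are simply left out of the result.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, Hashable


def gather_suggestions(minds: Dict[str, Callable[[Hashable], str]], x: Hashable,
                       timeout: float) -> Dict[str, str]:
    """Ask every Mind server for an action in parallel; keep whatever arrives in time."""
    pool = cf.ThreadPoolExecutor(max_workers=len(minds))
    futures = {pool.submit(mind, x): name for name, mind in minds.items()}
    done, _ = cf.wait(futures, timeout=timeout)
    pool.shutdown(wait=False)            # do not block on slow or unreachable servers
    # Keep answers that arrived in time; skip servers that failed with an error.
    return {futures[f]: f.result() for f in done if f.exception() is None}


# Hypothetical stand-ins for remote Mind servers.
def fast_mind(x: Hashable) -> str:
    return "eat"


def slow_mind(x: Hashable) -> str:
    time.sleep(1.0)                      # answers too late to be counted
    return "flee"


def broken_mind(x: Hashable) -> str:
    raise ConnectionError("server unreachable")
```

With two duplicate fast Minds, one slow Mind and one broken one, the result still contains suggestions from the two fast duplicates; the master MindAS server would then apply its own selection rule to whatever arrived.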
[Humphrys, 1997] describes a multiple-minds model of AI that can survive brain damage by re-organising. The reader might have wondered what the point of that was. After all, if the AI is damaged, surely you just fix it or reinstall it? Here is the point: a model of AI that can survive broken links.
There is perhaps one other neglected area, which is sub-symbolic MAS online. Most work on MAS online is at the symbolic level (see agent communication languages, as discussed previously). One interesting issue is whether the Animats work on MAS, which involves what might be called sub-symbolic communication or signalling, can be brought online. Ongoing research by Walshe [Walshe, 2001] will attempt to interface sub-symbolic AS and MAS online.
How can one avoid these compatibility problems and allow researchers to use whatever platform they want? By using server-side programs rather than client-side programs. The Web demonstrates this other, and far more successful, model of re-use: leaving the program on the remote Web server, and running it from there. One strange aspect of adopting this model for the WWM is that the mind may consist of components which are physically at different remote sites, and which stay there, and just communicate with each other remotely. Hence the mind is literally decentralised across the world, something which has never existed in nature. Hence the name, the "World-Wide-Mind".
While we agree with this general scheme (run on HTTP, the data format should be tagged and extensible), we reject using the full complexity of the web services protocols. Why? Because of the unique nature of our audience. Most animats researchers are programmers, but not network programmers. These protocols - and indeed almost all protocols in computer networks, web services, distributed objects or Internet Agents - assume the programmer is a networks specialist (or is willing to become one). SOAP messages are complex, and you require an API to parse them. Doing it yourself is difficult.
We still agree with the idea of tagged plain text for our data. Plain text is important so that humans can read the data, and programmers can parse it themselves. And with tagged plain text (where each piece of data is delimited by tags, and whitespace is ignored) it is much easier to create a tolerant parser than with untagged plain text (where, say, a precise column number or line number defines which piece of data is which). Tagging also allows extensible systems: we can ignore new tags that we don't recognise.
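A tolerant tagged-text parser can be very small. The sketch below (in Python, not the authors' own parser) extracts every `<data name="..."> ... </data>` field, ignores whitespace, letter case and quoting style, and silently skips any tags it does not recognise. The function name and the focus on `data` fields are assumptions based on the AIML examples in this paper.

```python
import re
from typing import Dict

# <data name="..."> value </data>, in any letter case, with or without quotes
# around the name, and with any whitespace; all other tags are ignored.
DATA_FIELD = re.compile(
    r"""<\s*data\s+name\s*=\s*["']?([^"'\s>]+)["']?\s*>(.*?)<\s*/\s*data\s*>""",
    re.IGNORECASE | re.DOTALL,
)


def parse_fields(text: str) -> Dict[str, str]:
    """Map each data field's name to its value, with surrounding whitespace removed."""
    return {name: value.strip() for name, value in DATA_FIELD.findall(text)}
```

Because the parser looks only for the tags it knows, a sender can add new tags (here, an unknown `<priority>` tag) without breaking older receivers.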
All Web servers support a system of server-side programs called CGI. CGI is not a difficult technology - indeed, there is almost nothing to it except placing a command-line program in the CGI directory of your Web server. Any programming language may be used. Programs read plaintext input (text, HTML, XML, or any XML-like format) on standard input and write plain text output to standard output. All browsers (and other clients) can run remote CGI programs.
This tolerance does not mean we cannot issue recommendations. The situation will be like the Web. The portal site w3.org defines the official HTML spec. (e.g. "tables should end with an end-table tag"). But the browser can't just choke on bad HTML, not if there is scope to make a guess and display it (e.g. if end of file comes with no end-table tag, then insert end-table tag). The browser must tolerate bad HTML, or users will switch to browsers that do. And the pool of authors would never have grown so big if authors had to write strict HTML. It is often forgotten that the Web does not run on strict HTML, and never could have.
Similarly, the portal site w2mind.org will define an official AIML spec. (e.g. "WWM query responses should end with an end-response tag"). But no matter how we define it, there will always be room for the client to make some guesses with bad AIML. Clients must try to tolerate AIML "close to" the spec. - though obviously there can be no guarantees once one deviates from the spec.
<request type="GetAction" runid="RUNID"> <data name="x"> x </data> </request>
where the format of x is decided by the World server. Clearly, this plaintext input can easily be parsed by any programmer using simple string searching mechanisms in any language (the first author's parser is just 5 lines of UNIX Shell, and is tolerant of many different variations in the input AIML). The Mind server then outputs to standard output something like:
<response type="GetAction" runid="RUNID"> <data name="a"> a </data> </response>
where the format of a is decided by the World server. We say "something like" because AIML is still in a state of revision, which is why the servers and clients have not yet been publicly released at the portal site w2mind.org. For the current spec., AIML v1.1, see [O'Leary, 2002].
That is all one needs - to agree on the format of AIML - and even full agreement is not necessary if one writes a tolerant parser. The program can be written in any language. Input and output can be debugged using an ordinary web browser (though for repeated queries one would want to use one of our dedicated clients).
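Putting the pieces together, a complete Mind server can be sketched as a short CGI program. This is a sketch under assumptions: it handles only `GetAction`, treats x as an integer purely for illustration (in reality the World server decides the format of x and a), and its policy `choose_action` is hypothetical. The `Content-Type` header line is required by the CGI convention before the body.

```python
#!/usr/bin/env python3
"""Minimal WWM Mind server as a CGI program (sketch)."""
import os
import re
import sys

# Tolerant patterns: any letter case, optional quotes, any whitespace.
RUNID = re.compile(r"""runid\s*=\s*["']?([^"'\s>]*)""", re.IGNORECASE)
STATE = re.compile(r"""<\s*data\s+name\s*=\s*["']?x["']?\s*>(.*?)<\s*/\s*data\s*>""",
                   re.IGNORECASE | re.DOTALL)


def choose_action(x: str) -> str:
    """Hypothetical policy: assumes x is an integer and moves it towards 0."""
    try:
        n = int(x)
    except ValueError:
        return "noop"
    return "inc" if n < 0 else "dec" if n > 0 else "noop"


def handle(request: str) -> str:
    """Turn a GetAction request into a GetAction response."""
    runid = RUNID.search(request)
    state = STATE.search(request)
    x = state.group(1).strip() if state else ""
    rid = runid.group(1) if runid else ""
    a = choose_action(x)
    return (f'<response type="GetAction" runid="{rid}">\n'
            f'  <data name="a"> {a} </data>\n'
            '</response>\n')


def main() -> None:
    # CGI passes a POST body on stdin and its size in CONTENT_LENGTH
    # (read as characters here, which assumes plain ASCII input).
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.read(length) if length > 0 else sys.stdin.read()
    sys.stdout.write("Content-Type: text/plain\n\n")   # CGI header, blank line, body
    sys.stdout.write(handle(body))


# Web servers set GATEWAY_INTERFACE when running a CGI program; checking it
# keeps the file importable (and testable) without blocking on stdin.
if __name__ == "__main__" and os.environ.get("GATEWAY_INTERFACE"):
    main()
```

From the command line it can be tried as `echo '<request type="GetAction" runid="1"> <data name="x"> -3 </data> </request>' | GATEWAY_INTERFACE=CGI/1.1 python3 mind.py` (the file name is arbitrary), which prints the CGI header followed by a response suggesting `inc`.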
We deal with all of these issues in [Humphrys, 2001, O'Leary, 2002a]. But a basic WWM server can still be set up and running using no more than a program that parses plaintext and outputs plaintext as above.
From the animats viewpoint the next steps are: (a) put existing well-known minds and worlds online as servers (we already have Tyrrell's world [Tyrrell, 1993] running, but not yet as a server); and (b) construct network mechanisms for Action Selection across multiple remote minds written by different authors.
We have a new vision of a mind: no single author could write a high-level artificial mind, but perhaps the entire scientific community could. Each piece will be understood by someone, but the whole may be understood by no-one. Perhaps we need a new respect for the magnitude of the AI problem - that building a high-level artificial mind may be on the same scale as constructing something like a national economy, or the city of London. No single individual or company built London or New York. But humanity as a whole did.
The software for this system (servers, clients and server support software), and the design of AIML and associated protocols, is the joint work of the authors of this paper and Dave O'Connor and Ray Walshe. We are grateful to two anonymous referees for their comments on this paper.
Abdelkhalek, A.; Bilas, A. and Moshovos, A. (2001), Behavior and Performance of Interactive Multiplayer Game Servers, Proc. Int. IEEE Symposium on the Performance Analysis of Systems and Software (ISPASS-2001).
Aylett, R. (1995), Multi-Agent Planning: Modelling Execution Agents, 14th UK Planning and Scheduling SIG.
Berger, H.W. (1998), Is The NP Problem Solved? (Quantum and DNA Computers), www.pcs.cnu.edu/~hberger/Quantum_Computing.html
Brooks, R.A. (1986), A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation 2:14-23.
Brooks, R.A. (1991), Intelligence without Representation, Artificial Intelligence 47:139-160.
Brooks, R.A. (1997), From Earwigs to Humans, Robotics and Autonomous Systems 20(2-4):291-304.
Brooks, R.A. et al. (1998), The Cog Project, Computation for Metaphors, Analogy and Agents, Springer-Verlag.
Bryson, J. (2000), Cross-Paradigm Analysis of Autonomous Agent Architecture, JETAI 12(2):165-89.
Daniels, M. (1999), Integrating Simulation Technologies With Swarm, Workshop on Agent Simulation, Univ. Chicago, Oct 1999.
de Garis, H. (1996), CAM-BRAIN: The Evolutionary Engineering of a Billion Neuron Artificial Brain, Towards Evolvable Hardware, Springer.
Dennett, D.C. (1978), Why not the whole iguana?, Behavioral and Brain Sciences 1:103-104.
Digney, B.L. (1996), Emergent Hierarchical Control Structures, SAB-96.
Digney, B.L. (1998), Learning Hierarchical Control Structures for Multiple Tasks and Changing Environments, SAB-98.
Ginsberg, M.L. (1991), Knowledge Interchange Format: The KIF of Death, AI Magazine 12(3):57-63.
Goldberg, K. et al. (1996), A Tele-Robotic Garden on the World Wide Web, SPIE Robotics and Machine Perception Newsletter, 5(1), March 1996.
Goldberg, K. et al. (2000), Collaborative Teleoperation via the Internet, IEEE Int. Conf. on Robotics and Automation (ICRA-00).
Guillot, A. and Meyer, J.-A. (2000), From SAB94 to SAB2000: What's New, Animat?, SAB-00.
Harvey, I.; Husbands, P. and Cliff, D. (1992), Issues in Evolutionary Robotics, SAB-92.
Humphrys, M. (1996), Action Selection methods using Reinforcement Learning, SAB-96.
Humphrys, M. (1997), Action Selection methods using Reinforcement Learning, PhD thesis, University of Cambridge, Computer Laboratory.
Humphrys, M. (2001), The World-Wide-Mind: Draft Proposal, Dublin City University, School of Computing, Technical Report CA-0301, February 2001.
Humphrys, M. (2001a), Distributing a Mind on the Internet: The World-Wide-Mind, ECAL-01, Springer-Verlag LNCS/LNAI 2159, September 2001.
Kaelbling, L.P.; Littman, M.L. and Moore, A.W. (1996), Reinforcement Learning: A Survey, JAIR 4:237-285.
Karlsson, J. (1997), Learning to Solve Multiple Goals, PhD thesis, University of Rochester, Department of Computer Science.
Lin, L-J (1993), Scaling up Reinforcement Learning for robot control, 10th Int. Conf. on Machine Learning.
Martin, F.J.; Plaza, E. and Rodriguez-Aguilar, J.A. (2000), An Infrastructure for Agent-Based Systems: an Interagent Approach, Int. Journal of Intelligent Systems 15(3):217-240.
McDermott, D. (1997), "How Intelligent is Deep Blue?", New York Times, May 14, 1997.
Minsky, M. (1986), The Society of Mind.
Minsky, M. (1991), Society of Mind: a response to four reviews, Artificial Intelligence 48:371-96.
Nilsson, N.J. (1995), Eye on the Prize, AI Magazine 16(2):9-17, Summer 1995.
O'Leary, C. (2002), AIML v1.1 - Artificial Intelligence Markup Language, www.comp.dit.ie/coleary/research/phd/wwm/aiml.htm
O'Leary, C. (2002a), Lightweight Web Services for AI Researchers, www.comp.dit.ie/coleary/research/phd/wwm/aiml-web-services.htm
Ono, N.; Fukumoto, K. and Ikeda, O. (1996), Collective Behavior by Modular Reinforcement-Learning Animats, SAB-96.
Paulos, E. and Canny, J. (1996), Delivering Real Reality to the World Wide Web via Telerobotics, IEEE Int. Conf. on Robotics and Automation (ICRA-96).
Sloman, A. and Logan, B. (1999), Building cognitively rich agents using the SIM_AGENT toolkit, Communications of the ACM 42(3):71-77, March 1999.
Smed, J.; Kaukoranta, T. and Hakonen, H. (2001), Aspects of Networking in Multiplayer Computer Games, Proc. Int. Conf. on Application and Development of Computer Games in the 21st Century.
Stein, M.R. (1998), Painting on the World Wide Web, IEEE / RSJ Int. Conf. on Intelligent Robotic Systems (IROS-98).
Sutton, R.S. and Santamaria, J.C., A Standard Interface for Reinforcement Learning Software, www.cs.ualberta.ca/~sutton/RLinterface/RLinterface.html
Taylor, K. and Dalton, B. (1997), Issues in Internet Telerobotics, Int. Conf. on Field and Service Robotics (FSR-97).
Tyrrell, T. (1993), Computational Mechanisms for Action Selection, PhD thesis, University of Edinburgh.
Walshe, R. (2001), The Origin of the Speeches: language evolution through collaborative reinforcement learning, 3rd Int. Workshop on Intelligent Virtual Agents (IVA-2001).
Watkins, C.J.C.H. (1989), Learning from delayed rewards, PhD thesis, University of Cambridge.
Whitehead, S.; Karlsson, J. and Tenenberg, J. (1993), Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging, Robot Learning, Kluwer.
Wilson, M. and Neal, M. (2000), Telerobotic Sheepdogs: How useful is autonomous behavior?, SAB-00.
Wilson, S.W. (1990), The animat path to AI, SAB-90.