School of Computing. Dublin City University.


In general, Reinforcement Learning work has concentrated on problems with a single goal. As the complexity of problems scales up, both the size of the statespace and the complexity of the reward function increase. We will clearly be interested in methods of breaking problems up into subproblems which can work with smaller statespaces and simpler reward functions, and then having some method of combining the subproblems to solve the main task.
Most of the work in RL either designs the decomposition by hand [Moore, 1990], or deals with problems where the subtasks have termination conditions and combine sequentially to solve the main problem [Singh, 1992; Tham and Prager, 1994].
The Action Selection problem essentially concerns subtasks acting in parallel, and interrupting each other rather than running to completion. Typically, each subtask can only ever be partially satisfied [Maes, 1989].
Lin has devised a form of multimodule RL suitable for such problems, and this will be the second method tested below.
Lin [Lin, 1993] suggests breaking up a complex problem into subproblems, having a collection of Q-learning agents learn the subproblems, and then having a single controlling Q-learning agent learn Q(x,i), where i is the agent to choose in state x. This is clearly an easier function to learn than Q(x,a), since the subagents have already learnt sensible actions. When the creature observes state x, each agent i suggests an action a_i. The switch chooses a winner k and executes a_k.
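The switch described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Lin's implementation: it assumes a lookup-table representation of Q(x,i), epsilon-greedy exploration, and the standard one-step Q-learning update, and it treats the already-trained subagents as plain functions from state to action. The class name `Switch` and the parameters `alpha`, `gamma` and `epsilon` are hypothetical names chosen for the example.

```python
import random
from collections import defaultdict


class Switch:
    """Controlling Q-learner that learns Q(x, i) over agent indices i.

    Assumption: each subagent has already learnt its subproblem and is
    represented here as a callable mapping a state x to an action a_i.
    """

    def __init__(self, agents, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.agents = agents
        self.alpha = alpha        # learning rate (assumed)
        self.gamma = gamma        # discount factor (assumed)
        self.epsilon = epsilon    # exploration rate (assumed)
        self.q = defaultdict(float)   # (state, agent index) -> Q(x, i)
        self.rng = random.Random(seed)

    def choose(self, x):
        """Choose a winner k in state x and return (k, a_k)."""
        n = len(self.agents)
        if self.rng.random() < self.epsilon:
            k = self.rng.randrange(n)                         # explore
        else:
            k = max(range(n), key=lambda i: self.q[(x, i)])   # exploit
        return k, self.agents[k](x)

    def update(self, x, k, r, y):
        """One-step Q-learning update of Q(x, k) given reward r and next state y."""
        best_next = max(self.q[(y, i)] for i in range(len(self.agents)))
        target = r + self.gamma * best_next
        self.q[(x, k)] += self.alpha * (target - self.q[(x, k)])
```

On each time step the creature would call `choose(x)`, execute the returned action, observe reward r and new state y, and then call `update(x, k, r, y)`. The table is indexed by agent rather than by action, so its size grows with the number of subagents instead of the number of actions.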
Lin concentrates on problems where subtasks combine to solve a global task, but one may equally apply the architecture to problems where the subagents simply compete and interfere with each other, that is, to classic action selection problems.