Dr. Mark Humphrys

School of Computing. Dublin City University.


3 Multi-Module Reinforcement Learning

In general, Reinforcement Learning work has concentrated on problems with a single goal. As the complexity of problems scales up, both the size of the statespace and the complexity of the reward function increase. We are therefore interested in methods that break a problem up into subproblems, each of which can work with a smaller statespace and a simpler reward function, together with some method of combining the subproblems to solve the main task.

Most of the work in RL either designs the decomposition by hand [Moore, 1990], or deals with problems where the sub-tasks have termination conditions and combine sequentially to solve the main problem [Singh, 1992, Tham and Prager, 1994].

The Action Selection problem essentially concerns subtasks acting in parallel, and interrupting each other rather than running to completion. Typically, each subtask can only ever be partially satisfied [Maes, 1989].

3.1 Hierarchical Q-learning

Lin has devised a form of multi-module RL suitable for such problems, and this will be the second method tested below.

Lin [Lin, 1993] suggests breaking up a complex problem into sub-problems, having a collection of Q-learning agents A_1, ..., A_n learn the sub-problems, and then having a single controlling Q-learning agent (the switch) learn Q(x,i), where i is which agent to choose in state x. This is clearly an easier function to learn than Q(x,a), since the sub-agents have already learnt sensible actions. When the creature observes state x, each agent A_i suggests an action a_i. The switch chooses a winner k and executes a_k.

Lin concentrates on problems where subtasks combine to solve a global task, but one may equally apply the architecture to problems where the sub-agents simply compete and interfere with each other, that is, to classic action selection problems.
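
The switch described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Lin's implementation: the tabular representation, the epsilon-greedy exploration, the parameter values (alpha, gamma, epsilon) and the two-state "food"/"predator" world are all hypothetical choices made for the example. The sub-agents are taken to have learnt their own Q_i(x,a) already, so only the switch's Q(x,i) is updated, using the standard one-step Q-learning rule.

```python
import random
from collections import defaultdict


class SubAgent:
    """Sub-agent A_i holding an already-learnt table Q_i(x, a)."""

    def __init__(self, q_table):
        self.q = q_table  # {state: {action: value}}

    def suggest(self, x):
        # a_i: the action this agent considers best in state x
        actions = self.q[x]
        return max(actions, key=actions.get)


class Switch:
    """Controlling Q-learner: learns Q(x, i), the value of obeying agent i in state x."""

    def __init__(self, n_agents, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n = n_agents
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        self.q = defaultdict(lambda: [0.0] * n_agents)
        self.rng = random.Random(seed)

    def greedy(self, x):
        row = self.q[x]
        return max(range(self.n), key=lambda i: row[i])

    def choose(self, x):
        # epsilon-greedy choice of the winning agent k
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        return self.greedy(x)

    def update(self, x, k, r, y):
        # one-step Q-learning update on Q(x, k)
        target = r + self.gamma * max(self.q[y])
        self.q[x][k] += self.alpha * (target - self.q[x][k])


def run(switch, agents, env_step, x, steps):
    """Observe x, collect every a_i, let the switch pick k, execute a_k, learn."""
    for _ in range(steps):
        suggestions = [agent.suggest(x) for agent in agents]
        k = switch.choose(x)
        r, y = env_step(x, suggestions[k])
        switch.update(x, k, r, y)
        x = y
    return x


def toy_env_step(x, action):
    """Hypothetical two-state world in which 'food' and 'predator' alternate."""
    if x == "food":
        return (1.0 if action == "eat" else 0.0), "predator"
    return (1.0 if action == "flee" else -1.0), "food"


def demo(steps=5000):
    # Two competing sub-agents: one always wants to eat, one always wants to flee.
    feeder = SubAgent({"food": {"eat": 1.0, "flee": 0.0},
                       "predator": {"eat": 1.0, "flee": 0.0}})
    avoider = SubAgent({"food": {"eat": 0.0, "flee": 1.0},
                        "predator": {"eat": 0.0, "flee": 1.0}})
    switch = Switch(n_agents=2)
    run(switch, [feeder, avoider], toy_env_step, "food", steps)
    return switch
```

Calling demo() trains the switch on the toy world, after which switch.greedy(x) gives the index of the agent it would obey in state x. Note that the switch never sees the actions themselves, only which agent it obeyed and the reward that followed, which is why its table is indexed by (x,i) rather than (x,a).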
