Dr. Mark Humphrys

School of Computing. Dublin City University.

Q-learning with a Neural Network

Revision - Normal supervised learning.


Q-learning with a Neural Network:
Input: x,a.
Output: y_k = Q(x,a).


We are not learning from correct exemplars, as in normal supervised learning. That would be like being given the "correct" output:
O_k = Q*(x,a).

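Instead we are learning from estimates. The output we are "moving towards" is:
r + γ max_b Q(y,b)
where y is the state that x,a led to, r is the reward received, and γ is the discount factor.

So, for example, in the discrete case we do:
Q(x,a) := (1-α) Q(x,a) + α (r + γ max_b Q(y,b))
that is:
Q(x,a) := Q(x,a) + α ( (r + γ max_b Q(y,b)) - Q(x,a) )
where α is the learning rate.
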
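As a rough illustration of the discrete case, here is a minimal sketch of one lookup-table update for a single experience (x,a) -> (y,r). It assumes the standard Q-learning update with a learning rate alpha and a discount factor gamma. The function name q_update, the state labels and the parameter values are all hypothetical, chosen only for the example.

```python
from collections import defaultdict

def q_update(Q, x, a, r, y, actions, alpha=0.1, gamma=0.9):
    """One discrete Q-learning step for the experience (x,a) -> (y,r).

    Q is a lookup table keyed by (state, action). alpha and gamma are
    assumed values for the learning rate and discount factor.
    """
    # The estimate we are "moving towards".
    target = r + gamma * max(Q[(y, b)] for b in actions)
    # Move Q(x,a) a fraction alpha of the way towards the target.
    Q[(x, a)] = (1 - alpha) * Q[(x, a)] + alpha * target
    return Q[(x, a)]

# Hypothetical usage: all Q-values start at zero.
Q = defaultdict(float)
actions = [0, 1]
new_q = q_update(Q, x="s0", a=1, r=1.0, y="s1", actions=actions)
```

With everything initialised to zero, the target here is just r, so Q("s0",1) moves one tenth of the way from 0 towards 1.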
In the neural network Q-learning, we backpropagate the error:
(r + γ max_b Q(y,b)) - Q(x,a)
with discount factor γ.

But of course the term:
max_b Q(y,b)
is just an estimate, and Q itself is changing as we go along.
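To make the backpropagation step concrete, here is a small sketch of a network that takes (x,a) as input and outputs Q(x,a), trained on the error between the estimated target and the current output. The architecture (one tanh hidden layer, a one-hot action encoding), the class name QNet, and all parameter values are assumptions made for illustration, not the design used in these notes.

```python
import numpy as np

class QNet:
    """Tiny one-hidden-layer network: input (x, a), output Q(x, a)."""

    def __init__(self, n_state, n_actions, n_hidden=8, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.n_actions = n_actions
        n_in = n_state + n_actions
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def _encode(self, x, a):
        # Input vector = state features followed by a one-hot action.
        one_hot = np.zeros(self.n_actions)
        one_hot[a] = 1.0
        return np.concatenate([np.asarray(x, dtype=float), one_hot])

    def q(self, x, a):
        h = np.tanh(self.W1 @ self._encode(x, a) + self.b1)
        return float(self.W2 @ h + self.b2)

    def train(self, x, a, target):
        """Backpropagate the error (target - output) for one exemplar."""
        inp = self._encode(x, a)
        h = np.tanh(self.W1 @ inp + self.b1)
        out = self.W2 @ h + self.b2
        err = target - out
        # Error signal at the hidden layer (uses W2 before it is updated).
        delta_h = err * self.W2 * (1.0 - h ** 2)
        self.W2 += self.lr * err * h
        self.b2 += self.lr * err
        self.W1 += self.lr * np.outer(delta_h, inp)
        self.b1 += self.lr * delta_h
        return float(err)

def q_learning_step(net, x, a, r, y, gamma=0.9):
    """One step for the experience (x,a) -> (y,r)."""
    # The target is itself built from the net's current estimates.
    target = r + gamma * max(net.q(y, b) for b in range(net.n_actions))
    return net.train(x, a, target)

# Hypothetical usage with 2 state features and 2 actions.
net = QNet(n_state=2, n_actions=2)
err = q_learning_step(net, x=[1.0, 0.0], a=0, r=1.0, y=[0.0, 1.0])
```

Note that the target is recomputed from the network on every call, so the same experience gives a different error as Q changes.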

The "timeless" information is that x,a led to y,r. We can save these 4 values and "replay" the experience many times, with improved values of Q.

Read my PhD, Section 4.3.2.


There are lots of interesting issues. For example, if we replay the single experience:
(x,a) -> (y,r)
a million times, the network learns that all (x,a) lead to (y,r). We need to mix up our replays. Remember our discussion of over-learning and forgetting.
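One way to mix up replays is to store each experience (x,a,y,r) and replay a random sample in shuffled order, rather than repeating one experience back to back. The sketch below is a minimal illustration of that idea. The class ReplayMemory, the function replay and the capacity are hypothetical, and a lookup-table update stands in for the network to keep the example self-contained.

```python
import random
from collections import defaultdict

class ReplayMemory:
    """Stores experiences (x, a, y, r) for later replay."""

    def __init__(self, capacity=10000, seed=None):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def store(self, x, a, y, r):
        if len(self.items) >= self.capacity:
            self.items.pop(0)  # discard the oldest experience
        self.items.append((x, a, y, r))

    def sample(self, n):
        # Random sample without replacement, in random order.
        return self.rng.sample(self.items, min(n, len(self.items)))

def replay(memory, update, batch_size=32):
    """Replay a mixed batch; each replay uses the current values of Q."""
    for x, a, y, r in memory.sample(batch_size):
        update(x, a, r, y)

# Hypothetical stand-in for the learner: a lookup-table update.
Q = defaultdict(float)
ACTIONS = [0, 1]

def table_update(x, a, r, y, alpha=0.1, gamma=0.9):
    target = r + gamma * max(Q[(y, b)] for b in ACTIONS)
    Q[(x, a)] += alpha * (target - Q[(x, a)])

memory = ReplayMemory(seed=0)
memory.store("s0", 0, "s1", 0.0)   # (x,a) -> (y,r)
memory.store("s1", 1, "s2", 1.0)
for _ in range(500):
    replay(memory, table_update, batch_size=2)
```

Each replay reuses the same stored four values, but the target improves as Q improves, so the reward at "s1" gradually propagates back to "s0".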

Also, random learning, which worked with lookup tables, won't work with neural nets, because the exemplars interfere with each other. The net will just learn that all actions lead to nothing. We will need a more intelligent control policy, something like a Boltzmann distribution.
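As an illustration of a Boltzmann control policy, the sketch below picks action a with probability proportional to exp(Q(x,a)/T). The function names and the temperature values are assumptions for the example. A high temperature T gives near-random behaviour, and a low T gives near-greedy behaviour.

```python
import math
import random

def boltzmann_probs(q_values, temperature=1.0):
    """Action probabilities proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

# Hypothetical Q-values for two actions in some state x.
q_values = [1.0, 2.0]
hot = boltzmann_probs(q_values, temperature=100.0)   # close to uniform
cold = boltzmann_probs(q_values, temperature=0.01)   # close to greedy
action = boltzmann_action(q_values, temperature=0.5)
```

In practice the temperature would typically be lowered over time, so that the agent explores early on and exploits its Q-values later.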

