Dr. Mark Humphrys

School of Computing. Dublin City University.

Home      Blog      Teaching      Research      Contact

Search:

CA249      CA318      CA425      CA651

w2mind.computing.dcu.ie      w2mind.org

Missing
DCU student

CASE3 student Paul Bunbury is missing since Thur 2 Feb 2012.
See appeals on crime.ie and garda.ie and facebook.

He is a great coder. See DCU page and boards.ie page.
He won major coding contests in 2010 and 2011.
He is author of the brilliant "FloodItWorld".
DCU can confirm that in Jan 2012 he passed all 6 modules comfortably.


Exercise


What is wrong with γ = 1?

i.e. Just sum all future rewards.
Consider this situation:


State-space is not geographic-space

Note state-space is not geographic-space.
Transitions could be one-way, not 2-way. No easy way back to the state you left forever.
e.g. In chess, state = your queen is captured.

or e.g. You are Hitler. It is 1941. You have conquered Europe and are unthreatened.
So you decide to invade the Soviet Union and declare war on the US (neither of whom were at war with you so far).
There is no way back ever to that state before invasion. You have gone through a one-way door.



r=0, γ = anything

If take action a, expected long-term reward:
Ea(R) = 5 + γ 0 + γ2 0 + γ3 0 + ..
    = 5

Eb(R) = 0 + γ 100 + γ2 0 + γ3 0 + ..
    = 100 γ

  1. If γ = 0, it picks a.

  2. If γ = 1, it takes action b.

  3. Exercise - What is the smallest γ for which it picks b?



r=1, γ = anything

Imagine if the infinite loops scored 1 each time round instead of 0.

Ea(R) = 5 + γ + γ2 + γ3 + ..

Eb(R) = 0 + γ 100 + γ2 + γ3 + ..

  1. If γ = 0, it still picks a.

  2. If γ = 1
    Ea(R) = 5 + 1 + 1 + ..
    Eb(R) = 0 + 100 + 1 + 1 + ...
    Both infinite, can't tell the difference, will pick either one. Not necessarily b.

    Notes on infinity:
    If γ = 1, then even if one infinite loop gives 1 forever, and the other gives 100 forever, it can't tell the difference. Both infinite. In fact, all the following are the same:

    1 + 1 + 1 + 1 + ...
    -1000 -1000 -1000 -1000 -1000 + 1 + 1 + 1 + ...
    10 + 10 + 10 + 10 + ...
    For any γ < 1 we do not have this problem.

  3. Exercise - If γ < 1, for what γ will it pick b?



r=anything, γ = anything

If the infinite loops give reward r each time round:

Ea(R) = 5 + γ r + γ2 r + γ3 r + ..
    = 5 + γ r + r (γ2 + γ3 + .. )

Eb(R) = 0 + γ 100 + γ2 r + γ3 r + ..
    = 100 γ + r (γ2 + γ3 + .. )



Feeds      HumphrysFamilyTree.com

Bookmark and Share           On Internet since 1987.