Dr. Mark Humphrys

School of Computing. Dublin City University.

Home      Blog      Teaching      Research      Contact

My big idea: Ancient Brain

Search:

CA114      CA170

CA668      CA669      Projects


Exercise - long-term reward


What is wrong with γ = 1?

i.e. Just sum all future rewards.
Consider this situation:


State-space is not geographic-space

Note state-space is not geographic-space.
Transitions could be one-way, not 2-way. No easy way back to the state you left forever.
e.g. In chess, state = your queen is captured.

or e.g. You are Hitler. It is 1941. You have conquered Europe and are unthreatened.
So you decide to invade the Soviet Union and declare war on the US (neither of whom were at war with you so far).
There is no way back ever to that state before invasion. You have gone through a one-way door.



r=0, γ = anything

If take action a, expected long-term reward:
Ea(R) = 5 + γ 0 + γ2 0 + γ3 0 + ..
    = 5

Eb(R) = 0 + γ 100 + γ2 0 + γ3 0 + ..
    = 100 γ

  1. If γ = 0, it picks a.

  2. If γ = 1, it takes action b.

  3. Exercise - What is the smallest γ for which it picks b?



r=1, γ = anything

Imagine if the infinite loops scored 1 each time round instead of 0.

Ea(R) = 5 + γ + γ2 + γ3 + ..

Eb(R) = 0 + γ 100 + γ2 + γ3 + ..

  1. If γ = 0, it still picks a.

  2. If γ = 1
    Ea(R) = 5 + 1 + 1 + ..
    Eb(R) = 0 + 100 + 1 + 1 + ...
    Both infinite, can't tell the difference, will pick either one. Not necessarily b.

    Notes on infinity:
    If γ = 1, then even if one infinite loop gives 1 forever, and the other gives 100 forever, it can't tell the difference. Both infinite. In fact, all the following are the same:

    1 + 1 + 1 + 1 + ...
    -1000 -1000 -1000 -1000 -1000 + 1 + 1 + 1 + ...
    10 + 10 + 10 + 10 + ...
    For any γ < 1 we do not have this problem.

  3. Exercise - If γ < 1, for what γ will it pick b?



r=anything, γ = anything

If the infinite loops give reward r each time round:

Ea(R) = 5 + γ r + γ2 r + γ3 r + ..
    = 5 + γ r + r (γ2 + γ3 + .. )

Eb(R) = 0 + γ 100 + γ2 r + γ3 r + ..
    = 100 γ + r (γ2 + γ3 + .. )



ancientbrain.com      w2mind.org      humphrysfamilytree.com

On the Internet since 1987.

Wikipedia: Sometimes I link to Wikipedia. I have written something In defence of Wikipedia. It is often a useful starting point but you cannot trust it. Linking to it is like linking to a Google search. A starting point, not a destination. I automatically highlight in red all links to Wikipedia and Google search and other possibly-unreliable user-generated content.