Dr. Mark Humphrys, School of Computing, Dublin City University.


# B.1 Bounds with a learning rate α

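Theorem B.1. Let $D$ be updated by:

$$D := (1-\alpha) D + \alpha d$$

where $d$ is bounded by $d_{min} \le d \le d_{max}$, $0 \le \alpha \le 1$, and the initial value of $D$ satisfies $d_{min} \le D \le d_{max}$. Then:

$$d_{min} \le D \le d_{max}$$

Proof: The highest $D$ can be is if it is always updated with $d_{max}$:

$$D := (1-\alpha) D + \alpha d_{max}$$

so $D \le d_{max}$. Similarly $D \ge d_{min}$. $\Box$

The condition $0 \le \alpha \le 1$ is essential: it makes each update a weighted average of $D$ and $d$, which cannot leave an interval that contains both.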
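As an informal numerical check of this bound, the sketch below repeatedly applies an update of the form D := (1 - α) D + α d with random targets and random learning rates. The function names `update` and `run`, the interval [-2, 5], and the step count are illustrative assumptions, not part of the original text. A small tolerance is used in any comparison because floating-point rounding can move a value by a few units in the last place.

```python
import random

def update(D, d, alpha):
    """One step of D := (1 - alpha) * D + alpha * d."""
    return (1.0 - alpha) * D + alpha * d

def run(d_min, d_max, steps=10_000, seed=0):
    """Apply many random updates and return the lowest and highest D seen."""
    rng = random.Random(seed)
    D = rng.uniform(d_min, d_max)      # initial value inside the bounds
    lo = hi = D
    for _ in range(steps):
        alpha = rng.random()           # learning rate in [0, 1)
        d = rng.uniform(d_min, d_max)  # target inside the bounds
        D = update(D, d, alpha)
        lo, hi = min(lo, D), max(hi, D)
    return lo, hi

# Hypothetical bounds chosen for illustration only.
lo, hi = run(-2.0, 5.0)
print("observed range of D:", lo, hi)

# With alpha outside [0, 1] the update is no longer a weighted average,
# so the bound can fail: here D starts at d_max and ends below d_min.
overshoot = update(5.0, -2.0, 1.5)
print("alpha = 1.5 gives D =", overshoot)
```

The second print shows why the restriction on α matters: with α = 1.5 a single update takes D from 5 to -5.5, outside the interval [-2, 5].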


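# B.2 Bounds of Q-values

Theorem B.2. If rewards are bounded by $r_{min} \le r \le r_{max}$ and $0 \le \gamma < 1$, then:

$$Q_{min} = \frac{r_{min}}{1-\gamma}, \qquad Q_{max} = \frac{r_{max}}{1-\gamma}$$

Proof: In the discrete case, $Q$ is updated by:

$$Q(x,a) := (1-\alpha) Q(x,a) + \alpha \left( r + \gamma \max_{b} Q(y,b) \right)$$

so by Theorem B.1:

$$Q_{max} = r_{max} + \gamma Q_{max} \quad \Rightarrow \quad Q_{max} = \frac{r_{max}}{1-\gamma}$$

This can also be viewed in terms of temporal discounting:

$$Q_{max} = r_{max} + \gamma r_{max} + \gamma^2 r_{max} + \cdots = \frac{r_{max}}{1-\gamma}$$

Similarly:

$$Q_{min} = \frac{r_{min}}{1-\gamma}$$

$\Box$

For example, if $\gamma = 0.9$, then $Q_{max} = 10 \, r_{max}$. And (assuming $r_{max} > 0$) as $\gamma \to 1$, $Q_{max} \to \infty$.

Note that since $r_{min} < r_{max}$, it follows that $Q_{min} < Q_{max}$.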
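The Q-value bounds can be probed numerically with a minimal sketch. It assumes the standard tabular Q-learning update Q(x,a) := (1 - α) Q(x,a) + α (r + γ max_b Q(y,b)) and the bounds r_min / (1 - γ) and r_max / (1 - γ). The random environment (uniform transitions and rewards), the function names `q_bounds` and `simulate`, and every parameter value are assumptions made for illustration only.

```python
import random

def q_bounds(r_min, r_max, gamma):
    """Return (r_min / (1 - gamma), r_max / (1 - gamma)); needs 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return r_min / (1.0 - gamma), r_max / (1.0 - gamma)

def simulate(r_min=-1.0, r_max=2.0, gamma=0.9, alpha=0.2,
             n_states=4, n_actions=3, steps=20_000, seed=0):
    """Run tabular Q-learning in a made-up random world; report extreme Q-values."""
    rng = random.Random(seed)
    q_min, q_max = q_bounds(r_min, r_max, gamma)
    # Initial Q-values anywhere inside the claimed bounds.
    Q = [[rng.uniform(q_min, q_max) for _ in range(n_actions)]
         for _ in range(n_states)]
    x = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)   # random exploration
        y = rng.randrange(n_states)    # hypothetical random transition
        r = rng.uniform(r_min, r_max)  # reward inside [r_min, r_max]
        target = r + gamma * max(Q[y])
        Q[x][a] = (1.0 - alpha) * Q[x][a] + alpha * target
        x = y
    flat = [q for row in Q for q in row]
    return min(flat), max(flat), q_min, q_max

q_lo, q_hi, q_min, q_max = simulate()
print("bounds:  ", q_min, q_max)
print("observed:", q_lo, q_hi)
```

With γ = 0.9 and r_max = 2 the upper bound is approximately 20, matching the rule of thumb that Q_max is ten times r_max at this discount factor. The observed Q-values typically sit well inside the bounds, since reaching a bound requires receiving the extreme reward on every step.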

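# B.3 Bounds of W-values

Theorem B.3.

$$W_{max} = Q_{max} - Q_{min}, \qquad W_{min} = Q_{min} - Q_{max}$$

Proof: In the discrete case, $W$ is updated by:

$$W_i(x) := (1-\alpha) W_i(x) + \alpha \left( Q_i(x,a_i) - \left( r_i + \gamma \max_{b} Q_i(y,b) \right) \right)$$

so by Theorem B.1:

$$W_{max} = Q_{max} - (r_{min} + \gamma Q_{min}) = Q_{max} - Q_{min}$$

by Theorem B.2, since $Q_{min} = r_{min} + \gamma Q_{min}$.

Similarly:

$$W_{min} = Q_{min} - (r_{max} + \gamma Q_{max}) = Q_{min} - Q_{max}$$

$\Box$

Note that since $Q_{min} < Q_{max}$, it follows that $W_{min} < 0 < W_{max}$.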
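A rough numerical check of the W-value bounds is sketched below. It assumes the W update has the form W := (1 - α) W + α (Q_own - (r + γ Q_next)), where Q_own stands for Q_i(x, a_i) and Q_next for max_b Q_i(y, b), and that the bounds are ±(Q_max - Q_min) with Q_min = r_min / (1 - γ) and Q_max = r_max / (1 - γ). This is a stress test rather than an implementation of W-learning: the Q-values are drawn at random from inside their own bounds instead of being learned, and all names and parameter values are hypothetical.

```python
import random

def w_bounds(r_min, r_max, gamma):
    """Return (Q_min - Q_max, Q_max - Q_min) for 0 <= gamma < 1."""
    q_min = r_min / (1.0 - gamma)
    q_max = r_max / (1.0 - gamma)
    return q_min - q_max, q_max - q_min

def simulate_w(r_min=-1.0, r_max=2.0, gamma=0.9, alpha=0.3,
               steps=20_000, seed=0):
    """Apply the assumed W update with random inputs; report extreme W seen."""
    rng = random.Random(seed)
    q_min = r_min / (1.0 - gamma)
    q_max = r_max / (1.0 - gamma)
    w_min, w_max = w_bounds(r_min, r_max, gamma)
    W = 0.0  # zero lies inside [w_min, w_max] whenever r_min <= r_max
    lo = hi = W
    for _ in range(steps):
        q_own = rng.uniform(q_min, q_max)   # stands in for Q_i(x, a_i)
        q_next = rng.uniform(q_min, q_max)  # stands in for max_b Q_i(y, b)
        r = rng.uniform(r_min, r_max)
        d = q_own - (r + gamma * q_next)    # the quantity W is moved towards
        W = (1.0 - alpha) * W + alpha * d
        lo, hi = min(lo, W), max(hi, W)
    return lo, hi, w_min, w_max

w_lo, w_hi, w_min, w_max = simulate_w()
print("bounds:  ", w_min, w_max)
print("observed:", w_lo, w_hi)
```

The bounds are symmetric about zero by construction, which is consistent with W-values being able to take either sign.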

