School of Computing. Dublin City University. Home Blog Teaching Research Contact My big idea: Ancient Brain 


How do we actually code the statespace? In particular how do we encode the Q(x,a) data structure? We want a data structure that can take as input (i.e. be indexed by) two vectors x and a, both members of finite sets, and return a real number output Q(x,a).
The basic idea is that because x and a are members of finite sets, we can give each of them a unique ID number, and so we can give the (x,a) combination a unique ID too. Then we just have an array of some size N of real numbers, indexed by this ID.
x = (x_{1}, x_{2}, .. x_{n})
where each x_{i} takes one of some finite set of values
v_{0}, v_{1}, ... v_{m}
(in fact, the set of values, and the number of them,
could be different for each dimension i).
e.g. x = (x_{1}, .. x_{10})
where for example x_{7} takes values 100, 0, or 100.
To help us enumerate the states so we can construct an ID number, we simply constrain the set of values to run from 0 to m (we can translate to/from the real values when we use them). So above we say x_{7} takes values 0,1,2.
Even with large finite, we may need generalisation.
To reduce the statespace size, we may need to be careful with our definition of state. Note that we can easily define too many possibilities, so that a huge chunk of the logical state space may not actually exist, e.g. all blank in chess board, or all kings. Could even have 90 percent of the lookup table unused.
Often we might make the world discrete and finite by dividing up the continuous input into a small number of coarsegrained divisions. But remember this in itself is a generalisation.Consider how we could enumerate the finite set of states defined as follows:
x = (x_{1}, x_{2}, x_{3})
where x_{1} takes values 0,1,
x_{2} takes values 0,1,2,
and x_{3} takes values 0,1.
We can sum this up by just having a set of variables
saying how many values each dimension takes:
c_{1}=2, c_{2}=3, c_{3}=2.
We can enumerate as follows:
x = (0,0,0) id = 0 0 0 1 1 0 1 0 2 0 1 1 3 0 2 0 4 0 2 1 5 1 0 0 6 1 0 1 7 1 1 0 8 1 1 1 9 1 2 0 10 1 2 1 11Note that it is not binary.
Consider the last state (1,2,1).
The "1" means we have done the "0"'s in the first column,
i.e. we have done 1.c_{2}.c_{3},
plus the "2" means that within the "1"s
we have already done an additional 2.c_{3},
plus the last "1" means we have done an additional 1.
So id(1,2,1) = 1.c_{2}.c_{3}
+ 2.c_{3} + 1
In general, we can enumerate a state as follows:
int id ( state x )
{
return x_{1}c_{2}c_{3} + x_{2}c_{3} + x_{3}
}
Where
x = (x_{1}, x_{2}, .. x_{n}),
id(x) = x_{1}c_{2}..c_{n}
+ x_{2}c_{3}..c_{n}
+ ...
+ x_{(n1)}c_{n}
+ x_{n}
The number of possible states is:
nostates = c_{1}c_{2}...c_{n}
The state IDs run from: 0 .. (nostates1)
Similar scheme. Action IDs run from: 0 .. (noactions1)
(x,a) pairs can be enumerated:
(0,0) id = 0 + 0 (0,1) 0 + 1 ... ... (0,noactions1) 0 + noactions1 (1,0) 1.noactions + 0 ... ... (1,noactions1) 1.noactions + noactions1 (2,0) 2.noactions + 0 ... ... (nostates1,noactions1)The ID of an (x,a) pair is:
The number of (x,a) combinations is:
nostates*noactions
IDs run from:
0 .. (nostates*noactions)1
So our function to access the State Space is:
real StateSpace :: at ( vector x, vector a )
{
int id = (id(x) * noactions) + id(a)
return ActualArray [ id ]
}
where ActualArray is simply an ordinary 1dimensional array of real numbers:
real ActualArray [ nostates * noactions ]Then we have one of these data structures to hold the Qvalues:
StateSpace Q
and we can just use it like:
Q.at(x,a) = ((1ALPHA) * Q.at(x,a)) + (ALPHA * newestimate)
In the sample code you will see that I enumerated Actions x States rather than States x Actions as above.