Dr. Mark Humphrys

School of Computing. Dublin City University.



Learning from Rewards (Reinforcement Learning, RL) as Pattern Classification

Object = state x.
e.g. x = state of robot and its environment.

Features describing the object = Values of each dimension defining the point x = (x1,...,xn).
e.g. xi = infra-red sensor reading to the front of the robot.

Feature space = State space. May be multi-dimensional, where each dimension takes continuous values. x is a point in this space.

Classes to assign object into = Action a to take when state x is seen.
e.g. Move left, move right, stop.
Note: In classification, the classes are normally a small finite set. Actions are also often a small finite (discrete) set, but may be a continuous, infinite set (e.g. a real number output, such as the angle to move at).

Agent or actor (robot, program) learns to map x to a.
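
A minimal sketch of the mapping from x to a, assuming a tiny discretised state space. The two features, the table entries and the names POLICY and act are invented for illustration. They are not from any particular robot.

```python
from typing import Dict, Optional, Tuple

# A state x is a tuple of feature values (x1, x2).
# Hypothetical features: x1 = obstacle in front (0/1), x2 = obstacle to left (0/1).
State = Tuple[int, int]

# The "classifier": each state (object) is assigned an action (class).
POLICY: Dict[State, str] = {
    (0, 0): "move left",
    (0, 1): "move right",
    (1, 0): "move left",
    (1, 1): "stop",
}

def act(x: State) -> Optional[str]:
    """Return the action assigned to state x, or None if x is not in the table."""
    return POLICY.get(x)

print(act((1, 1)))
```

A lookup table like this only answers for states it has stored. It returns None for any other x.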




Generalisation

Does each x map to a unique a, or to multiple a's?

Can multiple x map to same a?

Is whole space covered? Does each x map to some a? Can we return an a for a new x, never seen before?

Is each action a mapped to by some state x, i.e. is every action actually used?
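
One possible way to explore these questions in code is a nearest-neighbour rule, which returns the action of the closest state seen so far. This is only a sketch of one generalisation scheme, not the only one. The stored states, the action set and the function names are all invented for illustration.

```python
import math

# Hypothetical (state, action) pairs already seen. Values are invented.
SEEN = [
    ((0.9, 0.1), "move left"),
    ((0.8, 0.2), "move left"),   # multiple x can map to the same a
    ((0.1, 0.9), "move right"),
]
ACTIONS = {"move left", "move right", "stop"}

def nearest_action(x):
    """Return the action of the nearest seen state, so every x gets some a."""
    _, action = min(SEEN, key=lambda pair: math.dist(pair[0], x))
    return action

def unused_actions():
    """Return the actions that no seen state maps to."""
    return ACTIONS - {a for _, a in SEEN}

print(nearest_action((0.5, 0.4)))   # a state never seen before
print(unused_actions())
```

Here the whole space is covered, since any new x gets the action of its nearest neighbour, but the action "stop" is never returned for any x.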




Noise / Probabilistic worlds / inaccurate / incomplete sensors

From the start we will allow our world to be probabilistic rather than necessarily deterministic.
i.e. In state x, you take action a. Sometimes this leads to state y. Sometimes it leads to state z.
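
A probabilistic world can be sketched as a table of transition probabilities. The probabilities 0.7 and 0.3 below are assumptions chosen for illustration, as are the names TRANSITIONS and step.

```python
import random

# Hypothetical transition model: P(next state | state, action).
TRANSITIONS = {
    ("x", "a"): [("y", 0.7), ("z", 0.3)],
}

def step(state, action, rng):
    """Sample the next state reached by taking this action in this state."""
    outcomes = TRANSITIONS[(state, action)]
    next_states = [s for s, _ in outcomes]
    probs = [p for _, p in outcomes]
    return rng.choices(next_states, weights=probs, k=1)[0]

rng = random.Random(0)   # fixed seed so that runs are repeatable
counts = {"y": 0, "z": 0}
for _ in range(1000):
    counts[step("x", "a", rng)] += 1
print(counts)
```

The same (x, a) pair leads to y on some steps and to z on others, in roughly the proportions given in the table.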



Probabilistic reasoning / Stochastic control policy

Instead of linking each x to a single a, our action taker may say something like:
"In state x, take action a with probability 0.9, action b with probability 0.1."
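
A minimal sketch of such a stochastic policy, using the probabilities 0.9 and 0.1 from the example above. The names POLICY and choose_action are illustrative.

```python
import random

# For each state, a probability distribution over actions.
POLICY = {
    "x": {"a": 0.9, "b": 0.1},
}

def choose_action(state, rng):
    """Sample an action for this state according to the policy's probabilities."""
    dist = POLICY[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)   # fixed seed so that runs are repeatable
tally = {"a": 0, "b": 0}
for _ in range(1000):
    tally[choose_action("x", rng)] += 1
print(tally)
```

A deterministic policy is the special case where one action has probability 1.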


Consider:
  1. Deterministic control policy in a deterministic world.
  2. Deterministic control policy in a probabilistic world.
  3. Probabilistic control policy in a probabilistic world.
  4. Probabilistic control policy in a deterministic world.
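
The four combinations can be compared with a toy simulation that counts how many different state sequences occur over repeated runs. Everything here is an assumption made for illustration: the three states, the 0.1 and 0.2 probabilities and the function names.

```python
import random

STATES = ("x", "y", "z")   # hypothetical three-state world

def policy(state, stochastic, rng):
    """Deterministic: always 'a'. Stochastic: 'b' with probability 0.1."""
    if stochastic and rng.random() < 0.1:
        return "b"
    return "a"

def world(state, action, stochastic, rng):
    """Action 'a' moves to the next state, 'b' stays put.
    In the stochastic world any action fails (no move) with probability 0.2."""
    if stochastic and rng.random() < 0.2:
        return state
    if action == "a":
        return STATES[(STATES.index(state) + 1) % len(STATES)]
    return state

def distinct_trajectories(policy_stochastic, world_stochastic,
                          runs=200, steps=10, seed=0):
    """Count the different state sequences seen over a number of runs."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(runs):
        state = "x"
        trajectory = [state]
        for _ in range(steps):
            action = policy(state, policy_stochastic, rng)
            state = world(state, action, world_stochastic, rng)
            trajectory.append(state)
        seen.add(tuple(trajectory))
    return len(seen)

for p in (False, True):
    for w in (False, True):
        print("stochastic policy:", p, "stochastic world:", w,
              "distinct trajectories:", distinct_trajectories(p, w))
```

Only case 1 repeats exactly the same sequence of states on every run. In the other three cases the sequence varies from run to run.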



Thought-controlled machines

Much work in thought-controlled machines involves separating a state space into large regions, each of which can be associated with a discrete action.
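
The idea of large regions, each tied to one discrete action, can be sketched as follows. This assumes a two-dimensional feature point x = (x1, x2) extracted from some signal. The features, the 0.2 threshold and the action names are invented. A real system would typically learn the region boundaries from data.

```python
def region_action(x):
    """Map a 2-D feature point x = (x1, x2) to a discrete action by region."""
    x1, x2 = x
    # Small central region: no clear signal, so stop.
    if abs(x1) < 0.2 and abs(x2) < 0.2:
        return "stop"
    # Otherwise the dominant dimension decides which large region x is in.
    if abs(x1) >= abs(x2):
        return "move right" if x1 > 0 else "move left"
    return "move forward" if x2 > 0 else "move back"

for point in [(0.05, -0.1), (0.9, 0.3), (-0.7, 0.1), (0.2, 0.8), (0.1, -0.6)]:
    print(point, "->", region_action(point))
```

Every point in the space falls in exactly one region, so many different x map to the same small set of actions.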


Thought-controlled wheelchair.



Monkey controls a robotic arm with its brainwaves.



