POMDP, an online demo: detailled description

My POMDP pages:

Detailled description

Principle

A robot is placed in a labyrinth. It must move towards the checked flag. The user has the ability to take the robot and drop it wherever he wants, so that he can see how the robot first gets lost, then finds its way again. The robot uses a partially observable markovian decision process.

What the robot knows

The robot knows the map of the labyrinth and it knows what action to take in each case of the labyrinth.

Further, it knows itself perfectly: it is aware that it sometimes drift and that its sensors are not always reliable, and it knows the exact probabilities that rules its movement and its sensors.

In addition to this a priori knowledge, the robots has got four wall-sensors: it can see whether it has a wall against it at its left, at its right, in front of it or behind it.

Lastly, the robot knows that the game is over when it reaches the checker, so if the game is not over, it concludes that it has not reached the checker: the probability to be at the checker is thus always 0 (except when the robot is actually there).

The robot is unreliable

In an empty labyrinth, the robot was dropped at the marked case and have moved ten times to the right. The most probable position (in yellow) corresponds to a nine-case move to the right. The probabilities of other positions are indicated in gray levels (the more likely, the darker).

When the robot moves (which it does each turn), there is a 70% probability of arriving in the expected case, but also a 10% probability of drifting of one cell to the left, another 10% probability of drifting to the right, and finally a 10% probability of skidding and not moving at all.

The robot never mistakes a direction for another (mistaking the top of the grid for its right, for instance).

This model is admittedly simple but if you consider it at a ten-move scale rather than at a single move scale, you get a more realistic probability distribution, as shown in the figure on the right.

Wall-sensors are usually reliable but they stop being so when the robot hits a wall. There is then a 45% probability not to see the wall it bumped, and the three other sensors have a 35% probability to fail (to detect a wall although there is not any, or the contrary).

The making of the movie

The animation uses Flash (Flash 5). The flash movie has been created by a C++ program thanks to Ming, a library that allows you to create Flash movies from PHP, C++, C, perl and python programs.

The source code is freely available under the terms of the GNU GPL.