Let's Make Robots!

MBX-1 (Mammalian Brain Experiment #1)

Self-programming, machine learning

I have a theory that I've been working on for over 25 years now.  It's about making robots smarter by making them more aware of their surroundings.  To make them more aware, they need sensors and artificial emotions or feelings to interpret them.

MBX was a boiled-down, bare-bones first attempt at this. The whole program fits in one slot on the BASIC Stamp 2sx chip. (There are two in the pics, but only one was connected.)

When started, the robot knows nothing and moves randomly.  There are 6 motor outputs it can choose from: forward, backward, left forward, right forward, left backward, right backward.

It can detect obstacles with an IR kit, but that information doesn't mean anything to it yet (0 = all clear, 1 = obstacle on right, 2 = obstacle on left, 3 = obstacle on both sides).  And it has bumper switches on all 4 corners, wired in parallel so they use only one I/O line.

It will eventually hit something.  When it does, it will remember what it was doing (motor output) and what it was detecting (0, 1, 2, or 3).  It combines this data into a "list of things that caused crashes," or what I call Pain Memory.

It will then do another random motor output and hopefully move on.

While it's moving around, it constantly checks what it's doing and detecting against the Pain Memory to see if it has encountered this situation before.  If so, it does another random motor output before the collision takes place.

You could say the robot learns to avoid collisions by anticipating situations that caused it "pain" before.  And the anticipation of pain is a pretty good definition of "fear."
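The loop described above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not the original BASIC Stamp program; the names (`ACTIONS`, `pain_memory`, `choose_action`, `on_bump`) are mine:

```python
import random

# The six motor outputs the robot can choose from.
ACTIONS = ["forward", "backward", "left_fwd", "right_fwd",
           "left_back", "right_back"]

# Pain Memory: (IR reading, action) pairs that ended in a collision.
pain_memory = set()

def choose_action(ir_reading):
    """Pick a random action, avoiding any remembered 'painful' pair."""
    safe = [a for a in ACTIONS if (ir_reading, a) not in pain_memory]
    # If every action is remembered as painful, fall back to pure chance.
    return random.choice(safe or ACTIONS)

def on_bump(ir_reading, action):
    """Bumper hit: remember what we were doing and what we were detecting."""
    pain_memory.add((ir_reading, action))
```

At startup the memory is empty, so behavior is purely random; each bump shrinks the set of actions the robot will try in that sensor situation.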




I'm a graduate student in experimental psychology and I'm building a robot for the same reason.  However, my little guy is still in the "Zombie" phase of typing (I have to work on him between classes, work, and family).  But everything I've been taught leads me to the conclusion that Processing Power + Many Sensor Types + Incentives + Deterrents + Time = Complex Behaviors.  Much like the RGB LED that can make thousands of unique colors through blending, three basic incentives/deterrents can synergize to create an infinite set of complex behaviors, each productive and purposeful.  Good luck.


Thanks for the article (even though I don't understand it!). 

I'd read Heiserman's book when I was a teenager, and my idea is based on that.  I want to eventually add positive and negative reinforcements (pleasure & pain buttons) to its actions (from the combination of its inputs and previous outputs), and have it remember what output(s) it did when certain inputs were present, so it will avoid the painful situations and seek out pleasurable ones.

Teaching it to seek a light source was my next experiment, but I couldn't do it with a Stamp.  I have a Propeller now, but I've got to learn how to program it.

I've done some work on a variable structure stochastic learning automaton and written a basic interactive program, if you want to take a look: http://letsmakerobots.com/node/34383

In my approach, every possible action is tagged with a probability. The probability values are updated by a reinforcement scheme, depending on whether the action was favorable or not. A choice algorithm then chooses the next action according to the probability values. If, for instance, an action has a probability value of 1/3, there is a 1-in-3 chance that it will be chosen by the choice algorithm.
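The choice step can be sketched as roulette-wheel selection: draw a random number and walk through the cumulative probabilities until it is exceeded. A minimal sketch, with illustrative names of my own:

```python
import random

def choose(actions, probs):
    """Pick actions[i] with probability probs[i]; probs must sum to 1."""
    r = random.random()
    cumulative = 0.0
    for action, p in zip(actions, probs):
        cumulative += p
        if r < cumulative:
            return action
    return actions[-1]  # guard against floating-point rounding
```

An action with probability 1/3 is then chosen, on average, one time in three.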

Anyhow, great work, Jim. I will follow it with great interest.


Hi Hermit,

Your attempt is called a "pure-chance automaton." It is not considered a learning automaton, because a learning automaton must at least do better than a pure-chance automaton (see variable structure stochastic learning automaton). It is exactly what is described in David L. Heiserman's book How to Build Your Own Self-Programming Robot.


This robot will do much better than a pure-chance automaton. Once it has gone forward and bumped into a wall while detecting something in front of it, then the next time it detects something in front of it, it will not go forward again. How is that pure chance?

The main issue I see with this kind of learning is that it doesn't scale. Let's upgrade the robot to have more sensors: give it a camera instead of just 4 sensors. (You can imagine mounting the camera to look down on the robot and its surrounding environment.) Each pixel can be considered a sensor. For your learning algorithm, each combination of sensor values is a new situation, and for each situation you must try out a number of actions. The total number of possible situations is incredibly large even for a low-resolution black-and-white camera. It would take a very, very long time to discover the appropriate response for each situation, and a HUGE amount of space to store the required information.

It is a pure-chance automaton because the probabilities for every possible action are always the same (without a reinforcement scheme).

The input from the environment is binary: either favorable or unfavorable. Bumper sensors are not the best choice, but a ping sensor will do. A certain distance to an obstacle is either favorable or unfavorable. That's all.

The probabilities for an action (in a given situation) change with each impact, like this:

There are 6 possible actions. The probability of choosing each one is 1/6, about 16.7%.

The robot "detects" a wall, moves forward and bumps into the wall. It adds this to his list of "bad ideas".

The next time the robot detects a wall ahead, it looks up the situation in the list of bad ideas and removes any bad moves from the list of possible moves. So, in our case, there are now 5 possible actions: effectively, the probability of the forward action is 0%, and the probability of each other action is 20%.
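The worked example above can be sketched directly: excluding a (situation, action) pair from the bad-ideas list redistributes the probability evenly over the remaining actions. All names here are illustrative:

```python
# Six actions start equally likely (1/6 each); a remembered bad idea for a
# situation drops to 0%, and the remaining five each get 1/5 = 20%.
ACTIONS = ["forward", "backward", "left_fwd", "right_fwd",
           "left_back", "right_back"]
bad_ideas = {("wall_ahead", "forward")}

def action_probabilities(situation):
    """Uniform probabilities over the actions not ruled out for this situation."""
    allowed = [a for a in ACTIONS if (situation, a) not in bad_ideas]
    p = 1.0 / len(allowed)
    return {a: (p if a in allowed else 0.0) for a in ACTIONS}
```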

Hi Antonio,

Nearly. The probabilities for every action are updated by a so-called 'reinforcement scheme'. If an action leads to an unfavorable situation, its probability is lowered, but not set immediately to 0. The same applies if the action was favorable. How fast it converges can be adjusted by the learning parameter a (if it converges too fast, the robot might interpret the environment wrongly). However, a probability can only reach 0 or 1 in infinitely many steps (in theory; the microcontroller has some restrictions). In other words, you get a stochastic n-valent (n being the number of possible actions) mapping of the environment the robot operates in.
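One common scheme of this kind is the linear reward-penalty update: the chosen action's probability is nudged toward 1 or 0 by the learning parameter a, and the others are rescaled so everything still sums to 1. This is a minimal sketch of that general idea, not Hermit's exact program; the function name and parameter values are my own:

```python
def update(probs, chosen, favorable, a=0.1):
    """Linear reward-penalty update of action probabilities.

    If the chosen action was favorable, move its probability toward 1
    and scale the others down; if unfavorable, move it toward 0 and
    redistribute to the others. No probability reaches exactly 0 or 1
    in a finite number of steps.
    """
    n = len(probs)
    new = []
    for i, p in enumerate(probs):
        if favorable:
            new.append(p + a * (1.0 - p) if i == chosen else (1.0 - a) * p)
        else:
            new.append((1.0 - a) * p if i == chosen
                       else a / (n - 1) + (1.0 - a) * p)
    return new
```

With a small a the probabilities converge slowly but are robust to a noisy environment; with a large a they converge quickly but can lock onto a misreading of the environment.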