Let's Make Robots!

Demo of a Conversational Robot that Learns by Listening...

I have been working on this for the past few months; basically, it's a conversational learning AI.  I've tried to figure out how to explain it...it's best to watch the video.  The logic-based stuff is a few minutes into the video.

At the core of it, the bot learns concepts by listening to people and remembering what they say...

The robot learns from humans saying things like "People are mammals...mammals are animals...mammals have two eyes...fish can swim...London is in England...France is next to Germany...Rocks are heavier than feathers...Steel is stronger than iron...Cheetahs are faster than humans...Superman is faster than a bullet...Penguins can't fly...Beijing is the capital of China...The Battle of Midway was in 1942"...on and on.  From this it can deduce and answer a lot of logic-based questions by traversing relationships it has learned.  It understands concepts like "is a", "has a", "location", "faster", "heavier", "smarter", "more famous", "near", "expensive", and much more.  It can answer who, what, where, when, why, and how-many type questions.
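
To make the "traversing relationships" idea concrete, here's a rough C# sketch of how facts like these could be stored and walked.  This is just an illustration with made-up class names, not the actual code; the real system keeps these associations in a SQL Server table (described further down) rather than in memory.

```csharp
// Minimal illustrative sketch: learned statements stored as
// subject/relation/object triples, with "is a" chains walked recursively.
using System;
using System.Collections.Generic;
using System.Linq;

public record Association(string Subject, string Relation, string Obj);

public class KnowledgeBase
{
    private readonly List<Association> _facts = new();

    // "People are mammals" -> ("person", "is a", "mammal")
    public void Learn(string subject, string relation, string obj) =>
        _facts.Add(new Association(subject, relation, obj));

    // Walks the "is a" chain, guarding against cycles.
    public bool IsA(string subject, string target, HashSet<string>? visited = null)
    {
        visited ??= new HashSet<string>();
        if (!visited.Add(subject)) return false;
        if (subject == target) return true;
        return _facts
            .Where(f => f.Subject == subject && f.Relation == "is a")
            .Any(f => IsA(f.Obj, target, visited));
    }
}

public static class Demo
{
    public static void Main()
    {
        var kb = new KnowledgeBase();
        kb.Learn("person", "is a", "mammal");
        kb.Learn("mammal", "is a", "animal");
        Console.WriteLine(kb.IsA("person", "animal"));  // True
    }
}
```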

The robot has hundreds of questions organized into topics that it can talk about.  It evaluates each question beforehand as to whether it is appropriate for whomever it is talking to.  It remembers everything and revisits different questions on different timeframes: "How is your evening going?" might come up often, while "Do you have children?" might only happen once every 5-10 years.  Some are time based, like "Who is playing on Monday Night Football tonight?", which only comes up if you like football, it is fall, and it is Monday.  "Are you retired?" would only come up if you are older.
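
As a rough illustration of that scheduling idea (not the actual implementation; the class names, fields, and the specific age cutoff below are invented):

```csharp
// Hypothetical sketch of per-question conditions and revisit intervals.
using System;
using System.Collections.Generic;
using System.Linq;

public class PersonProfile
{
    public int Age { get; set; }
    public bool LikesFootball { get; set; }
}

public class TopicQuestion
{
    public string Text { get; init; } = "";
    public TimeSpan RevisitInterval { get; init; }                  // hours vs. years
    public Func<PersonProfile, bool> IsAppropriate { get; init; } = _ => true;
    public DateTime LastAsked { get; set; } = DateTime.MinValue;
}

public static class QuestionPicker
{
    // Pick the first question that fits this person and is due to be revisited.
    public static TopicQuestion? Next(IEnumerable<TopicQuestion> questions, PersonProfile person) =>
        questions.FirstOrDefault(q =>
            q.IsAppropriate(person) &&
            DateTime.Now - q.LastAsked >= q.RevisitInterval);
}

public static class Example
{
    public static void Main()
    {
        var questions = new List<TopicQuestion>
        {
            new() { Text = "How is your evening going?", RevisitInterval = TimeSpan.FromHours(6) },
            new()
            {
                Text = "Who is playing on Monday Night Football tonight?",
                RevisitInterval = TimeSpan.FromDays(7),
                // Season check omitted for brevity.
                IsAppropriate = p => p.LikesFootball && DateTime.Now.DayOfWeek == DayOfWeek.Monday
            },
            new()
            {
                Text = "Are you retired?",
                RevisitInterval = TimeSpan.FromDays(365 * 5),
                IsAppropriate = p => p.Age > 55        // illustrative cutoff only
            }
        };

        var person = new PersonProfile { Age = 60, LikesFootball = true };
        Console.WriteLine(QuestionPicker.Next(questions, person)?.Text);
    }
}
```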

Thus far, the bot knows 200,000 words and has 4,000 pieces of learned knowledge and about 1,200 commands and questions.

The robot has its own opinions on some things (like football) and has emotional reactions based on similarities/differences between its own opinions and those of the people it is talking to.  It now has some ability to empathize by recognizing bad events happening to people who are closely related to the person talking to the robot.

The robot keeps separate records about each person it talks to.  It can answer questions about itself (1st person), you (2nd person), or other people (3rd person) known to you and the bot when referred to by name.

The bot has thermal vision, which it uses to keep its head tracked on people it is talking to.  It's off in this video...hope to do a demo on that soon.

This video scratches the surface of what it can do...hope some of you folks like it!  A few glitches in the vid...since fixed!

Martin, now that I've read this article I see a lot of the genius behind your sentence parsing.

BTW: if you want to learn Linux, I probably have an extra BBB around here I could send you.

This is good. I mean really good.

Are you using Google's speech recognition engine with an AliceBot layered on top, or ???

I'll admit, I'm stumped on the architecture you would use to pull off this masterpiece.

Way better than any of the Turing test videos I have seen. If I had to pinpoint the difference that pushes this learning AI implementation over the top, it would be the "just right" inclusion of empathetic responses and the blazing fast response times.

Thanks so much for the positive feedback.  I'll see if I can address the architecture at a high level.

If a user touches the face (or other events happen that I won't go into), the app on the phone will call the Google speech recognition engine.  When the speech comes back, it gets handed over to the part of the app on the robot that maintains a datalink with the server.

The datalink is implemented by the phone calling a web service, running on a Windows PC, a few times a second.  The PC runs a C# web service app and a SQL Server database.  I could have built the AI right into the Android phone in Java and SQLite, but I had some reasons to do it this way.
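
The phone side is actually Android/Java, but to keep all the sketches here in one language, this is roughly what the polling datalink looks like in C#; the endpoint name, port, and payload shape are invented for illustration.

```csharp
// Rough sketch of the phone-side datalink loop (real app is Android/Java).
using System;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public class DataLinkClient
{
    private readonly HttpClient _http =
        new HttpClient { BaseAddress = new Uri("http://my-windows-pc:8080/") };

    private string _pendingSpeech = "";   // set by the speech recognizer callback

    public async Task RunAsync(string robotId, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // Send the robot id plus any newly recognized speech (telemetry omitted here).
            var json = $"{{\"robot\":\"{robotId}\",\"text\":\"{_pendingSpeech}\"}}";
            _pendingSpeech = "";
            using var content = new StringContent(json, Encoding.UTF8, "application/json");

            var response = await _http.PostAsync("api/checkin", content, ct);
            var reply = await response.Content.ReadAsStringAsync();
            // ...hand the reply (text to speak, emotion change, commands) to the robot services...

            await Task.Delay(TimeSpan.FromMilliseconds(250), ct);   // a few check-ins per second
        }
    }
}
```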

The database has the following tables:

robots (1 row for each bot) - this system is set up to serve multiple robots, with one or more of them running at the same time.  In addition, a single robot can be running on multiple devices at the same time.

words (1 row for each word or phrase) - approx 6000 active words

An extended dictionary - approx. 200,000 words with definitions, from Princeton University's WordNet

Entities - people or things that a bot has learned some info about

Attributes - a generic set of attributes that entities can use

EntityData - known information about entities, like a person's last name, birthday, whether they like a particular movie, whatever.  This is where answers to specific questions that the robot has asked people are generally stored.

Sentences - 1 row for each normalized sentence.  A sentence could be a question, an answer, a topic, a lot of things.  The app can convert a question to a normalized form in second person singular and then see if there is a match with a specific sentence; it can then find the associated attribute and look up whether there is a known answer in EntityData.

Commands - 1 row for each unique command (with data) that a bot understands.  For me, a command is made up of a ServiceID, a CommandID, 4 integer data values, and 1 string data value.  The ServiceID is used to route the command to the proper service once it gets back to the bot.  Example: a "Look North" command would route to the "ServoService" with a "LookOnHeading" command and a DataValue1 of 0 representing north.  The ServiceID and CommandID are integers, but I hope you get the idea.  By using a standard message structure, these commands can be passed around between the phone and one or more Arduinos over USB, serial connections, etc. in a standard way.  (There is a rough sketch of this message shape after the table list below.)

Conditions - these are conditions that must be met for a sentence (a question generally) to be appropriate for asking.  Example:  Age > 30.

WordAssoc - this is where most of the general learned info is stored...things like "A cat has nine lives".  It stores a 3-way association between the word "cat", the word "life", and the association type "has".  "Nine" rides along as extra data that is stored.

History - a history of all incoming (and soon outgoing) communication, to be used later for features to be determined.

There are several other tables, but this is the bulk of the important stuff.
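
To make two of these concrete, here's roughly how the Commands message and a WordAssoc row might look as C# classes.  The property names are my shorthand for this write-up, not the actual column names.

```csharp
// Guessed C# shapes for the Commands message and a WordAssoc row.
public class RobotCommand
{
    public int ServiceId { get; set; }      // routes to a service on the bot, e.g. ServoService
    public int CommandId { get; set; }      // e.g. LookOnHeading
    public int Data1     { get; set; }      // e.g. heading: 0 = north
    public int Data2     { get; set; }
    public int Data3     { get; set; }
    public int Data4     { get; set; }
    public string DataString { get; set; } = "";
}

public class WordAssoc
{
    public string Word1     { get; set; } = "";   // "cat"
    public string Word2     { get; set; } = "";   // "life"
    public string AssocType { get; set; } = "";   // "has"
    public string ExtraData { get; set; } = "";   // "nine"
}
```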

The Web Service app basically does the following:

1.  Receives a request

2.  Looks up the bot associated with the request from a cache, and stores telemetry (GPS, sensor data, etc.) if present in the request.

3.  If there is any incoming text, normalizes it: looking for multi-word phrases, converting plurals to singulars, identifying entities referenced by name or pronouns, and removing words that don't add meaning (a, an, the, please, etc.).

4.  HERE IS THE PROCESSING STEP...the app loops through a collection of "Responder" classes in a set order.  Each Responder implements a common interface.  Each Responder gets a chance to evaluate whether the normalized request is applicable to it.  If it is, it can then add a response with a priority, a command (with data) to send back to the robot, and a change in emotional state.  These Responders do a lot of database work, crawling word associations to answer particular types of questions.  There are probably around 40-50 of these.  Since my video I have written Responders that call other web services for news, weather, Wikipedia, movie info, etc.  The AI is no longer limited to info it has learned, as it was in the video.  Trivia doesn't really make it smart, but it helps.  (A rough sketch of what the Responder interface might look like follows this list.)

5.  The server maintains a cached state of each ongoing conversation, in case a user answers a question that is part of a topic and the conversation needs to proceed.  A topic and a set of pronouns are also stored.  Example: if a user refers to a man and then uses the word "his" a minute later, the robot knows "his" is referring to the man recently mentioned.

6.  Generally, the highest-priority response (with a command and data if present) is sent back to the given robot.  The robot speaks the response if there is one, alters its emotional state and the display of that state, and executes any commands it has received along with the response from the server.
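
For anyone curious what that Responder loop could look like in code, here is a sketch.  The interface name and members are my guesses based on the description above, not the actual classes; only the concept (each responder may contribute a prioritized response, an optional command, and an emotion change) comes from the write-up.

```csharp
// Guessed shape of the Responder pipeline; names are invented.
using System.Collections.Generic;

public class NormalizedRequest { public string Text { get; set; } = ""; }
public class ConversationState { public string CurrentTopic { get; set; } = ""; }

public class ResponderResult
{
    public string ResponseText { get; set; } = "";
    public int Priority { get; set; }
    public RobotCommand Command { get; set; }     // same shape sketched after the table list
    public int EmotionDelta { get; set; }         // change to apply to the emotional state
}

public interface IResponder
{
    // Returns null if this responder does not apply to the request.
    ResponderResult Evaluate(NormalizedRequest request, ConversationState state);
}

public class RequestProcessor
{
    private readonly List<IResponder> _responders;   // ~40-50 of these, in a set order

    public RequestProcessor(List<IResponder> responders) => _responders = responders;

    public ResponderResult Process(NormalizedRequest request, ConversationState state)
    {
        ResponderResult best = null;
        foreach (var responder in _responders)
        {
            var result = responder.Evaluate(request, state);
            if (result != null && (best == null || result.Priority > best.Priority))
                best = result;                        // keep the highest-priority response
        }
        return best;                                  // this goes back to the robot
    }
}
```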

A quick side note...one thing I really like about this architecture is that I can talk to one device, using it as a voice remote control; it calls the server, which processes the request and leaves a pending action for the robot to do later.  "Later" is usually a fifth of a second, since I set the datalink on the bot to check in with the server 5 times a second.  It also means that the robot AI can be omnipresent on any tablet or phone in my house and/or my friends' and relatives' houses.  There is also a way to initiate a text chat with the AI; it will know who it's talking to by the phone number.
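
A sketch of that "pending action" idea (again with invented names, and reusing the RobotCommand shape sketched earlier): any device can queue a command for a robot on the server, and the robot's datalink drains the queue on its next check-in.

```csharp
// Invented sketch of a server-side pending-action queue: a voice-remote device
// enqueues a command, and the target robot drains it on its next check-in
// (about a fifth of a second later at 5 check-ins per second).
using System.Collections.Concurrent;
using System.Collections.Generic;

public class PendingActionQueue
{
    private readonly ConcurrentDictionary<string, ConcurrentQueue<RobotCommand>> _pending = new();

    // Called when any phone/tablet issues a command intended for a robot.
    public void Enqueue(string robotId, RobotCommand command) =>
        _pending.GetOrAdd(robotId, _ => new ConcurrentQueue<RobotCommand>()).Enqueue(command);

    // Called during the robot's check-in; hands back everything queued since last time.
    public IEnumerable<RobotCommand> Drain(string robotId)
    {
        if (!_pending.TryGetValue(robotId, out var queue))
            yield break;
        while (queue.TryDequeue(out var cmd))
            yield return cmd;
    }
}
```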

My next goals are:

1.  To get the AI to be able to initiate and carry on a conversation with a person about their family members, relatives, etc. and remember everything in a useful way, especially good or bad things happening with each person referenced, and show appropriate empathetic and emotional responses.

2.  Sometime later when talking to one of the affected people, to get a bot to be able to initiate conversation about the good or adverse events.   Something like "How is your mother doing after the operation?"

Just realized I am writing a book...hope someone reads this...ever.  Thanks again for the feedback.

Regards,

Martin

Very cool!  I am impressed.  Thanks for the documentation on the architecture.  I collected this since I missed it when it was originally posted.  What you have done with the responders etc. is cool.

This seems like genius to me.  Being able to support multiple robots, and the responder design that lets you easily add new functions or extend existing ones, is very clever.

I am awed by the results.

Regards,

Bill

Thanks Bill!  I've started a Wild Thumper-based bot that I'm planning on plugging into all this, hopefully by summer.

Appreciate the feedback.

Regards,

Martin

Wow!

Database management and manipulating big data are a few grades above my scale. I'm fairly content in the engine room of motor control, sensor integration and robust "spinal column reflex" type event handling. Your work in this area of frontal lobe bot architecture won't be getting any academic design challenges from me.

That said, your architecture seems extensible enough to put in the cloud and offer as a subscription service to at least have the bandwidth paid for. For instance, my Roomba does not run very often, but it would be nice if it was connected and smart when it was active.

This scenario of "smart by subscription" is my parlor game guess of one of the things that will come out of Google buying up such a large chunk of the robot community over the past couple of years.

The only thing I might suggest is having a look at the ROS nomenclature for robot pose and velocity commands. From what I can tell, the AI layer won't collide at all with the ROS distributed control structure. It would be good to have a common frame of reference if any data were passed between the two systems. Since ROS is now a community-supported rather than commercially supported platform, I expect a lull in breakthrough-type developments as the community regroups after such large organizational changes. The existing ROS code base is already very impressive as it stands.

Would a remote server AI system need to know or care about the particulars of my mechanical and control system? Absolutely not! This is the main advantage of the path planning and obstacle avoidance functions offered by the ROS architecture as well.

Looking forward to the next installment of SuperDroidBot development!

Best Regards

edit: This emotive head hits the right notes for me... it is cartoon enough to not be creepy and is super expressive. If you ever make your emotive AI system available and a block of time opens up for a fun project, I would enjoy bringing this system up.

http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/strange-polish-robot-

Before I clicked on the video to start it, I thought the head looked creepy, but now that I've seen how expressive it is, wow!

I may have to see if I can give it a try someday. Well, something like it. It would really go great with Anna if she had a responder for humor.

Database stuff is not really a big deal in my opinion.  I looked up ROS again; now that REALLY is a few grades above my scale at this point.  I didn't come from a robotics background.  I spent a career as a software architect building reusable business software frameworks from scratch for Fortune 500 companies.  The good part was I got good at solving problems on my own for teams of people.  The downside is I never got very good at using other people's frameworks...as almost none existed.  Times have changed.  I suppose I was a small part of that change, as I spent so many hours evangelizing the merits of reusable frameworks in Silicon Valley when very few people were listening.

I do think the future of higher-level brain functions is in the cloud.  I'd be tempted to start opening up the work I am doing to the hobby community if I could figure out a way to do it and still protect privacy and address a few other issues.  I don't show it in the video, but this AI can get very, very personal about the information it learns.  It ultimately will make the privacy concerns that people have about Facebook look like a joke, in my opinion.

Anytime you want to talk emotive heads, just fire off a message, new blog or something.  I am definitely game to share ideas on that.

Cheers,

Martin

About privacy concerns: I automatically assume every connected device has a hot microphone these days, even if it's not expressly advertised, like Google Now, Google Glass, and the PS4.

The use of Google's voice recognition engine kind of makes privacy a moot point anyway: there is none!

"It really is changing the way that people behave." ...When you talk to Android's voice recognition software, the spectrogram of what you've said is chopped up and sent to eight different computers housed in Google's vast worldwide army of servers. It's then processed, using the neural network models built by Vanhoucke and his team.’

http://www.wired.com/wiredenterprise/2013/02/android-neural-network/

I view it as the price of admission for getting to play with the fun stuff. Like you mentioned earlier, this is only scratching the surface of some yet-unthought-of helpful and entertaining applications.