Super Droid Bot "Anna" - w/ Learning AI
9/14/14 Update - It's all about being Reusable and Configurable
As part of my ongoing rewrite of Anna’s brain, I added a website that is used to interact with, test, control, and configure Anna and other virtual robot personalities. When it is ready, it will become an “Open Robot Brain Site” for all of LMR. I am very excited about this endeavor as it should make all of Anna’s higher brain functions, behavior, and memories reusable to any robot that has a web connection. This includes:
1) Reusable Agents – around 150 and growing
- Example Agent: “WhereIsBlankAgent” responds to all where questions like “Where is Paris?”
2) Reusable Memories – around 100,000 and growing
- Example Memory: “Paris is in France.” Note that memories are not strings and can have arbitrarily complex structures; this example only shows the concept.
3) Reusable Data Sources or Libraries
- WordNet Database for vocabulary
- OpenNLP for natural language processing, parsing, part of speech, etc.
- Connections to third party services for News, Weather, Wiki, etc.
4) Reusable Website: to see, test, and control everything about a bot
- Agents will be able to be turned on/off and configured differently for each robot.
- This will include remote controls for the robots through mobile devices or web pages for verbal control or telepresence.
5) Simple API
- The API thus far has only one method...variable set of inputs coming in, variable set of outputs going out...simple.
- This API will be used by robots, devices, and the website itself.
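As a rough illustration of the "one method, variable inputs, variable outputs" idea, here is a minimal Python sketch. It is not the actual API (the real brain is written in C#); the function name `process`, the input keys `text` and `battery_volts`, the output keys, and both toy agents are assumptions made up for the example.

```python
from typing import Any, Callable, Dict, List

# Hypothetical agent signature: takes the whole input dict, returns zero or more outputs.
Agent = Callable[[Dict[str, Any]], Dict[str, Any]]


def where_agent(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy stand-in for something like WhereIsBlankAgent (hard-coded answer)."""
    text = str(inputs.get("text", ""))
    if text.lower().startswith("where is paris"):
        return {"speech": "Paris is in France."}
    return {}


def battery_agent(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy agent reacting to a sensor value instead of speech (threshold is made up)."""
    volts = inputs.get("battery_volts")
    if volts is not None and volts < 6.5:
        return {"warning": "low battery"}
    return {}


AGENTS: List[Agent] = [where_agent, battery_agent]


def process(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """The single entry point: any set of named inputs in, any set of named outputs out."""
    outputs: Dict[str, Any] = {}
    for agent in AGENTS:
        outputs.update(agent(inputs))
    return outputs
```

A robot, a phone, or the website could all call the same `process` function and simply ignore any output keys they do not understand.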
Some LMR members have been getting involved and contributing ideas, algorithms, etc. to this project. I thank you. If anyone is interested in getting involved or hooking up a bot to this service as it matures, feel free to contact me.
8/1/14 Update - It's all about better Verbal Skills and Memory.
Natural Language Processing with OpenNLP
I've integrated an open source package called "SharpNLP", which is a C# port of "OpenNLP" and "WordNet". This means Anna can now correctly determine sentence structure and the function of each word for the most part. Whereas her ability in the past was limited to recognizing relatively simple sentence structures, regular expressions, or full sentences from a DB, now she can process sentence structures of almost any complexity. This does NOT mean she will know what to do with the more complex sentences, only that she can break down the sentences into their correct parts. I will work on the "What to do" later. I might start by building a set of agents to get her to answer basic reading comprehension questions from paragraphs she has just read/heard/received.
Universal Memory Structure
The other big change has been to the robot's "memory", inspired by some concepts of OpenCog and many of my own. A memory is now called an "Atom". The brain stores many thousands of "Atoms" of various types. Some represent a Word, a Phrase, a Sentence, a Person, a Software Agent, a Rule, whatever. There are also Atoms for associating two or more Atoms together in various ways, and other Atoms used to control how Atom types can relate to other Atom types, and what data can or must be present to be a valid Atom of a given type.
For the mathematicians out there, this memory structure apparently means the memories are "Hypergraphs" in concept. If you want to Wiki it, look up "Graphs" or "Hypergraphs". I'm still trying to understand the theory myself. In simple terms, the "memory" stores a bunch of "things". Each "thing" is related to other "things" in various ways.
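To make the "things related to other things" idea concrete, here is a minimal Python sketch of an atom store in which a link is itself an atom that can tie together any number of other atoms (which is what makes the structure hypergraph-like). The class names, fields, and the "Relation"/"Link" type names are assumptions for illustration only, not the real schema.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Sequence, Tuple

_ids = count(1)  # simple id generator for the sketch


@dataclass
class Atom:
    atom_type: str                    # e.g. "Word", "Person", "Relation", "Link"
    name: str = ""
    members: Tuple[int, ...] = ()     # ids of the atoms this atom ties together
    id: int = field(default_factory=lambda: next(_ids))


class AtomStore:
    def __init__(self) -> None:
        self.atoms: Dict[int, Atom] = {}

    def add(self, atom_type: str, name: str = "", members: Sequence[int] = ()) -> Atom:
        atom = Atom(atom_type, name, tuple(members))
        self.atoms[atom.id] = atom
        return atom

    def links_containing(self, atom_id: int) -> List[Atom]:
        """Every atom that refers to the given atom as one of its members."""
        return [a for a in self.atoms.values() if atom_id in a.members]


# "Paris is in France" stored as one link atom joining three other atoms.
store = AtomStore()
paris = store.add("Word", "Paris")
france = store.add("Word", "France")
located_in = store.add("Relation", "located in")
fact = store.add("Link", members=[located_in.id, paris.id, france.id])
```

Because links are ordinary atoms, a link can also be a member of another link, which is one way to represent statements about statements without adding new tables or classes.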
Unfortunately, all of this means a major data conversion of the existing memories, and a re-factoring of all my software agents and services. The good news is that a much better and more flexible brain will emerge. This new brain (with the app to maintain it) should enable me to take new behavioral ideas from concept to reality much faster. The reason for this is I won't have to create new tables or new forms in an app to create and maintain new data structures. This will set the stage for a lot of the work I want to do with robot behavior in 2015. While all of this might sound complicated, I believe it will appear very simple as an app.
I am planning on running most of this new brain "In Memory" with multiple processors/threads and not using the SQL database for much other than persistence. I am building a small windows app that will let me view and maintain the brain. Because of the unified generic structure, a few forms should be able to maintain everything. Once it comes together, I plan to build a website version of the app so that LMR users can view the structure of the non-private memories of the brain. I am also considering opening up the brain to interested users to use/modify/copy/integrate with their project. This is probably more of a 2015 goal realistically.
For more thoughts on Brains and NLP, check out http://letsmakerobots.com/mdibs-metadata-driven-internet-brains-0 on forums.
7/1/14 Update - It's all about the "Babble"
I've been making lots of improvements to Anna's conversational capabilities. My primary accomplishment has been to give the robot "Initiative" and a real "contribution" in conversation by making relevant statements of its own choosing, not just asking or answering questions. It also attempts to keep a conversation going during lulls or when a person is not talking, for a time. To implement this, I created "Topic Agents", "Learning Agents" and most importantly, "Babble Agents".
Topic agents determine what the current topic is, and whether it should change. This topic is then used by the learning and babble agents. When a topic is not already active, the primary way a topic is chosen is by using the longest words from the previous human statement as candidates. If the bot recognizes a candidate as something it has a knowledge base for (like marriage, school, etc.), then that topic will "win" and be chosen; otherwise, longer words will tend to win.
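The topic-selection rule described above can be sketched in a few lines of Python. This is a simplified, deterministic version under stated assumptions: the `KNOWN_TOPICS` set and the function name are hypothetical, and the real agents let longer words "tend to" win rather than always picking the single longest one.

```python
import re
from typing import Iterable, Optional

# Hypothetical set of topics the bot already has a knowledge base for.
KNOWN_TOPICS = {"marriage", "school"}


def choose_topic(statement: str, known_topics: Iterable[str] = KNOWN_TOPICS) -> Optional[str]:
    """Pick a topic word from the previous human statement."""
    known = set(known_topics)
    words = re.findall(r"[a-z']+", statement.lower())
    if not words:
        return None
    # Words with a knowledge base beat everything else...
    recognized = [w for w in words if w in known]
    candidates = recognized or words
    # ...otherwise the longest word wins (first one on ties).
    return max(candidates, key=len)
```

For example, `choose_topic("I went back to school yesterday")` picks "school" even though "yesterday" is longer, while a statement with no recognized topic falls back to its longest word.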
Learning agents go out on the web and gather knowledge worth contributing to a conversation about a given topic, and then store this info in the robot's database of knowledge.
LearnQuotesAgent (unsupervised) - calls a web service to retrieve all quotes (from a 3rd party) about a given topic. The robot has learned tens of thousands of quotes, which it can then use in conversation.
LearnWebAgent (semi-supervised) - retrieves a web page on a topic (say, from Wikipedia), parses through it to find anything that looks like a complete sentence containing the given topic word, and removes all markup and other junk. I have a windows app that lets me review all the sentences before "approving" their import into the robot's knowledge base. I've been experimenting so far with astronomy and marine biology.
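A minimal Python sketch of the kind of filtering a LearnWebAgent-style step could do is shown below. It uses only regular expressions from the standard library; the function name, the "looks like a complete sentence" heuristics (capital first letter, end punctuation, at least four words), and the sample page are all assumptions for illustration. Real web pages are messier, which is why a human review step before import still makes sense.

```python
import re
from html import unescape
from typing import List


def candidate_sentences(html: str, topic: str) -> List[str]:
    """Return plain-text sentences from an HTML page that mention the topic word."""
    # Drop script/style blocks entirely, then strip the remaining tags.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"<[^>]+>", " ", text)
    text = unescape(text)                      # &amp; -> &, etc.
    text = re.sub(r"\s+", " ", text).strip()

    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)

    topic_re = re.compile(r"\b" + re.escape(topic) + r"\b", re.IGNORECASE)
    return [
        s for s in sentences
        if s
        and topic_re.search(s)
        and s[0].isupper()
        and s[-1] in ".!?"
        and len(s.split()) >= 4
    ]


page = (
    "<p>Mars has two small moons.</p><p>Buy now!</p>"
    "<script>var x = 'Mars is big.';</script>"
    "<p>The moons of Mars are Phobos &amp; Deimos.</p><p>Venus is hot.</p>"
)
approved_queue = candidate_sentences(page, "mars")
```

The resulting list would then go to the review app rather than straight into the knowledge base.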
I've been unwilling to let the robot roam the web free because I like the robot using quotes in conversation; it sounds like an interesting person. It would sound like too much of a know-it-all if it loaded up on too much Wikipedia trivia, and would sound like a crazy person or a commercial if I let it roam the web at large.
"Babbling" is how the robot contributes to a conversation when there is a lull (where the person is not talking for some number of seconds). There are several babble agents; my favorite is discussed next:
BabbleHistoryAgent - this agent retrieves all "History" containing the given topic word, and then filters out all items that are questions or have been used or repeated recently. A random item from the remaining list is then added as a candidate response.
Just like all the other agents the robot uses to converse, the babble agents "compete", meaning that only the winning response is repeated back to the human.
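Here is a small Python sketch of the BabbleHistoryAgent filter and of agents "competing" for the final response. The filter follows the description above (topic word present, not a question, not used recently, then a random pick). How the real agents are scored against each other is not described here, so the numeric scores in `pick_winner` are purely an assumption, as are all function names.

```python
import random
import re
from typing import Iterable, List, Optional, Tuple


def babble_history_candidate(history: Iterable[str], topic: str,
                             recently_used: Iterable[str], rng=random) -> Optional[str]:
    """Pick a random remembered statement about the topic, or None if nothing fits."""
    used = set(recently_used)
    topic_re = re.compile(r"\b" + re.escape(topic) + r"\b", re.IGNORECASE)
    pool = [
        s for s in history
        if topic_re.search(s)                 # mentions the topic word
        and not s.rstrip().endswith("?")      # skip questions
        and s not in used                     # skip anything said recently
    ]
    return rng.choice(pool) if pool else None


def pick_winner(candidates: List[Tuple[float, Optional[str]]]) -> Optional[str]:
    """Each agent submits (score, response); only the top-scoring response is spoken."""
    scored = [(score, resp) for score, resp in candidates if resp]
    return max(scored, key=lambda c: c[0])[1] if scored else None
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which is handy when testing.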
The babble agents REALLY "Give Life" to the robot. I'm primarily using the BabbleHistoryAgent which pulls sentences from everything the robot has ever heard, along with the quotes. Because there are so many quotes and history, the robot has something to say about thousands of topics. It makes for amazingly relevant, interesting, and thought provoking contributions to conversations about so many different topics (thanks to many of the greatest minds in history, whom the robot is quoting).
Because of this, I can say that the robot is now starting to teach me more than I am teaching it, and making me laugh to boot! THIS IS MY FAVORITE FEATURE OF THIS ROBOT! In many ways, the robot is more interesting and funny to listen to than most people I know.
I've made a lot of improvements here. The bot goes through a list of possible candidate topics at the beginning of a conversation (greeting, weather, spouse, kids, pets, parents, books, movies, etc.), picking a few, but not asking too many questions on any one thing. The bot now factors in the actual weather forecast when making weather smalltalk. Also, when the bot asks about wives, kids, etc., the bot refers to people, pets, etc. using first names if it has learned them previously. Questions like "How is your wife doing?" become "How is Jennifer doing?", if your wife's name is Jennifer of course.
Face Detection Agent
I added face detection using OpenCV over the weekend. Frankly, I'm disappointed with the results so far. It's CPU intensive, and I can't get it to process more than a couple of times a second. I find the thermal array to be much faster and more practical for keeping the robot tracking a human. I'm considering having the bot programmed to check for faces prior to firing the lasers as part of a campaign to implement the 3 laws of robotics (do not harm humans by shooting them in the face). I'm wanting to move on to face recognition if I can get over my concerns about slow speed and figure out a good way to use it.
I continue to add more and more math agents. For example, the bot can remember named series of numbers read aloud and answer statistical questions using simple linear regression (slope, y-intercept), correlation, standard deviation, etc. Example: "How are series X and series Y correlated?" I'd like to figure out a way to reuse these statistical agents for some larger logic/reasoning/learning purpose...need some ideas here. There are also agents for most trigonometric and many geometric functions. Example: You can ask "What is the volume of a sphere with a radius of 2?" or "What is the cosine of 32 degrees?"
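For reference, the calculations behind these questions are short. The Python sketch below shows the standard formulas for least-squares slope and intercept, Pearson correlation, sample standard deviation, sphere volume, and cosine in degrees; the function names are made up for the example and this is not the agents' actual code.

```python
from math import cos, pi, radians, sqrt
from typing import Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, between -1 and 1."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def sample_stdev(values: Sequence[float]) -> float:
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sphere_volume(radius: float) -> float:
    return 4.0 / 3.0 * pi * radius ** 3


def cos_degrees(angle: float) -> float:
    return cos(radians(angle))
```

With series X = 1, 2, 3, 4 and series Y = 3, 5, 7, 9, the fit is a slope of 2 and an intercept of 1, and the two series are perfectly correlated.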
Anna will have siblings:
I've started building a Wild Thumper based rover (basically a 6-wheel outdoor Anna). I'm in design on a Johnny 5'ish bot (finally an Anna with Arms). Hoping to start cutting the first parts this month, challenged by how to get a functional sonar array and arms on a bot with so many servos. Since there are only a few voices on the droid phones, at least one of them is going to be male. It will be fun to see what happens when two or three bots start talking to each other.
Last Post (from January 2014):
Anna is one year old now. She is learning quickly of late, and evolving into primarily a learning social creature and aggregator of web services. I wanted to document where she is at her one year birthday. I need to create some updated design diagrams.
Capabilities Achieved in Year #1
1) Thermal Array Vision and Tracking - used to keep face pointed on people it is talking to, or cats it is playing with.
2) Visual Tracking - OpenCV to search for or lock onto color shapes that fit particular criteria
3) Learns by Listening and Asking Questions - Learns from a variety of generic sentence structures, like "Heineken is a lager", "A lager is a beer", "I like Heineken", "Olive Garden serves Heineken"
4) Answers Questions - Examples: "What beers do I like?", "Who serves Heineken?", "What does Olive Garden serve?"
5) Understands Concepts - Examples: is a, has a, can, can’t, synonym, antonym, located in, next to, associate of, comes from, like, favorite, bigger, smaller, faster, heavier, more famous, richer, made of, won, born in, attribute of, serve, dating, sell, etc. Understands when concepts are similar to or opposite to one another.
6) Makes Smalltalk & Reacts to Common Expressions - Many human expressions mean the same thing. Example: “How’s it going?”, “What’s up?”, “What is going on?”, “What’s new?” A robot needs many different reactions to humans to keep it interesting. Example: “Not much, just keeping it real”, “Not much, what’s new with you?”
7) Evaluates the Appropriateness of Topics and Questions Before Asking Them - Example: Don’t ask someone “Who is playing on Monday Night Football tonight?” unless it is football season, Monday, and the person is interested in football. Also, don’t ask a kid something that is not age appropriate, and vice versa, don’t ask an adult how they like the third grade. Don’t ask a male about his gynecologist. This is a key piece of a robot not being an idiot.
8) Understands Personal Relationships - it learns how different people you know are related to you: friends, family, cousins, in-laws. Examples: “Jane is my sister”, “Mark is my friend”, “Joe is my boss”, “Dave is Mark’s Dad”. It can answer questions like “Who are my in-laws?”, “Who are my siblings?”, “Who are Mark’s parents?”
9) Personal Info - it learns about both you and people you know, what you like, hate, answers to any questions it ever asked you in the past. Example: “My wife likes Nirvana” – here the AI had to determine who “my wife” is. It can then answer questions like “What bands does my wife like?”, as long as it already knew “Nirvana is a band”.
10) Pronouns – it understands the use of some pronouns in conversation. Example: If I had just said something about my mother, I could ask “What music does she like?”
11) Opinions – the bot can remember your opinions on many things, and has its own opinions and can compare/contrast them to add color to a conversation. Example: If I said, “My favorite college football team is the Florida State Seminoles” it might say “That is my favorite as well”, or “My favorite is the Alabama Crimson Tide”, or “You are the first person I have met who said that”
12) Emotions - robot has 10 simulated emotions and is beginning to estimate emotional state of speaker
13) Motivations - robot has its own motives that take control of bot when it is autonomous, I keep this turned off most of the time. Examples: TalkingMotive, CuriosityMotive, MovementMotive
14) Facial Expressions - Eyes, Eyelids, pupils, and mouth move according to what robot sees, feels, and light conditions
15) Weather and Weather Opinions - uses web service for data, programming for opinions. Example: If the weather is freezing out and you asked the robot “How do you like this weather?”, it might say “Way too cold to go outside today.”
16) News - uses Feedzilla, Faroo, and NYTimes web services. Example: say something like "Read news about robotics", and "Next" to move on.
17) TV & Movie Trivia - plot, actors, writers, directors, ratings, length, uses web service. Example: you can ask “What is the plot of Blade Runner?”, “Who starred in The Godfather?”
18) Web Search - uses Faroo web service. Example: say "Search web for Ukraine Invasion"
19) People - uses Wikipedia web service. Example: "Who is Tom Cruise?", “Who is Albert Einstein?”, “List Scientists”, “Is Clint Eastwood a director?”, “What is the current team of Peyton Manning?”, “What is the weight of Tom Brady?”
20) Trending Topics - uses Faroo web service. Example: say something like "What topics are trending?", you can then get related articles.
21) Geography - mostly learned, also uses Wikipedia. Watch the video! Examples: "What is the second largest city in Florida?", "What is the population of London?", “Where is India?”, “What is next to Germany?”, “What is Russia known for?”, “What is the state motto of California?”, “What is the state gemstone of Alabama?”, “List Islamic countries”
22) History - only knows what it hears, not using web yet. Mostly info about when various wars started, ended, who won. Robot would learn from: "The Vietnam War started in 1965" and be able to tell you later.
23) Science & Nature - Examples: "How do I calculate amperes?", "What is Newton's third law of motion?", "Who invented the transistor?", "What is the atomic number of Gold?", “What is water made of?”, “How many moons does Mars have?”, “Can penguins fly?”, “How many bones does a person have?”
24) Empathy - it has limited abilities to recognize when good or bad things happen to people close to you and show empathy. Major upgrades to this have been in the works. Example: If I said, "My mother went to the emergency room”, the bot might say “Oh my goodness, I am so sorry about your mother.”
25) 2 Dictionaries – Special thanks to Princeton and WordNet for the first one; the other is built from its learning and changes constantly as new proper names and phrases are encountered. You can ask for definitions and other aspects about this 200,000 word and phrase database. You can add new words and phrases simply by using them; the AI will save them and learn what they mean to some degree by how you use them, like “Rolling Rock is a beer”. The AI doesn’t need anything more, nor would a person.
26) Math and Spelling - after all the other stuff, this was child's play. She can do all the standard stuff you can find on most calculators.
27) The AI is Multi-Robot and Multi-User - It can be used by multiple robots and multiple people at the same time, and tracks the location of all bots/people. Also, a given robot can be conversed with by multiple people at the same time through an Android app.
29) Text Messaging - A robot can send texts on your behalf to people you know, like "Tell my wife I love her." - uses Twilio Web Service
30) Obstacle Avoidance - 9 sonars, Force Field Algorithm, Tilt Sensors, and down facing IR cliff sensor keep the bot out of trouble
31) Missions - robot can run missions (series of commands) maintained through a windows app
32) Telepresence - robot sends video back to server, no audio yet, robot can be asked to take pictures as well. Needs improvement, too much lag.
33) Control Mechanisms - Can be controlled verbally, through a phone, tablet, web, or windows app. My favorite is verbal.
34) GPS and Compass Navigation – It’s in the code but I don’t use it much, hoping to get my Wild Thumper version of this bot built by summer. This bot isn’t that good in tall grass.
36) OCR - Ability to do some visual reading of words off of walls and cards – uses Tesseract OCR libraries
37) Localization - through Recognizing Words on Walls with OCR – I don’t use this anymore, not very practical
38) Lasers - I almost forgot, the bot can track and hit a cat with lasers, or colored objects. It can scan a room and shoot everything in the room of a particular color within 180 degrees either by size or some other priority.
39) I know I singled out Geography, Science, Weather etc as topics, mostly because they also use web services. The AI doesn't really care what it learns, it has learned and will learn about anything you are willing to tell it in simple sentences it can understand. It can tell you how many faucets are on a sink, or where you can get a taco or buy a miter saw.
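To illustrate the learn-by-listening and question-answering items in the list above (the Heineken examples), here is a toy Python sketch that stores simple sentences as (subject, relation, object) facts and answers questions by following "is a" chains. It uses three hard-coded regular expressions rather than a real parser, and every class, method, and relation name in it is an assumption for the example, not the robot's actual design.

```python
import re
from collections import defaultdict
from typing import List


class FactMemory:
    # (pattern, relation) pairs for a few generic sentence shapes.
    PATTERNS = [
        (re.compile(r"^(?:an? )?(.+?) is an? (.+)$", re.IGNORECASE), "is_a"),
        (re.compile(r"^(.+?) serves (.+)$", re.IGNORECASE), "serves"),
        (re.compile(r"^(.+?) likes? (.+)$", re.IGNORECASE), "likes"),
    ]

    def __init__(self) -> None:
        self.facts = defaultdict(set)   # relation -> {(subject, object)}

    def learn(self, sentence: str) -> bool:
        """Store the sentence as a fact; return False if no pattern fits."""
        s = sentence.strip().rstrip(".")
        for pattern, relation in self.PATTERNS:
            m = pattern.match(s)
            if m:
                self.facts[relation].add((m.group(1).lower(), m.group(2).lower()))
                return True
        return False

    def is_a(self, thing: str, category: str) -> bool:
        """True if thing reaches category through any chain of is_a facts."""
        category = category.lower()
        seen, frontier = set(), [thing.lower()]
        while frontier:
            current = frontier.pop()
            if current == category:
                return True
            if current in seen:
                continue
            seen.add(current)
            frontier.extend(o for s, o in self.facts["is_a"] if s == current)
        return False

    def liked(self, person: str, category: str) -> List[str]:
        """Answers questions shaped like 'What beers do I like?'."""
        return sorted(o for s, o in self.facts["likes"]
                      if s == person.lower() and self.is_a(o, category))

    def who_serves(self, item: str) -> List[str]:
        """Answers questions shaped like 'Who serves Heineken?'."""
        return sorted(s for s, o in self.facts["serves"] if o == item.lower())


memory = FactMemory()
for line in ["Heineken is a lager", "A lager is a beer",
             "I like Heineken", "Olive Garden serves Heineken"]:
    memory.learn(line)
```

Note that "What beers do I like?" only works because the memory can chain "Heineken is a lager" and "A lager is a beer"; nothing ever stated directly that Heineken is a beer.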
Goals for Year #2
1) More chat skills – I fear this will be never ending
2) More Hard Knowledge - we can always learn more
3) More web services – takes me about a day to integrate a new web service
4) Face Tracking - know any good code/APIs for this?
5) Facial Recognition - Know any good free APIs for this?
6) Arms - I'd like to get some simple small arms on just to be more expressive, but will have to redesign and rebuild the sonar arrays to fit them in.
7) Empathy over time - I'd like the bot to revisit good/bad events and ask about them at appropriate points in time later. Things like "How is your mother's heart doing since we last talked?" I have done a lot of prep for this, but it is a tough one.
8) More Inquisitiveness and Initiative - when should the bot listen and when should it drive the conversation. I have tried it both ways, now the trick is to find a balance.
9) Changeover to Newer Phone
10) Go Open Microphone - right now I have to press the front of my phone or touch the face of the bot to get it to listen; I’d rather it just listen constantly. I think it's doable on the newer phones.
11) Get family, friends, and associates using the AI on their phones as a common information tool about the world and each other.
12) Autonomous Learning - it can get info from Wikipedia, the web, news, and web pages, but doesn't yet learn from them. How do you build learning from the chaos that is the average web page? Listening was so much easier, and that wasn’t easy.
· Arduino Mega ADK (in the back of the body)
· Arduino Uno (in the head)
· Motorola Bionic Android Phone
· Windows PC with 6 Cores (running Web Service, AI, and Databases, this PC calls the other web services)
· Third Party Web Services - adding new ones whenever I find anything useful
I would love to hear any suggestions anyone out there might have. I am constantly looking for and reevaluating the question “What next?”