MIT's ComText lets robots follow voice commands

The main contribution is the idea that robots should have different kinds of memory just like people

Press Trust of India | Boston

Scientists from the Massachusetts Institute of Technology (MIT), including those of Indian origin, have developed a new system that allows robots to understand voice commands just like artificial intelligence (AI) assistants such as Siri and Alexa.

Currently, robots are very limited in what they can do.

Their inability to understand the nuances of human language makes them mostly useless for more complicated requests.

For example, if you put a specific tool in a toolbox and ask a robot to “pick it up,” it would be completely lost.

Picking it up means being able to see and identify objects, understand commands, recognise that the “it” in question is the tool you put down, go back in time to remember the moment when you put down the tool, and distinguish the tool you put down from other ones of similar shapes and sizes.
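
To make that chain of reasoning concrete, the sketch below (in Python, with hypothetical object names and data; it is not the researchers' implementation) shows how a time-stamped log of events lets a robot resolve "it" to the tool that was just put down, rather than to another tool of a similar shape and size.

```python
from dataclasses import dataclass

# Hypothetical object and event records; ComText's real representation differs.
@dataclass
class ObservedObject:
    name: str      # e.g. "wrench_2"
    kind: str      # e.g. "wrench"
    location: str  # e.g. "toolbox"

@dataclass
class Event:
    time: float          # when the robot saw the action happen
    action: str          # e.g. "put_down"
    obj: ObservedObject  # which object was involved

# Episodic log of what the robot has observed, in time order.
history = [
    Event(1.0, "put_down", ObservedObject("wrench_1", "wrench", "table")),
    Event(2.0, "put_down", ObservedObject("wrench_2", "wrench", "toolbox")),
]

def resolve_it(history: list[Event]) -> ObservedObject:
    """Resolve the pronoun "it" to the most recently manipulated object."""
    return max(history, key=lambda e: e.time).obj

# "Pick it up" refers to wrench_2, the tool placed in the toolbox last,
# even though wrench_1 has the same shape and size.
target = resolve_it(history)
print(f"pick_up({target.name}) from {target.location}")
```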

Researchers from MIT have gotten closer to making this type of request easier.

They have developed an Alexa-like system called “ComText” (for “commands in context”) that allows robots to understand a wide range of commands that require contextual knowledge about objects and their environments.

“Where humans understand the world as a collection of objects and people and abstract concepts, robots view it as pixels, point-clouds, and 3-D maps generated from sensors,” said Rohan Paul, one of the lead authors of the paper. “This semantic gap means that, for robots to understand what we want them to do, they need a much richer representation of what we do and say.”

The team tested ComText on Baxter, a two-armed humanoid robot. ComText can observe a range of visuals and natural language to learn about an object’s size, shape, position, type and even if it belongs to somebody. From this knowledge base, it can then reason, infer meaning and respond to commands.

“The main contribution is this idea that robots should have different kinds of memory, just like people,” said Andrei Barbu, the project’s co-lead.
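
That distinction is, roughly, between a semantic memory of stable facts about objects (their type, or who owns them) and an episodic memory tied to particular moments (when a tool was put down). A toy version of the split, with invented names and data rather than the actual ComText system, could look like this:

```python
import time

# Semantic memory: durable facts about objects (invented examples).
semantic_memory = {
    "wrench_2": {"kind": "wrench", "owner": "rohan"},
    "hammer_1": {"kind": "hammer", "owner": None},
}

# Episodic memory: time-stamped events the robot has observed.
episodic_memory = []

def observe(action: str, obj_id: str, place: str) -> None:
    """Append an observed event, stamped with the current time."""
    episodic_memory.append(
        {"t": time.time(), "action": action, "obj": obj_id, "place": place}
    )

def ground(command: str) -> list[str]:
    """Map a context-dependent command to object identifiers."""
    if command == "pick up my wrench":
        # Semantic memory answers "which wrench is mine?"
        return [obj for obj, facts in semantic_memory.items()
                if facts["kind"] == "wrench" and facts["owner"] == "rohan"]
    if command == "pick it up":
        # Episodic memory answers "what did I just interact with?"
        return [episodic_memory[-1]["obj"]] if episodic_memory else []
    return []

observe("put_down", "wrench_2", "toolbox")
print(ground("pick it up"))         # ['wrench_2']
print(ground("pick up my wrench"))  # ['wrench_2']
```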

With ComText, Baxter was successful in executing the right command about 90 per cent of the time.

In the future, the team hopes to enable robots to understand more complicated information, such as multi-step commands, the intent of actions, and using properties about objects to interact with them more naturally.

By creating much less constrained interactions, this line of research could enable better communications for a range of robotic systems, from self-driving cars to household helpers.

“This work is a nice step towards building robots that can interact much more naturally with people,” said Luke Zettlemoyer, an associate professor at the University of Washington in the US, who was not involved in the research.

“In particular, it will help robots better understand the names that are used to identify objects in the world, and interpret instructions that use those names to better do what users ask,” Zettlemoyer said.

Today’s Robots
  • Currently, robots are very limited in what they can do
  • For example, if you put a specific tool in a toolbox and ask a robot to “pick it up,” it would be completely lost
  • MIT researchers developed a system called “ComText” that allows robots to understand a wide range of commands
  • The team tested ComText on a two-armed humanoid robot called Baxter
  • ComText can observe a range of visuals and natural language to learn about an object’s size, shape, position, type and even if it belongs to somebody
  • From this knowledge base, it can then reason, infer meaning and respond to commands

First Published: Tue, September 05 2017. 00:30 IST