MIT's ComText lets robots follow voice commands

The main contribution is the idea that robots should have different kinds of memory just like people

Press Trust of India | Boston

Scientists from the Massachusetts Institute of Technology (MIT), including those of Indian origin, have developed a new system that allows robots to understand voice commands just like artificial intelligence (AI) assistants such as Siri and Alexa.

Currently, robots are very limited in what they can do.

Their inability to understand the nuances of human language makes them mostly useless for more complicated requests.

For example, if you put a specific tool in a toolbox and ask a robot to “pick it up,” it would be completely lost.

Picking it up means being able to see and identify objects, understand commands, recognise that the “it” in question is the tool you put down, go back in time to remember the moment when you put down the tool, and distinguish the tool you put down from other ones of similar shapes and sizes.
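
To make that chain of reasoning concrete, the sketch below (in Python, with hypothetical object names and data; it is not the researchers' implementation) shows how a time-stamped log of events lets a robot resolve "it" to the tool that was just put down, rather than to another tool of a similar shape and size.

```python
from dataclasses import dataclass

# Hypothetical object and event records; ComText's real representation differs.
@dataclass
class ObservedObject:
    name: str      # e.g. "wrench_2"
    kind: str      # e.g. "wrench"
    location: str  # e.g. "toolbox"

@dataclass
class Event:
    time: float          # when the robot saw the action happen
    action: str          # e.g. "put_down"
    obj: ObservedObject  # which object was involved

# Episodic log of what the robot has observed, in time order.
history = [
    Event(1.0, "put_down", ObservedObject("wrench_1", "wrench", "table")),
    Event(2.0, "put_down", ObservedObject("wrench_2", "wrench", "toolbox")),
]

def resolve_it(history: list[Event]) -> ObservedObject:
    """Resolve the pronoun "it" to the most recently manipulated object."""
    return max(history, key=lambda e: e.time).obj

# "Pick it up" refers to wrench_2, the tool placed in the toolbox last,
# even though wrench_1 has the same shape and size.
target = resolve_it(history)
print(f"pick_up({target.name}) from {target.location}")
```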

Researchers from MIT have gotten closer to making this type of request easier.

They have developed an Alexa-like system called “ComText” (for “commands in context”) that allows robots to understand a wide range of commands that require contextual knowledge about objects and their environments.

“Where humans understand the world as a collection of objects and people and abstract concepts, robots view it as pixels, point-clouds, and 3-D maps generated from sensors,” said Rohan Paul, one of the lead authors of the paper. “This semantic gap means that, for robots to understand what we want them to do, they need a much richer representation of what we do and say.”

The team tested ComText on Baxter, a two-armed humanoid robot. ComText can observe a range of visuals and natural language to learn about an object’s size, shape, position, type and even if it belongs to somebody. From this knowledge base, it can then reason, infer meaning and respond to commands.

“The main contribution is this idea that robots should have different kinds of memory, just like people,” said Andrei Barbu, the project’s co-lead.
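
That distinction is, roughly, between a semantic memory of stable facts about objects (their type, or who owns them) and an episodic memory tied to particular moments (when a tool was put down). A toy version of the split, with invented names and data rather than the actual ComText system, could look like this:

```python
import time

# Semantic memory: durable facts about objects (invented examples).
semantic_memory = {
    "wrench_2": {"kind": "wrench", "owner": "rohan"},
    "hammer_1": {"kind": "hammer", "owner": None},
}

# Episodic memory: time-stamped events the robot has observed.
episodic_memory = []

def observe(action: str, obj_id: str, place: str) -> None:
    """Append an observed event, stamped with the current time."""
    episodic_memory.append(
        {"t": time.time(), "action": action, "obj": obj_id, "place": place}
    )

def ground(command: str) -> list[str]:
    """Map a context-dependent command to object identifiers."""
    if command == "pick up my wrench":
        # Semantic memory answers "which wrench is mine?"
        return [obj for obj, facts in semantic_memory.items()
                if facts["kind"] == "wrench" and facts["owner"] == "rohan"]
    if command == "pick it up":
        # Episodic memory answers "what did I just interact with?"
        return [episodic_memory[-1]["obj"]] if episodic_memory else []
    return []

observe("put_down", "wrench_2", "toolbox")
print(ground("pick it up"))         # ['wrench_2']
print(ground("pick up my wrench"))  # ['wrench_2']
```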

With ComText, Baxter was successful in executing the right command about 90 per cent of the time.

In the future, the team hopes to enable robots to understand more complicated information, such as multi-step commands, the intent of actions, and using properties about objects to interact with them more naturally.

By creating much less constrained interactions, this line of research could enable better communications for a range of robotic systems, from self-driving cars to household helpers.

“This work is a nice step towards building robots that can interact much more naturally with people,” said Luke Zettlemoyer, an associate professor at the University of Washington in the US, who was not involved in the research.

“In particular, it will help robots better understand the names that are used to identify objects in the world, and interpret instructions that use those names to better do what users ask,” Zettlemoyer said.

Today’s Robots
  • Currently, robots are very limited in what they can do
  • For example, if you put a specific tool in a toolbox and ask a robot to “pick it up,” it would be completely lost
  • MIT researchers developed a system called “ComText” that allows robots to understand a wide range of commands
  • The team tested ComText on a two-armed humanoid robot called Baxter
  • ComText can observe a range of visuals and natural language to learn about an object’s size, shape, position, type and even if it belongs to somebody
  • From this knowledge base, it can then reason, infer meaning and respond to commands

First Published: Tue, September 05 2017. 00:30 IST