
Gemini AI is making robots in the office far more useful

An Everyday Robot navigating through an office.
Everyday Robot

Lost in an unfamiliar office building, big box store, or warehouse? Just ask the nearest robot for directions.

A team of Google researchers combined the powers of natural language processing and computer vision to develop a novel means of robotic navigation as part of a new study published Wednesday.

Essentially, the team set out to teach a robot (in this case, an Everyday Robot) how to navigate through an indoor space using natural language prompts and visual inputs. Robotic navigation used to require researchers to not only map out the environment ahead of time but also provide specific physical coordinates within the space to guide the machine. Recent advances in what’s known as Vision Language Navigation (VLN) have enabled users to simply give robots natural language commands, like “go to the workbench.” Google’s researchers are taking that concept a step further by incorporating multimodal capabilities, so that the robot can accept natural language and image instructions at the same time.

For example, a user in a warehouse would be able to show the robot an item and ask, “what shelf does this go on?” Leveraging the power of Gemini 1.5 Pro, the AI interprets both the spoken question and the visual information to formulate not just a response but also a navigation path to lead the user to the correct spot on the warehouse floor. The robots were also tested with commands like, “Take me to the conference room with the double doors,” “Where can I borrow some hand sanitizer,” and “I want to store something out of sight from public eyes. Where should I go?”

Or, in a demonstration posted as an Instagram Reel, a researcher activates the system with an “OK robot” before asking to be led somewhere “he can draw.” The robot responds with “give me a minute. Thinking with Gemini …” before setting off briskly through the 9,000-square-foot DeepMind office in search of a large wall-mounted whiteboard.

To be fair, these trailblazing robots were already familiar with the office space’s layout. The team utilized a technique known as “Multimodal Instruction Navigation with demonstration Tours (MINT).” This involved the team first manually guiding the robot around the office, pointing out specific areas and features using natural language, though the same effect can be achieved by simply recording a video of the space using a smartphone. From there, the AI generates a topological graph of the space from the tour footage, then works to match what its cameras are seeing with the “goal frame” from the demonstration video.

Then, the team employs a hierarchical Vision-Language-Action (VLA) navigation policy “combining the environment understanding and common sense reasoning,” to instruct the AI on how to translate user requests into navigational action.
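To make the two-level idea concrete, here is a minimal sketch in Python, not the researchers' implementation. It assumes the demonstration tour has been reduced to a list of short text notes, one per frame; a simple word-overlap match stands in for the high-level Gemini step that picks a goal frame, and a breadth-first search over a graph of tour frames stands in for the low-level path planning. All function names, the frame notes, and the loop-closure edge are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_tour_graph(num_frames, loop_closures=()):
    """Link consecutive tour frames, plus extra edges where the tour revisits a spot."""
    graph = {i: set() for i in range(num_frames)}
    for i in range(num_frames - 1):
        graph[i].add(i + 1)
        graph[i + 1].add(i)
    for a, b in loop_closures:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pick_goal_frame(instruction, frame_notes):
    """Toy stand-in for the high-level model: pick the frame whose note
    shares the most words with the user's instruction."""
    words = set(instruction.lower().split())
    return max(
        range(len(frame_notes)),
        key=lambda i: len(words & set(frame_notes[i].lower().split())),
    )


def shortest_path(graph, start, goal):
    """Breadth-first search from the robot's current frame to the goal frame."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable from start


# Hypothetical five-frame tour; frame 1 (hallway) also connects to frame 4.
frame_notes = [
    "lobby entrance",
    "hallway",
    "kitchen with hand sanitizer",
    "conference room with double doors",
    "whiteboard wall for drawing",
]
graph = build_tour_graph(len(frame_notes), loop_closures=[(1, 4)])
goal = pick_goal_frame("take me to the conference room with the double doors", frame_notes)
path = shortest_path(graph, start=0, goal=goal)
print(goal, path)
```

In this toy version the robot starts at frame 0 and follows the returned list of frame indices as waypoints. The system described in the study would replace the word-overlap step with a multimodal model that reads the tour video and the user's request, and would localize the robot against the graph using its live camera view.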

The approach proved effective, with the robots achieving “86 percent and 90 percent end-to-end success rates on previously infeasible navigation tasks involving complex reasoning and multimodal user instructions in a large real world environment,” the researchers wrote.

However, they recognize that there is still room for improvement, pointing out that the robot cannot (yet) autonomously perform its own demonstration tour and noting that the AI’s ungainly inference time (how long it takes to formulate a response) of 10 to 30 seconds turns interacting with the system into a study in patience.

Andrew Tarantola
Andrew has spent more than a decade reporting on emerging technologies ranging from robotics and machine learning to space…