Google’s AI just got ears

AI chatbots are already capable of “seeing” the world through images and video. Now Google has announced audio-to-text capabilities as part of its latest update to Gemini Pro. In Gemini 1.5 Pro, the chatbot can “hear” audio files uploaded into its system and extract text information from them.

The company has made this version of the LLM available as a public preview on its Vertex AI development platform. This will allow more enterprise-focused users to experiment with the feature and will expand its user base beyond the private rollout in February, when the model was first announced and offered only to a limited group of developers and enterprise customers.

1. Breaking down + understanding a long video

I uploaded the entire NBA dunk contest from last night and asked which dunk had the highest score.

Gemini 1.5 was incredibly able to find the specific perfect 50 dunk and details from just its long context video understanding! pic.twitter.com/01iUfqfiAO

— Rowan Cheung (@rowancheung) February 18, 2024

Google shared the details about the update at its Cloud Next conference, which is currently taking place in Las Vegas. After calling the Gemini Ultra LLM that powers its Gemini Advanced chatbot the most powerful model of its Gemini family, Google is now calling Gemini 1.5 Pro its most capable generative model. The company added that this version is better at picking up new tasks from the information in a prompt, without additional fine-tuning of the model.

Gemini 1.5 Pro is multimodal in that it can turn different types of audio into text, including TV shows, movies, radio broadcasts, and conference call recordings. It is also multilingual and can process audio in several different languages. The LLM may be able to create transcripts from videos as well; however, their quality may be unreliable, as mentioned by TechCrunch.

When it first announced the model, Google explained that Gemini 1.5 Pro uses a token system to process raw data, where a million tokens equate to approximately 700,000 words or 30,000 lines of code. In media form, that is about an hour of video or around 11 hours of audio.
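To get a feel for those ratios, here is a minimal Python sketch that converts an amount of content into an approximate token count. It assumes the figures quoted above scale linearly, which is a simplification; the names `estimate_tokens` and `fits_in_context` are hypothetical helpers made up for this illustration and are not part of any Google API.

```python
# Back-of-the-envelope token estimates based only on the ratios quoted
# in the article. Assumption: usage scales linearly with content length.
CONTEXT_TOKENS = 1_000_000

# How much of each content type roughly fills one million tokens.
UNITS_PER_CONTEXT = {
    "words": 700_000,
    "code_lines": 30_000,
    "video_hours": 1,
    "audio_hours": 11,
}


def estimate_tokens(**amounts: float) -> int:
    """Return an approximate token count for the given content amounts."""
    total = 0.0
    for unit, amount in amounts.items():
        if unit not in UNITS_PER_CONTEXT:
            raise ValueError(f"unknown content type: {unit}")
        total += amount * CONTEXT_TOKENS / UNITS_PER_CONTEXT[unit]
    return round(total)


def fits_in_context(**amounts: float) -> bool:
    """Check whether the content would fit in a one-million-token window."""
    return estimate_tokens(**amounts) <= CONTEXT_TOKENS


if __name__ == "__main__":
    # A 5.5-hour audio recording uses about half of the window.
    print(estimate_tokens(audio_hours=5.5))
    # Two hours of video would not fit under these assumptions.
    print(fits_in_context(video_hours=2))
```

Under these assumptions, an hour of audio costs roughly 91,000 tokens, while an hour of video uses the entire window. Real token counts depend on the content itself, so treat the result as an estimate only.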

Some private preview demos of Gemini 1.5 Pro show how the LLM is able to find specific moments in a video transcript. For example, AI enthusiast Rowan Cheung got early access and detailed how his demo found an exact action shot in a sports contest and summarized the event, as seen in the tweet embedded above.

However, Google noted that other early adopters, including United Wholesale Mortgage, TBS, and Replit, are opting for more enterprise-focused use cases, such as mortgage underwriting, automating metadata tagging, and generating, explaining, and updating code.

Fionna Agomuoh
Fionna Agomuoh is a technology journalist with over a decade of experience writing about various consumer electronics topics…