DeepMind’s new AI generates soundtracks and dialogue for videos

Image Credits: Google DeepMind

DeepMind, Google’s AI research lab, says it’s developing AI tech to generate soundtracks for videos.

In a post on its official blog, DeepMind says that it sees the tech, V2A (short for “video-to-audio”), as an essential piece of the AI-generated media puzzle. While plenty of orgs including DeepMind have developed video-generating AI models, these models can’t create sound effects to sync with the videos they generate.

“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output,” DeepMind writes. “V2A technology [could] become a promising approach for bringing generated movies to life.”

DeepMind’s V2A tech takes a description of a soundtrack (e.g. “jellyfish pulsating under water, marine life, ocean”) paired with a video and creates music, sound effects and even dialogue that match the characters and tone of the video. The output is watermarked by DeepMind’s deepfake-combating SynthID technology. The AI model powering V2A, a diffusion model, was trained on a combination of sounds and dialogue transcripts as well as video clips, DeepMind says.

“By training on video, audio and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts,” DeepMind writes.

Mum’s the word on whether any of the training data was copyrighted — and whether the data’s creators were informed of DeepMind’s work. We’ve reached out to DeepMind for clarification and will update this post if we hear back.

AI-powered sound-generating tools aren’t novel. Startup Stability AI released one just last week, and ElevenLabs launched one in May. Nor are models that create sound effects for video. A Microsoft project can generate talking and singing videos from a still image, and platforms like Pika and GenreX have trained models to take a video and make a best guess at what music or effects are appropriate in a given scene.

But DeepMind claims that its V2A tech is unique in that it can understand the raw pixels from a video and sync generated sounds with the video automatically, optionally sans description.

V2A isn’t perfect, and DeepMind acknowledges this. Because the underlying model wasn’t trained on many videos with artifacts or distortions, it doesn’t create particularly high-quality audio for such videos. And in general, the generated audio isn’t super convincing; my colleague Natasha Lomas described it as “a smorgasbord of stereotypical sounds,” and I can’t say I disagree.

For those reasons — and to prevent misuse — DeepMind says it won’t release the tech to the public anytime soon, if ever.

“To make sure our V2A technology can have a positive impact on the creative community, we’re gathering diverse perspectives and insights from leading creators and filmmakers, and using this valuable feedback to inform our ongoing research and development,” DeepMind writes. “Before we consider opening access to it to the wider public, our V2A technology will undergo rigorous safety assessments and testing.”

DeepMind pitches its V2A technology as an especially useful tool for archivists and folks working with historical footage. But, as I wrote in a piece this morning, generative AI along these lines also threatens to upend the film and TV industry. It’ll take some seriously strong labor protections to ensure that generative media tools don’t eliminate jobs — or, as the case may be, entire professions.
