Neal Mohan, CEO at YouTube, doesn’t know whether OpenAI has used the company’s videos to train its AI models like Sora, but wants to make it perfectly clear that doing so would be a “clear violation” of YouTube’s terms of use.
Mohan made the remarks during an interview with Emily Chang, host of Bloomberg Originals.
“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations,” Mohan said. “One of those expectations is that the terms of service is going to be abided by. Our terms of service allows for some YouTube content, like the title of the video, channel name or creator’s name, to be scraped, because that’s how you allow the content to show up in other search engines.”
The terms do not allow the scraping of things like transcripts or video bits, which would be a clear violation of YouTube’s terms of service, he said, adding that those are the rules governing content on the platform.
The debate rages on after OpenAI CTO Mira Murati expressed uncertainty about whether Sora, the company's text-to-video AI tool, was trained on user-generated content from platforms like YouTube.
It also comes after Reddit and Google disclosed a deal that lets Google use Reddit’s social media data for a variety of purposes, including training AI models.
Most AI tools are trained on publicly available data, gathered much the way search engine crawlers scrape data from across the web.
Mohan said YouTube works to protect creators by ensuring that everyone follows the core terms of service. It’s ultimately about making creators successful on the platform and building “magical experiences for viewers.”
AI companies training their large language models on creative work without compensation or permission is not new. The New York Times has filed a lawsuit against Microsoft and OpenAI, alleging they used its copyrighted work to train their AI models.