NewsBytes
    OpenAI utilized millions of YouTube videos to train GPT-4: Report
    The company may have infringed upon YouTube creators' copyrights

    By Akash Pandey
    Apr 07, 2024
    10:37 am
    What's the story

    OpenAI used transcriptions of more than a million hours of YouTube videos to train its advanced language model, GPT-4, according to The New York Times. The report says OpenAI President Greg Brockman was directly involved in selecting the videos used for training. Although it was aware of the potential legal implications, OpenAI, pressed for training data, believed the practice qualified as fair use.

    Dataset creation

    Approaches to enhance AI understanding

    In an email to The Verge, OpenAI spokesperson Lindsay Held said the company creates "unique" datasets for each of its models to "help their understanding of the world" and to maintain its global research competitiveness. She added that OpenAI uses "numerous sources including publicly available data and partnerships for non-public data," and that the company is also considering generating its own synthetic data.

    Data dilemma

    Exploration of new data sources for AI training

    The NYT report also says OpenAI had exhausted its supply of useful training data by 2021. By then, the company had trained its models on data such as computer code from GitHub, databases of chess moves, and educational content from Quizlet. With those resources depleted, it considered using transcriptions of YouTube videos, podcasts, and audiobooks. OpenAI continues to seek out new data sources to improve its AI models.

    Google's stance

    Google responds to OpenAI's YouTube transcript usage

    Google spokesperson Matt Bryant told The Verge via email that the company has "seen unconfirmed reports" of OpenAI's use of YouTube transcripts. He said Google's policies prohibit unauthorized scraping or downloading of YouTube content. YouTube CEO Neal Mohan made similar comments this week about OpenAI's possible use of YouTube data to train Sora, its video-generation model. Bryant added that Google takes "technical and legal measures" to prevent such unauthorized use when it has a clear legal or technical basis to do so.

    Information

    Google's use of YouTube transcripts for AI training

    The NYT also reported that Google itself has used YouTube transcripts to train its AI models. Bryant confirmed this but said the company used only content from creators who had given their consent.

    All rights reserved © NewsBytes 2024