NewsBytes
    Scale AI to develop testing framework for Pentagon's LLMs
    It is a 1-year contract


    By Dwaipayan Roy
    Feb 21, 2024
    03:52 pm
    What's the story

    The Pentagon's Chief Digital and Artificial Intelligence Office (CDAO) has teamed up with Scale AI, a San Francisco-based company. Together, they will develop a reliable testing and evaluation (T&E) framework for large language models (LLMs). These LLMs could play a significant role in military planning and decision-making. The one-year contract aims to create a comprehensive T&E system for generative AI inside the Defense Department, ensuring its safe deployment by measuring model performance and providing real-time feedback for warfighters.

    Goal

    Addressing the complexities of generative AI testing

    Generative AI, which includes LLMs, can produce text, images, software code, and other media from human prompts, and it poses unique challenges for T&E processes. Unlike traditional systems with established safety standards, generative AI lacks universally accepted guidelines. To address these complexities, Scale AI will develop "holdout datasets" with the help of Department of Defense (DOD) insiders, who will provide response pairs and put them through multiple layers of review.

    Process

    Iterative process to refine datasets and evaluate models

    The T&E process for LLMs will be iterative, involving the creation and refinement of datasets relevant to the DOD's needs. Experts will then evaluate existing LLMs against these datasets. Once holdout datasets are established, evaluations can be conducted to develop model cards, which are short documents detailing the best context of use and the performance measurements for a given machine learning model. This approach will help establish a baseline understanding of model performance, strengths, and limitations.
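
    As a rough illustration of the idea, and not a description of Scale AI's or the CDAO's actual tooling, evaluating a model against a holdout dataset and summarizing the result in a model card could look like the Python sketch below. Every name in it (HoldoutExample, ModelCard, evaluate, the toy model, and the exact-match scoring rule) is a hypothetical stand-in chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HoldoutExample:
    """One reviewed prompt/response pair kept out of model training (hypothetical schema)."""
    prompt: str
    reference: str
    domain: str


@dataclass
class ModelCard:
    """A minimal model card: overall and per-domain accuracy (hypothetical schema)."""
    model_name: str
    overall_accuracy: float
    accuracy_by_domain: dict = field(default_factory=dict)


def exact_match(answer: str, reference: str) -> bool:
    # Assumed scoring rule: case-insensitive exact match. Real evaluations
    # of generative models would need something far richer than this.
    return answer.strip().lower() == reference.strip().lower()


def evaluate(model_name, model_fn, holdout):
    """Score model_fn on every holdout example and summarize the results."""
    totals, hits = {}, {}
    for example in holdout:
        totals[example.domain] = totals.get(example.domain, 0) + 1
        if exact_match(model_fn(example.prompt), example.reference):
            hits[example.domain] = hits.get(example.domain, 0) + 1
    by_domain = {d: hits.get(d, 0) / n for d, n in totals.items()}
    overall = sum(hits.values()) / len(holdout) if holdout else 0.0
    return ModelCard(model_name, overall, by_domain)


# Toy stand-in for an LLM: a lookup table that answers two prompts.
toy_answers = {"2+2": "4", "capital of France": "Paris"}


def toy_model(prompt: str) -> str:
    return toy_answers.get(prompt, "unknown")


holdout = [
    HoldoutExample("2+2", "4", "arithmetic"),
    HoldoutExample("3+3", "6", "arithmetic"),
    HoldoutExample("capital of France", "paris", "geography"),
]

card = evaluate("toy-model", toy_model, holdout)
print(card)
```

    Breaking the score down by domain is one simple way a card could record where a model is strong and where it is weak.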

    Aim

    Automating model evaluation and feedback

    The development process aims to automate as much of this work as possible, so that new models can be assessed quickly as they emerge. The goal is for models to provide signals to CDAO officials when they deviate from the domains they have been tested against. Scale AI's statement explains that this work will allow the DOD to mature its T&E policies for generative AI by "measuring and assessing quantitative data" through benchmarking and gathering qualitative feedback from users.
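
    The article does not say how such a signal would be produced. One crude way to picture it, purely as an assumption for illustration, is a monitor that checks how much of an incoming prompt's vocabulary appeared in the prompts a model was tested on, and raises a flag when the overlap is low. The DomainMonitor class, its min_coverage threshold, and the sample prompts below are all hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lower-case the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class DomainMonitor:
    """Flags prompts whose vocabulary barely overlaps the tested prompts.

    Vocabulary overlap is only a rough proxy for "domain"; it is an
    assumption made for this sketch, not a method named in the article.
    """

    def __init__(self, tested_prompts, min_coverage=0.5):
        self.vocabulary = set()
        for prompt in tested_prompts:
            self.vocabulary |= tokenize(prompt)
        self.min_coverage = min_coverage

    def coverage(self, prompt: str) -> float:
        """Fraction of the prompt's distinct tokens seen during testing."""
        tokens = tokenize(prompt)
        if not tokens:
            return 0.0
        return len(tokens & self.vocabulary) / len(tokens)

    def is_out_of_domain(self, prompt: str) -> bool:
        """True when coverage falls below the configured threshold."""
        return self.coverage(prompt) < self.min_coverage


monitor = DomainMonitor([
    "summarize the supply report",
    "translate the supply request",
])

for prompt in ["summarize the supply request", "write a poem about autumn leaves"]:
    print(prompt, "->", "FLAG" if monitor.is_out_of_domain(prompt) else "ok")
```

    A production system would more plausibly compare embeddings or use a trained classifier, but the shape of the check is the same: compare new inputs with what was tested and notify someone when they diverge.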

    Partners

    Collaboration with industry leaders

    Scale AI has previously partnered with Microsoft, Meta, OpenAI, the US Army, the Defense Innovation Unit, General Motors, and NVIDIA. Alexandr Wang, Scale AI's CEO, said in a statement, "Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly." The partnership aims to increase the resilience and robustness of AI systems in classified environments, which is meant to ensure that LLM technology can be adopted "in secure settings."

    All rights reserved © NewsBytes 2024