    In brief
    • Google's Gemini API now features context caching, a tool that optimizes AI workflows by storing input tokens for future use.
    • This not only speeds up processing times but also reduces operational costs.
    • Developers can control how long these tokens are stored, balancing cost and efficiency, and the feature is available for both Gemini 1.5 Pro and Flash models.
    Google's Gemini API introduces context caching to optimize AI workflows
    It offers substantial cost savings

    By Dwaipayan Roy
    Jun 20, 2024
    11:50 am
    What's the story

    Google's Gemini API, a widely used tool among AI developers, has launched a new feature called context caching. The feature is aimed at streamlining AI workflows and lowering operational costs by allowing developers to store frequently used input tokens in a dedicated cache. These tokens can then be referenced in subsequent requests, eliminating the need to repeatedly pass the same set of tokens to the model.
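
    To make the idea concrete, here is a minimal sketch using Google's google-generativeai Python SDK (class and method names are assumptions based on the SDK as of mid-2024 and may differ by version; the file name and prompts are placeholders):

```python
# pip install google-generativeai   (Python SDK assumed; names may vary by version)
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")            # placeholder key
large_document_text = open("report.txt").read()    # placeholder context

# Without caching: the full context is re-sent (and re-billed) on every request.
model = genai.GenerativeModel("gemini-1.5-flash-001")
answer = model.generate_content([large_document_text, "Summarize section 2."])

# With caching: the context is stored once server-side; each later request
# sends only the short new prompt and references the cache.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    contents=[large_document_text],
    ttl=datetime.timedelta(minutes=30),
)
cached_model = genai.GenerativeModel.from_cached_content(cached_content=cache)
answer = cached_model.generate_content("Summarize section 2.")
```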

    Benefits

    A cost-effective solution for AI workflows

    Context caching offers several significant benefits, chief among them substantial cost savings. In standard AI workflows, developers often have to pass the same input tokens to a model multiple times, which can be expensive, especially when dealing with large volumes of data. By caching these tokens once and referencing them as needed, developers reduce the number of tokens sent with each request, thereby lowering overall operational costs.
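
    A back-of-the-envelope calculation shows why this matters. The per-token rates below are illustrative placeholders, not Google's actual pricing:

```python
# Hypothetical scenario: 100 requests that each reuse the same 50,000-token document.
context_tokens = 50_000
requests = 100
input_price_per_token = 0.35 / 1_000_000      # placeholder rate, not actual pricing

# Without caching, the full context is billed as input on every request.
without_cache = context_tokens * requests * input_price_per_token

# With caching, cached tokens are billed at a reduced rate per request plus a
# storage fee; both figures below are placeholders for illustration only.
cached_price_per_token = input_price_per_token / 4
storage_fee = 0.05
with_cache = context_tokens * requests * cached_price_per_token + storage_fee

print(f"without caching: ${without_cache:.2f}; with caching: ${with_cache:.2f}")
# Output: without caching: $1.75; with caching: $0.49
```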

    Workflow optimization

    Enhanced performance and efficiency

    Context caching can also improve latency and performance. When input tokens are cached, subsequent requests that reference those tokens can be processed faster, as the model does not need to process the same tokens repeatedly. This results in faster response times and a more efficient AI workflow, especially for complex, data-intensive tasks. Context caching is particularly beneficial in scenarios where a substantial initial context is referenced repeatedly by shorter requests, as in the sketch below.
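
    The latency benefit shows up most clearly when many short prompts run against the same cached context. Continuing the earlier sketch, where cached_model was built from a cache holding a large document:

```python
# Each call sends only a short prompt; the heavy context is already resident
# server-side, so the model does not re-process it on every request.
questions = [
    "List the key findings.",
    "What are the main risks mentioned?",
    "Quote the conclusion verbatim.",
]
for question in questions:
    print(cached_model.generate_content(question).text)
```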

    Developer control

    Fine-grained control over caching mechanism

    The process of context caching in the Gemini API is straightforward and gives developers fine-grained control over the caching mechanism. Developers can choose how long cached tokens persist before being automatically deleted; this duration is known as the time to live (TTL). The TTL plays a crucial role in determining the cost of caching: longer TTLs result in higher costs, as cached tokens occupy storage for extended periods.
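
    In code, the TTL is a parameter set at cache creation time. The sketch below assumes the same Python SDK as above; the update and delete method names are assumptions based on the SDK's documented surface and may differ by version:

```python
import datetime
from google.generativeai import caching

# TTL is specified when the cache is created; here, tokens persist for one hour.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    contents=[large_document_text],           # placeholder content from earlier
    ttl=datetime.timedelta(hours=1),
)

# A shorter TTL means lower storage costs. If a workload runs longer than
# expected, the expiry can be extended; if it finishes early, deleting the
# cache stops the storage charges immediately.
cache.update(ttl=datetime.timedelta(hours=2))  # extend the lifetime
cache.delete()                                 # or release it as soon as done
```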

    Cost management

    Balancing token count and caching costs

    The price of caching also depends on the number of input tokens being cached. The Gemini API charges based on the number of tokens stored in the cache, so developers have to be mindful of the token count when deciding what content to cache. Striking a balance between caching frequently used tokens and avoiding unnecessary caching of rarely accessed content is essential.
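
    Since storage cost scales with both token count and TTL, a rough estimate helps decide whether caching pays off. The per-token-hour rate below is a placeholder, not actual pricing:

```python
# Hypothetical storage-cost estimate for a cached context.
cached_tokens = 200_000
ttl_hours = 6
storage_price_per_million_token_hours = 1.00   # placeholder rate

storage_cost = (cached_tokens / 1_000_000) * storage_price_per_million_token_hours * ttl_hours
print(f"estimated storage cost: ${storage_cost:.2f}")   # $1.20 in this example

# Caching pays off only if this figure stays below the input-token charges
# that re-sending the same context on every request would have incurred.
```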

    Usage

    Context caching support and utilization

    The Gemini API supports context caching for both Gemini 1.5 Pro and Gemini 1.5 Flash models, offering flexibility for developers working with different model variants. To use context caching, developers need to install a Gemini SDK and configure an API key. The process involves uploading the content to be cached, creating a cache with a specified TTL, and instantiating a generative model that uses the created cache, as sketched below.
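
    Put together, the flow described above might look like this end to end (a sketch against the google-generativeai Python SDK as of mid-2024; the file name, system instruction, and prompt are placeholders):

```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")        # step 0: SDK installed, key configured

# Step 1: upload the content to be cached (a hypothetical large transcript).
document = genai.upload_file("transcript.txt")

# Step 2: create a cache with a specified TTL.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    system_instruction="Answer questions using the attached transcript.",
    contents=[document],
    ttl=datetime.timedelta(minutes=15),
)

# Step 3: build a generative model that uses the created cache, then query it.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Who are the speakers in this transcript?")
print(response.text)
```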
