Video Q&A LLM (Gemini)


Overview

This tutorial demonstrates how to use the Gemini API to process and analyze video content.

Specifically, it shows how to upload a video file using the File API, and then use a generative model to extract descriptive information about the video.

The workflow utilizes the gemini-1.5-flash model to generate a text-based description of a given video clip.

Additionally, it provides an example of integrating the Gemini model into a LangChain workflow for video data, showcasing how to build a chain that processes and analyzes video content seamlessly within the LangChain framework.

Table of Contents

  • Overview
  • Environment Setup
  • Data preparation
  • Upload and Preprocess video using Gemini API
  • Generate content (Gemini API)
  • Integrating the Gemini Model into a LangChain Workflow for Video Data
  • File Deletion


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.
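As a minimal sketch of the setup, the snippet below stores the API key in an environment variable before configuring the SDK. The variable name `GOOGLE_API_KEY` and the `google-generativeai` package are assumptions based on common Gemini SDK usage; adapt them to your environment.

```python
import os

# Hypothetical placeholder; replace "YOUR_API_KEY" with the key issued below.
os.environ.setdefault("GOOGLE_API_KEY", "YOUR_API_KEY")

# With the SDK installed (`pip install google-generativeai`), configuration
# would then look like this:
# import google.generativeai as genai
# genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
```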

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for these tutorials.

  • You can check out the langchain-opentutorial for more details.

API Key Issuance

  • Obtain an API key from the link.

Important:

  • The File API used in this tutorial requires an API key for authentication and access.

  • Uploaded files are linked to the cloud project associated with the API key.

Unlike other Gemini APIs, the API key also grants access to data uploaded via the File API, so it's crucial to store the API key securely.

Data preparation

License-free video from Pexels

Please download the video and copy it to the ./data folder for this tutorial.
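A quick check that the video is in place can look like the sketch below. The filename `sample_video.mp4` is a hypothetical placeholder; use whatever name the downloaded Pexels clip has.

```python
from pathlib import Path

# Hypothetical filename; match it to the clip you downloaded from Pexels.
video_path = Path("./data") / "sample_video.mp4"

if not video_path.exists():
    print(f"Place the downloaded video at {video_path} before continuing.")
```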

Upload and Preprocess video using Gemini API

Next, use the File API to upload the video file.

After uploading the file, you can call get_file to verify that the API has successfully processed it.

get_file allows you to check the uploaded file associated with the File API in the cloud project linked to the API key.
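Because video uploads are processed asynchronously, it helps to poll until the file becomes active before generating content. The `wait_until_active` helper below is not part of the SDK; it is a small sketch I am introducing here, with the actual `upload_file`/`get_file` calls shown in comments since they require an API key.

```python
import time

def wait_until_active(get_state, interval=2.0, timeout=300.0):
    """Poll get_state() until the uploaded file reaches 'ACTIVE'.

    generate_content calls made before processing finishes will fail,
    so we wait, raising on 'FAILED' or on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if state == "ACTIVE":
            return state
        if state == "FAILED":
            raise RuntimeError("Gemini failed to process the uploaded file")
        time.sleep(interval)
    raise TimeoutError("file was still processing when the timeout expired")

# With the real SDK (requires an API key):
# import google.generativeai as genai
# video_file = genai.upload_file(path="./data/sample_video.mp4")
# wait_until_active(lambda: genai.get_file(video_file.name).state.name)
```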

Generate content (Gemini API)

After the video is preprocessed, you can use the generate_content function from the Gemini API to ask questions about the video.

Below is an example of stream output (with the stream=True option added).
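A sketch of both modes is shown below. The SDK calls are commented out since they need an API key and a processed upload; `collect_stream` is a small helper I am adding to show how incremental chunks assemble into the full answer.

```python
def collect_stream(chunks):
    """Join incremental text chunks from a streamed response into one string."""
    return "".join(chunks)

# With the real SDK, after the upload above has finished processing:
# import google.generativeai as genai
# model = genai.GenerativeModel("gemini-1.5-flash")
# response = model.generate_content(
#     [video_file, "Describe this video clip."], stream=True
# )
# for chunk in response:
#     print(chunk.text, end="")  # print each chunk as it arrives
# # or collect everything at once:
# # full_text = collect_stream(chunk.text for chunk in response)
```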

Integrating the Gemini Model into a LangChain Workflow for Video Data

Here is an example of using LangChain with the Gemini model.

The model is loaded via ChatGoogleGenerativeAI from langchain_google_genai, allowing multimodal data to be included in the content of HumanMessage using the media type.
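The message structure can be sketched as plain dicts, as below. The `media` part with `mime_type` and `file_uri` keys reflects how langchain_google_genai is assumed to accept File API uploads; `YOUR_FILE_URI` is a placeholder for the URI returned by the upload step, and the LangChain calls are commented out since they require the packages and an API key.

```python
# Multimodal content for a HumanMessage: one text part and one media part.
message_content = [
    {"type": "text", "text": "Describe this video clip."},
    {"type": "media", "mime_type": "video/mp4", "file_uri": "YOUR_FILE_URI"},
]

# With the packages installed (`pip install langchain-google-genai`):
# from langchain_core.messages import HumanMessage
# from langchain_google_genai import ChatGoogleGenerativeAI
# llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
# answer = llm.invoke([HumanMessage(content=message_content)])
# print(answer.content)
```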

File Deletion

Files are automatically deleted after 2 days, or you can manually delete them using files.delete().
