Video Q&A LLM (Gemini)
Author: Youngin Kim
Design: Teddy
Peer Review:
Proofread: frimer
This is part of the LangChain Open Tutorial
Overview
This tutorial demonstrates how to use the Gemini API to process and analyze video content.
Specifically, it shows how to upload a video file using the File API, and then use a generative model to extract descriptive information about the video.
The workflow uses the gemini-1.5-flash model to generate a text-based description of a given video clip.
Additionally, it provides an example of Integrating the Gemini Model into a LangChain Workflow for Video Data, showcasing how to build a chain that processes and analyzes video content seamlessly within the LangChain framework.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials. You can check out langchain-opentutorial for more details.
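As a minimal setup sketch, the environment can be prepared by installing the packages used in this tutorial and exporting the Gemini API key (the placeholder key below is hypothetical; replace it with your own or load it from a .env file):

```python
# %pip install -qU langchain-opentutorial google-generativeai langchain-google-genai
import os

# Hypothetical placeholder: substitute the key issued in the next section.
os.environ.setdefault("GOOGLE_API_KEY", "your-api-key-here")
```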
API KEY Issuance
Obtain an API KEY from the link.
Important:
The File API used in this tutorial requires an API key for authentication and access. Uploaded files are linked to the cloud project associated with the API key.
Unlike other Gemini APIs, the API key also grants access to data uploaded via the File API, so it is crucial to store the API key securely.
Data preparation
License-free video from Pexels (author: SwissHumanity Stories).
Please download the video and copy it into the ./data folder for this tutorial.
Upload and Preprocess video using Gemini API
Next, use the File API to upload the video file.
After uploading the file, you can call get_file to verify that the API has successfully processed it.
get_file lets you check files uploaded via the File API in the cloud project linked to your API key.
Generate content (Gemini API)
After the video is preprocessed, you can use the generate_content function from the Gemini API to ask questions about the video.
Below is an example of streamed output (using the stream=True option).
Integrating the Gemini Model into a LangChain Workflow for Video Data
Here is an example of using LangChain with the Gemini model.
The model is loaded via ChatGoogleGenerativeAI from langchain_google_genai, which allows multimodal data to be included in the content of a HumanMessage using the media type.
File Deletion
Files are automatically deleted after 2 days, or you can manually delete them using files.delete().