This tutorial demonstrates how to use the Gemini API to process and analyze video content.
Specifically, it shows how to upload a video file with the File API and then use a generative model to extract descriptive information about the video.
The workflow uses the gemini-1.5-flash model to generate a text description of a given video clip.
It also shows how to integrate the Gemini model into a LangChain workflow for video data, so that video content can be processed and analyzed within the LangChain framework.
After uploading the file, call get_file to check whether the API has finished processing it.
get_file retrieves a file that was uploaded through the File API to the cloud project linked to your API key.
import time
# Videos need to be processed before you can use them.
while video_file.state.name == "PROCESSING":
    print("Please wait while the video upload and preprocessing are completed...")
    time.sleep(5)
    video_file = genai.get_file(video_file.name)
# Raise an exception if the processing fails
if video_file.state.name == "FAILED":
    raise ValueError(video_file.state.name)
# Print completion message
print(
    f"\nVideo processing is complete!\nYou can now start the conversation: {video_file.uri}"
)
Please wait while the video upload and preprocessing are completed...
Video processing is complete!
You can now start the conversation: https://generativelanguage.googleapis.com/v1beta/files/ycq94nkeb9gd
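The polling loop above waits indefinitely if a file never leaves the PROCESSING state. The following is a minimal sketch of the same loop with a timeout, not part of the original tutorial. The helper name wait_for_file and its parameters are hypothetical, and the "ACTIVE" state used in the demonstration is an assumption about what the API reports once processing succeeds. The fetch function is passed in so the sketch can be run offline with a stand-in.

```python
import time
from types import SimpleNamespace


def wait_for_file(file, fetch, poll_seconds=5.0, timeout_seconds=300.0, sleep=time.sleep):
    """Poll until `file` leaves the PROCESSING state or the timeout expires.

    `fetch` is any callable that takes a file name and returns a fresh file
    object, for example `genai.get_file` (hypothetical helper, not an SDK API).
    """
    waited = 0.0
    while file.state.name == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError(f"{file.name} still processing after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        file = fetch(file.name)
    if file.state.name == "FAILED":
        raise ValueError(f"Processing failed for {file.name}")
    return file


# Offline demonstration with a fake fetch function (made-up file name and states).
_states = iter(["PROCESSING", "ACTIVE"])


def fake_fetch(name):
    return SimpleNamespace(name=name, state=SimpleNamespace(name=next(_states)))


start = SimpleNamespace(name="files/demo", state=SimpleNamespace(name="PROCESSING"))
ready = wait_for_file(start, fake_fetch, sleep=lambda seconds: None)
print(ready.state.name)
```

With the real SDK, the call would presumably look like video_file = wait_for_file(video_file, genai.get_file).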
Generate content (Gemini API)
After the video has been preprocessed, you can call the model's generate_content method from the Gemini API to ask questions about the video.
# Prompt message
prompt = "Describe this video clip"
# Set model to Gemini 1.5 Flash
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")
# Request a response from the model
response = model.generate_content(
    [prompt, video_file], request_options={"timeout": 600}
)
# Print the response
print(response.text)
Here's a description of the video clip:
The video shows an aerial, high-angle view of a red passenger train traveling along a railway line that runs parallel to a road through a picturesque valley.
Here's a breakdown of the scene:
* **The Train:** A long, red passenger train is the central focus, moving from the bottom to the middle of the frame. It's a fairly modern-looking train.
* **The Valley:** The valley is lush green, with fields dotted with yellow wildflowers (likely dandelions). The grass is vibrant and appears to be well-maintained pastureland. Several farmhouses and buildings are scattered throughout the valley. A small stream or river meanders alongside the road and tracks.
* **The Mountains:** Towering mountains, partially snow-capped, form a dramatic backdrop. The mountains are steep and rocky, showcasing a mix of textures and shades of green and grey.
* **The Atmosphere:** The overall atmosphere is peaceful and idyllic, with clear blue skies and abundant sunlight suggesting a pleasant spring or summer day.
The video appears to be drone footage, smoothly following the train's progress through the valley. The camera angle provides a sweeping perspective that showcases the beauty of the landscape and the integration of the train within the environment. The entire scene evokes a sense of serene beauty and the charm of rural Switzerland.
The example below streams the output by adding the stream=True option.
# Prompt message
prompt = "What type of train is shown in this video, and what color is it?"
# Set model to Gemini 1.5 Flash
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")
# Request a streamed response from the model
response = model.generate_content(
    [prompt, video_file], request_options={"timeout": 600}, stream=True
)
# Print each chunk as it arrives
for chunk in response:
    print(chunk.text, end="", flush=True)
That's a narrow-gauge railway train. More specifically, it appears to be a type of railcar used on the Appenzell Bahn (AB) in Switzerland. The train is primarily red in color, with some black and white accents.
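The streaming loop above prints each chunk but does not keep the full answer. If the complete text is needed afterwards, one option is to collect the chunks while printing them. This is a small sketch under the assumption that each chunk exposes a text attribute, as in the loop above. The helper name collect_stream is hypothetical, and the demonstration uses stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=print):
    """Echo each chunk's text as it arrives and return the full response."""
    parts = []
    for chunk in chunks:
        echo(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    return "".join(parts)


# Offline demonstration: stand-in chunks that only mimic the `.text` attribute.
fake_chunks = [SimpleNamespace(text="The train "), SimpleNamespace(text="is red.")]
full_text = collect_stream(fake_chunks)
```

With a real streamed response, the call would be full_text = collect_stream(response).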
Integrating the Gemini Model into a LangChain Workflow for Video Data
Here is an example of using LangChain with the Gemini model.
The model is loaded through ChatGoogleGenerativeAI from langchain_google_genai. Multimodal data is passed in the content of a HumanMessage as an entry of type "media" that carries the file's MIME type and URI.
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage
# Initialize the Gemini model with the specified version
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
# Create a message to send to the model and attach the video file as media input
message = HumanMessage(
    content=[
        {"type": "text", "text": "Please analyze the content of this video."},
        {
            "type": "media",
            "mime_type": video_file.mime_type,
            "file_uri": video_file.uri,
        },
    ]
)
# Stream the response and process each chunk
for chunk in llm.stream([message]):
    print(chunk.content)
This
video shows an aerial view of a red train traveling along a railway line that runs
parallel to a road through a picturesque valley in what appears to be the Swiss Alps
.
Here's a breakdown of the content:
* **Scenery:** The valley is lush green, with fields dotted with yellow wildflowers, likely
dandelions. The valley is surrounded by steep, verdant hillsides and majestic snow-capped mountains in the background. A small stream or river runs
alongside the road and railway. Several farmhouses or chalets are scattered throughout the valley. The overall impression is one of idyllic rural Switzerland.
* **Transportation:** A long red train is the central focus, moving steadily along the
railway tracks. The train appears to be a passenger train, given its length and the typical design. A road runs parallel to the tracks, offering a contrasting mode of transportation.
* **Camera Work:** The video is shot from a
drone, providing a high-angle, sweeping perspective. The camera follows the train as it moves through the valley, giving viewers a sense of the scale and beauty of the landscape. The drone maintains a relatively constant distance and speed to follow the train.
* **Overall Impression:** The video is visually stunning and evokes a
sense of tranquility and the beauty of nature in a mountainous region. It's a perfect example of promotional footage for tourism, showcasing Switzerland's landscapes and transportation infrastructure.
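When several prompts are sent about the same video, the message content shown earlier can be built by a small helper. The sketch below only assembles the list of dictionaries in the same shape as the example above, so it runs without LangChain installed. The function name video_message_content and the placeholder values are hypothetical.

```python
def video_message_content(prompt, mime_type, file_uri):
    """Return the content list for a text prompt plus one uploaded video."""
    return [
        {"type": "text", "text": prompt},
        {"type": "media", "mime_type": mime_type, "file_uri": file_uri},
    ]


# Offline demonstration with placeholder values (not a real uploaded file).
content = video_message_content(
    "Summarize this video in one sentence.",
    "video/mp4",
    "https://example.invalid/files/demo",
)
print(content[1]["type"])
```

In the workflow above, the result would be passed as HumanMessage(content=video_message_content(prompt, video_file.mime_type, video_file.uri)).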
File Deletion
Files are automatically deleted after 2 days, or you can delete them manually with genai.delete_file().
# File deletion
genai.delete_file(video_file.name)
print(f"The video has been deleted: {video_file.uri}")
The video has been deleted: https://generativelanguage.googleapis.com/v1beta/files/ycq94nkeb9gd
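If an error occurs between upload and deletion, the uploaded file stays in the project until it expires. One way to guard against this is to wrap the upload in a context manager that always deletes the file on exit. The following is a sketch, not part of the original tutorial: the name temporary_upload is hypothetical, and the upload and delete functions are injected so the example can run offline with stand-ins.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_upload(path, upload, delete):
    """Upload `path`, yield the file object, and always delete it afterwards."""
    file = upload(path)
    try:
        yield file
    finally:
        # Runs even if the body of the `with` block raises.
        delete(file.name)


# Offline demonstration with stand-in functions (made-up names and paths).
deleted = []


def fake_upload(path):
    return SimpleNamespace(name="files/demo", uri="https://example.invalid/files/demo")


with temporary_upload("clip.mp4", fake_upload, deleted.append) as f:
    uploaded_name = f.name
```

With the real SDK, upload could be a wrapper such as lambda p: genai.upload_file(path=p) and delete could be genai.delete_file, assuming the file is no longer needed once the block ends.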