Making a Report Using RAG, Web Search, and Image Generation Agents
Author: Junseong Kim
Peer Review:
Proofread: BokyungisaGod
This is part of the LangChain Open Tutorial
Overview
In this tutorial, we showcase how to use three different agents in LangChain to create a comprehensive report. Specifically, we combine:
Web Searching Agent:
Performs web searches (via a custom tool) to gather additional real-time information.
RAG (Retrieval-Augmented Generation) Agent:
Uses a local PDF (e.g., Tesla-related) loaded and chunked into a VectorStore.
Provides relevant context from the PDF using retrieval tools.
Image Generation Agent:
Utilizes the DALL·E tool to generate images based on text prompts.
These agents collect data from a PDF, supplement it with web search results, and enrich the final report with generated images, all while demonstrating streaming outputs in real time.
By the end of this tutorial, you will learn how to:
Integrate multiple agents (Web Searching, RAG, Image Generation) in a single LangChain pipeline.
Generate and update a Markdown report (report.md and report-final.md) using the agents’ outputs.
Observe and process streaming outputs using a custom generator and callback system.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, helper functions, and utilities for these tutorials. You can check out langchain-opentutorial for more details.
You can alternatively set OPENAI_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
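If you prefer not to depend on python-dotenv, the .env loading step can be sketched with a minimal, dependency-free loader (the helper name load_env_file is ours; existing environment variables take precedence):

```python
import os

def load_env_file(path: str = ".env") -> dict:
    """Minimal .env loader: parses KEY=VALUE lines and fills os.environ.

    A stand-in for python-dotenv's load_dotenv(); existing variables win.
    """
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip().strip('"').strip("'")
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded
```

After calling `load_env_file()`, `os.environ["OPENAI_API_KEY"]` is available to every downstream tool, exactly as with load_dotenv().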
Implementing Multiple Agents
In the following sections, we set up a LangChain pipeline with three different agents:
Web Searching Agent to gather live data,
RAG (Retrieval-Augmented Generation) Agent to pull context from a PDF,
Image Generation Agent for creating a final image.
We will demonstrate how each agent can be combined to generate a streaming report with real-time parsing and callback-driven outputs.
Adding a Web Searching Agent
Below, we import a sample tool TavilySearchResults for performing web searches.
This will serve as our Web Searching Agent to gather real-time information
based on user queries.
To use the Tavily Search API, you need to obtain an API key.
You can obtain your API key by visiting the following link: Tavily Search API Registration.
You can set TAVILY_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set TAVILY_API_KEY in previous steps.
Data Loading and Vector Store (RAG)
Next, we set up the RAG (Retrieval-Augmented Generation) Agent. Below, we load a PDF file (e.g., shsconf_icdeba2023_02022.pdf), split it into chunks, and create a VectorStore using FAISS. We then initialize a retriever to query those chunks.
Document Used for Practice
Tesla's Revenue Forecast Based on Business Model and Financial Statement Analysis
Author: Chenhao Fang
Institution: Intelligent Accounting Management Institute, Guangdong University of Finance and Economics
Link: Tesla's Revenue Forecast Based on Business Model and Financial Statement Analysis
File Name: shsconf_icdeba2023_02022.pdf
Please copy the downloaded file to the data folder for practice.
Wrapping the Retriever as a Tool
We wrap our retriever in a LangChain tool so it can be invoked by an agent. Here, we define a prompt template to format the retrieved documents.
Adding a DALL·E Tool for Image Generation
Below, we set up the Image Generation Agent using DallEAPIWrapper.
This allows our pipeline to generate images based on text prompts
and integrate them into the final report.
File Management Tools
Next, we set up file management tools to enable the agent to write, read, and list files within a specified directory. This is used to store and update the report.md, report-final.md, and other files.
Combining Tools
We now combine all tools (Web Searching, RAG, DALL·E, File Management) into a single list.
Creating a Prompt and Agent
Here, we create a ChatPromptTemplate and a LangChain agent to handle
LLM calls and tool usage. We store each session’s chat history in a dictionary
to maintain context across multiple steps.
Synchronous Parsing
Below, we define callbacks to monitor and process the agent’s steps in real time. They capture tool calls, observations, and final results.
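The parsing logic over AgentExecutor.stream() chunks can be sketched as a pure function: each chunk is a dict carrying "actions" (tool calls), "steps" (tool observations), or "output" (the final answer). The label formatting below is our choice:

```python
def parse_stream_chunk(chunk: dict) -> str:
    """Turn one AgentExecutor.stream() chunk into a printable line."""
    if "actions" in chunk:  # the agent decided to call a tool
        return "\n".join(
            f"[tool call] {a.tool} <- {a.tool_input}" for a in chunk["actions"]
        )
    if "steps" in chunk:  # a tool finished; show its observation
        return "\n".join(f"[observation] {s.observation}" for s in chunk["steps"])
    if "output" in chunk:  # final answer from the agent
        return f"[final] {chunk['output']}"
    return ""
```

In the streaming loop you would simply `print(parse_stream_chunk(chunk))` for every chunk the executor yields.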
Using the Agent in a Streaming Workflow
Now, we will demonstrate how to use all three agents (Web Searching, RAG, and Image Generation) in a streaming fashion. Each step is processed by our custom parser, allowing us to see tool calls, observations, and final answers in real time.
Step 1: Summarize PDF Content and Save to report.md
First, we instruct the agent to summarize key aspects of the Tesla PDF and save the summary to report.md.
When you check the contents of the generated report file (report.md), it will display as follows.

Step 2: Perform Web Search and Append to report.md
Next, we perform a web search about Tesla's revenue outlook, append the findings to report.md, and then read the updated file content.
When you check the contents of the updated report file (report.md), it will display as follows.

Step 3: Create a Professional Report and Save to report-final.md
Then, we instruct the agent to create a more professional report based on report.md, add a table, and save it as report-final.md. Finally, we read and display the final report.
When you check the contents of the newly created report file (report-final.md), it will display as follows.

Step 4: Generate and Embed an Image into report-final.md
Finally, we generate an image symbolizing Tesla’s future using the Image Generation Agent, and prepend the image URL to report-final.md.
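Prepending the image amounts to a read-modify-write on report-final.md; a stdlib sketch (the helper name and alt text are ours; in the tutorial the agent performs this step through its file tools):

```python
from pathlib import Path

def prepend_image(report_path: str, image_url: str, alt: str = "Generated image") -> None:
    """Insert a Markdown image line at the top of an existing report file."""
    path = Path(report_path)
    body = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(f"![{alt}]({image_url})\n\n{body}", encoding="utf-8")
```

For example, `prepend_image("report-final.md", url)` leaves the rest of the report untouched below the new image line.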
Finally, when you check a portion of the most recently generated report file (report-final.md), it will display as follows.
