
LangGraph : Research Assistant with STORM


Overview

Research is often a labor-intensive task delegated to analysts, but AI holds tremendous potential to revolutionize this process. This tutorial explores how to construct a customized AI-powered research and report generation workflow using LangGraph, incorporating key concepts from Stanford's STORM framework.

Why This Approach?

The STORM methodology has demonstrated significant improvements in research quality through two key innovations:

  • Outline creation through querying similar topics enhances coverage.

  • Multi-perspective conversation simulation increases reference usage and information density.


Key Components

Core Themes

  • Memory: State management and persistence across interactions.

  • Human-in-the-loop: Interactive feedback and validation mechanisms.

  • Controllability: Fine-grained control over agent workflows.

Research Framework

  • Research Automation Objective: Building customized research processes tailored to user requirements.

  • Source Management: Strategic selection and integration of research input sources.

  • Planning Framework: Topic definition and AI analyst team assembly.

Process Implementation

Execution Flow

  • LLM Integration: Conducting comprehensive expert AI interviews.

  • Parallel Processing: Simultaneous information gathering and interview execution.

  • Output Synthesis: Integration of research findings into comprehensive reports.

Technical Implementation

  • Environment Setup: Configuration of runtime environment and API authentication.

  • Analyst Development: Human-supervised analyst creation and validation process.

  • Interview Management: Systematic question generation and response collection.

  • Parallel Processing: Implementation of Map-Reduce for interview parallelization.

  • Report Generation: Structured composition of introductory and concluding sections.

AI has significant potential to support these research processes. However, research requires customization: raw LLM outputs are often not suitable for real decision-making workflows.

A customized AI-based research and report generation workflow is a promising way to address this gap.

10-LangGraph-Research-Assitant-concept


Environment Setup

Setting up your environment is the first step. See the Environment Setup guide for more details.

[Note]

The langchain-opentutorial package provides easy-to-use environment setup, along with useful functions and utilities for these tutorials. Check out langchain-opentutorial for more details.

You can set API keys in a .env file or set them manually.

[Note] If you’re not using the .env file, no worries! Just enter the keys directly in the cell below, and you’re good to go.
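A minimal sketch of both options; the key names shown (OPENAI_API_KEY, TAVILY_API_KEY) are the ones the models and search tools in this tutorial typically require, and the values are placeholders:

```python
# Option 1: load keys from a .env file (requires the python-dotenv package)
from dotenv import load_dotenv

load_dotenv(override=True)

# Option 2: set the keys directly in the notebook (placeholder values shown)
import os

os.environ["OPENAI_API_KEY"] = "sk-..."      # replace with your OpenAI key
os.environ["TAVILY_API_KEY"] = "tvly-..."    # replace with your Tavily key
```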

Utilities

These are brief descriptions of the langchain-opentutorial modules used in this tutorial.

visualize_graph

for visualizing graph structure

random_uuid , invoke_graph

  • random_uuid : generates a random UUID (Universally Unique Identifier) and returns it as a string.

  • invoke_graph : streams and displays the results of executing a CompiledStateGraph instance in a visually appealing format.
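A short usage sketch; the import paths below follow the pattern seen in other langchain-opentutorial notebooks and are assumptions, not a verified API reference:

```python
# Assumed module paths within langchain-opentutorial
from langchain_opentutorial.graphs import visualize_graph
from langchain_opentutorial.messages import invoke_graph, random_uuid

thread_id = random_uuid()  # e.g. "3f2b9c1e-..." used as a checkpointed thread ID
# visualize_graph(compiled_graph)               # render a compiled graph's structure
# invoke_graph(compiled_graph, inputs, config)  # stream and pretty-print node outputs
```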

TavilySearch

This code defines a tool for performing search queries using the Tavily Search API. It includes input validation, formatting of search results, and the ability to customize search parameters.

Methods :

  • __init__ : Initializes the TavilySearch instance, setting up the API client and input parameters.

  • _run : Implements the base tool's run method, calling the search method and returning results.

  • search : Performs the actual search using the Tavily API, taking various optional parameters to customize the query. It formats the output based on user preferences.

  • get_search_context : Retrieves relevant context based on a search query, returning a JSON string that includes search results formatted as specified.
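The tutorial ships this tool in langchain-opentutorial; the following is only a hypothetical minimal version built on the official tavily-python client, to illustrate the methods listed above (parameter defaults and the output dict format are assumptions):

```python
import json
import os

from tavily import TavilyClient  # official Tavily Python client


class TavilySearch:
    """Illustrative sketch of a Tavily search tool, not the tutorial's exact implementation."""

    def __init__(self, max_results: int = 3, search_depth: str = "basic",
                 include_raw_content: bool = False):
        # API client plus default search parameters
        self.client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
        self.max_results = max_results
        self.search_depth = search_depth
        self.include_raw_content = include_raw_content

    def search(self, query: str, **kwargs) -> list[dict]:
        """Run the query against the Tavily API and format each hit as a small dict."""
        response = self.client.search(
            query=query,
            max_results=kwargs.get("max_results", self.max_results),
            search_depth=kwargs.get("search_depth", self.search_depth),
            include_raw_content=kwargs.get("include_raw_content", self.include_raw_content),
        )
        return [{"url": r["url"], "content": r["content"]} for r in response.get("results", [])]

    def _run(self, query: str) -> list[dict]:
        # Mirrors a LangChain tool's run method by delegating to search()
        return self.search(query)

    def get_search_context(self, query: str, **kwargs) -> str:
        """Return search results for a query as a JSON string."""
        return json.dumps(self.search(query, **kwargs), ensure_ascii=False)
```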

Analyst Generation : Human in the Loop

Analyst Generation : Create and review analysts using Human-In-The-Loop.

The following defines the state that tracks the collection of analysts generated through the Analyst class:
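A sketch of the Analyst persona model and the analyst-generation state in the style of the STORM tutorial; the exact field names are assumptions:

```python
from typing import List

from pydantic import BaseModel, Field
from typing_extensions import TypedDict


class Analyst(BaseModel):
    affiliation: str = Field(description="Primary affiliation of the analyst.")
    name: str = Field(description="Name of the analyst.")
    role: str = Field(description="Role of the analyst in the context of the topic.")
    description: str = Field(description="Focus, concerns, and motives of the analyst.")

    @property
    def persona(self) -> str:
        return (f"Name: {self.name}\nRole: {self.role}\n"
                f"Affiliation: {self.affiliation}\nDescription: {self.description}\n")


class Perspectives(BaseModel):
    analysts: List[Analyst] = Field(description="List of analysts with roles and affiliations.")


class GenerateAnalystsState(TypedDict):
    topic: str                    # research topic
    max_analysts: int             # number of analysts to generate
    human_analyst_feedback: str   # feedback injected by the human reviewer
    analysts: List[Analyst]       # collection of generated analyst personas
```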

Defining the Analyst Generation Node

Next, we will define the analyst generation node.

The code below implements the logic for generating various analysts based on the provided research topic. Each analyst has a unique role and affiliation, offering professional perspectives on the topic.

Explanation of Code Components

  • Analyst Instructions: A prompt that guides the LLM in creating AI analyst personas based on a specified research topic and any provided feedback.

  • create_analysts Function: This function is responsible for generating a set of analysts based on the current state, which includes the research topic, maximum number of analysts, and any human feedback.

  • human_feedback Function: A placeholder function that can be expanded to handle user feedback.

  • should_continue Function: This function evaluates whether there is human feedback available and determines whether to proceed with creating analysts or end the process.

This structure allows for a dynamic approach to generating tailored analyst personas that can provide diverse insights into a given research topic.
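A condensed sketch of these components; the prompt wording and the model choice (gpt-4o) are illustrative rather than the tutorial's exact values:

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END

llm = ChatOpenAI(model="gpt-4o", temperature=0)

analyst_instructions = """You are tasked with creating a set of AI analyst personas.
Research topic: {topic}
Editorial feedback (may be empty): {human_analyst_feedback}
Create at most {max_analysts} analysts, each with a distinct affiliation, role, and focus."""


def create_analysts(state: GenerateAnalystsState):
    """Generate analyst personas as structured output from the LLM."""
    structured_llm = llm.with_structured_output(Perspectives)
    system_message = analyst_instructions.format(
        topic=state["topic"],
        human_analyst_feedback=state.get("human_analyst_feedback", ""),
        max_analysts=state["max_analysts"],
    )
    perspectives = structured_llm.invoke(
        [SystemMessage(content=system_message),
         HumanMessage(content="Generate the set of analysts.")]
    )
    return {"analysts": perspectives.analysts}


def human_feedback(state: GenerateAnalystsState):
    """No-op placeholder; the graph interrupts here so a human can update the state."""
    pass


def should_continue(state: GenerateAnalystsState):
    """Regenerate analysts if feedback was provided, otherwise end the process."""
    if state.get("human_analyst_feedback"):
        return "create_analysts"
    return END
```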

Building the Analyst Generation Graph

Now we'll create the analyst generation graph that orchestrates the research workflow.
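A sketch of the graph assembly, assuming the state and node functions defined above; the node names match the component list that follows:

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

builder = StateGraph(GenerateAnalystsState)
builder.add_node("create_analysts", create_analysts)
builder.add_node("human_feedback", human_feedback)

builder.add_edge(START, "create_analysts")
builder.add_edge("create_analysts", "human_feedback")
builder.add_conditional_edges("human_feedback", should_continue, ["create_analysts", END])

# Persist state and pause before collecting human feedback
memory = MemorySaver()
graph = builder.compile(interrupt_before=["human_feedback"], checkpointer=memory)

visualize_graph(graph)
```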

[Graph visualization output]

Graph Components

  • Nodes

    • create_analysts: Generates analyst personas based on the research topic

    • human_feedback: Checkpoint for receiving user input and feedback

  • Edges

    • Initial flow from START to analyst creation

    • Connection from analyst creation to human feedback

    • Conditional path back to analyst creation based on feedback

  • Features

    • Memory persistence using MemorySaver

    • Breakpoints before human feedback collection

    • Visual representation of workflow through visualize_graph

This graph structure enables an iterative research process with human oversight and feedback integration.

Running the Analyst Generation Graph

Here's how to execute and manage the analyst generation workflow:
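A sketch of one way to run the graph until it pauses at the breakpoint; the topic string is only an example:

```python
from langchain_core.runnables import RunnableConfig

config = RunnableConfig(configurable={"thread_id": random_uuid()})
inputs = {"topic": "Modular RAG vs. Naive RAG", "max_analysts": 3}

# Streams values until the graph interrupts before the human_feedback node
for event in graph.stream(inputs, config, stream_mode="values"):
    for analyst in event.get("analysts", []):
        print(analyst.persona)
        print("-" * 50)
```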

When __interrupt__ is displayed, the system is ready to receive human feedback. At this point, you can retrieve the current state and provide feedback to guide the analyst generation process.

To inject human feedback into the graph, we use the update_state method with the following key components:

Key Parameters

  • config : Configuration object containing graph settings

  • human_analyst_feedback : Key for storing feedback content

  • as_node : Specifies the node that will process the feedback

[Note] : Assigning None as input triggers the graph to continue its execution from the last checkpoint. This is particularly useful when you want to resume processing after providing human feedback.
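A sketch of feedback injection; the feedback string is an example:

```python
graph.update_state(
    config,
    {"human_analyst_feedback": "Add an analyst focused on production-level deployment of RAG."},
    as_node="human_feedback",  # record the update as if the human_feedback node produced it
)
```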

To resume the graph execution after the __interrupt__, pass None as the input:
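```python
# Resume: create_analysts runs again, now conditioned on the injected feedback
for event in graph.stream(None, config, stream_mode="values"):
    for analyst in event.get("analysts", []):
        print(analyst.persona)
        print("-" * 50)
```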

When __interrupt__ appears again, you have two options:

  • Option 1: Provide Additional Feedback

    • You can provide more feedback to further refine the analyst personas using the same method as before

  • Option 2: Complete the Process

To finish the analyst generation process without additional feedback:
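Clearing the feedback lets should_continue route to END on the next resume; a minimal sketch:

```python
# No further feedback: should_continue will route to END
graph.update_state(config, {"human_analyst_feedback": None}, as_node="human_feedback")

for _ in graph.stream(None, config, stream_mode="updates"):
    pass  # run to completion
```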

Displaying Final Results

Get and display the final results from the graph:
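A sketch using get_state on the same thread config:

```python
final_state = graph.get_state(config)

for analyst in final_state.values.get("analysts", []):
    print(f"Name: {analyst.name}")
    print(f"Role: {analyst.role}")
    print(f"Affiliation: {analyst.affiliation}")
    print(f"Description: {analyst.description}")
    print("-" * 50)

print(final_state.next)  # () once the workflow has completed
```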

Key Components

  • final_state: Contains the final state of the graph execution.

  • analysts: List of generated analyst personas.

  • final_state.next: Empty tuple indicating workflow completion.

The output will display each analyst's complete persona information, including their name, role, affiliation, and description, followed by a separator line. The empty tuple printed at the end confirms that the graph execution has completed successfully.

Interview Execution

Define Classes and question_generation Node

Let's implement the interview execution components, including state management and the question_generation node:

State Management

  • InterviewState tracks conversation turns, context, and interview content.

  • Annotated context list allows for document accumulation.

  • Maintains analyst persona and report sections.

Question Generation

  • Structured system prompt for consistent interviewing style.

  • Persona-aware questioning based on analyst goals.

  • Progressive refinement of topic understanding.

  • Clear interview conclusion mechanism.

The code provides a robust foundation for conducting structured interviews while maintaining conversation state and context.
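A sketch of the interview state and the question-generation node, reusing the Analyst model and llm from earlier cells; the prompt text and closing phrase are illustrative:

```python
import operator
from typing import Annotated

from langchain_core.messages import SystemMessage
from langgraph.graph import MessagesState


class InterviewState(MessagesState):
    max_num_turns: int                      # maximum number of question/answer turns
    context: Annotated[list, operator.add]  # retrieved source documents accumulate here
    analyst: Analyst                        # analyst persona conducting the interview
    interview: str                          # transcript of the finished interview
    sections: list                          # report section written from this interview


question_instructions = """You are an analyst interviewing an expert on a specific topic.
Your goals: {goals}
Ask one insightful question at a time, drilling down on interesting answers.
When you are satisfied, end the interview with: "Thank you so much for your help!" """


def generate_question(state: InterviewState):
    """Generate the analyst's next interview question from the conversation so far."""
    analyst = state["analyst"]
    system_message = question_instructions.format(goals=analyst.persona)
    question = llm.invoke([SystemMessage(content=system_message)] + state["messages"])
    return {"messages": [question]}
```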

Defining Research Tools

Experts collect information in parallel from multiple sources to answer questions.

They can utilize various tools such as web document scraping, VectorDB, web search, and Wikipedia search.

We'll focus on two main tools: Tavily for web search and ArxivRetriever for academic papers.

Tavily Search

  • Real-time web search capabilities

  • Configurable result count and content depth

  • Structured output formatting

  • Raw content inclusion option

ArxivRetriever

  • Access to academic papers and research

  • Full document retrieval

  • Comprehensive metadata access

  • Customizable document load limits

Next, we format and display the arXiv search results in a structured, XML-like format:
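A sketch of both tools plus an XML-style formatter for the arXiv results; TavilySearchResults from langchain_community is used here as a stand-in for the tutorial's own TavilySearch wrapper:

```python
from langchain_community.retrievers import ArxivRetriever
from langchain_community.tools.tavily_search import TavilySearchResults

# Web search (requires TAVILY_API_KEY); configurable result count and content depth
tavily_search = TavilySearchResults(max_results=3)

# Academic search over arXiv with full-document retrieval and a load limit
arxiv_retriever = ArxivRetriever(load_max_docs=2, get_full_documents=True)


def format_arxiv_docs(docs) -> str:
    """Format arXiv documents in a structured, XML-like layout with metadata."""
    return "\n\n---\n\n".join(
        f'<Document source="{doc.metadata.get("Entry ID", "")}">\n'
        f'<Title>{doc.metadata.get("Title", "")}</Title>\n'
        f'<Published>{doc.metadata.get("Published", "")}</Published>\n'
        f"{doc.page_content}\n</Document>"
        for doc in docs
    )


docs = arxiv_retriever.invoke("Modular RAG")
print(format_arxiv_docs(docs))
```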

Defining Search Tool Nodes

The code implements two main search tool nodes for gathering research information: web search via Tavily and academic paper search via ArXiv. Here's a breakdown of the key components:

Key Features

  • Query Generation: Uses LLM to create structured search queries from conversation context

  • Error Handling: Robust error management for ArXiv searches

  • Result Formatting: Consistent XML-style formatting for both web and academic results

  • Metadata Integration: Comprehensive metadata inclusion for academic papers

  • State Management: Maintains conversation context through InterviewState
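A sketch of the two search nodes, reusing llm, tavily_search, arxiv_retriever, and format_arxiv_docs from the cells above; the SearchQuery schema and prompt are illustrative:

```python
from langchain_core.messages import SystemMessage
from pydantic import BaseModel, Field


class SearchQuery(BaseModel):
    search_query: str = Field(description="Search query for retrieval.")


search_instructions = SystemMessage(
    content="Given the conversation so far, generate one well-structured search query."
)


def search_web(state: InterviewState):
    """Generate a query from the interview context and run a Tavily web search."""
    structured_llm = llm.with_structured_output(SearchQuery)
    query = structured_llm.invoke([search_instructions] + state["messages"])
    results = tavily_search.invoke(query.search_query)
    formatted = "\n\n---\n\n".join(
        f'<Document href="{r["url"]}">\n{r["content"]}\n</Document>' for r in results
    )
    return {"context": [formatted]}


def search_arxiv(state: InterviewState):
    """Generate a query and retrieve academic papers from arXiv, with error handling."""
    structured_llm = llm.with_structured_output(SearchQuery)
    query = structured_llm.invoke([search_instructions] + state["messages"])
    try:
        docs = arxiv_retriever.invoke(query.search_query)
        return {"context": [format_arxiv_docs(docs)]}
    except Exception as e:
        return {"context": [f"<Error>arXiv search failed: {e}</Error>"]}
```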

Define generate_answer, save_interview, route_messages, write_section Nodes

  • generate_answer : creates the expert's responses during the interview, grounded in the retrieved context.

  • save_interview : stores the completed conversation as a transcript.

  • route_messages : decides whether the analyst asks another question or the interview ends.

  • write_section : together with its instructions, implements structured report-section generation from the interview and its sources.

A condensed sketch of these four nodes follows.
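This sketch reuses llm and InterviewState from earlier cells; the prompt texts, turn-limit default, and the "expert" message name are illustrative assumptions:

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, get_buffer_string

answer_instructions = """You are an expert being interviewed by an analyst.
Analyst goals: {goals}
Answer using only this context, citing sources as [1], [2], ...:
{context}"""


def generate_answer(state: InterviewState):
    """Answer the analyst's question using the accumulated search context."""
    analyst, messages, context = state["analyst"], state["messages"], state["context"]
    system_message = answer_instructions.format(goals=analyst.persona, context=context)
    answer = llm.invoke([SystemMessage(content=system_message)] + messages)
    answer.name = "expert"  # mark the message so route_messages can count expert turns
    return {"messages": [answer]}


def save_interview(state: InterviewState):
    """Store the full conversation as a transcript string."""
    return {"interview": get_buffer_string(state["messages"])}


def route_messages(state: InterviewState, name: str = "expert"):
    """Keep asking questions until the turn limit or the closing phrase is reached."""
    messages = state["messages"]
    num_responses = len([m for m in messages if isinstance(m, AIMessage) and m.name == name])
    if num_responses >= state.get("max_num_turns", 2):
        return "save_interview"
    last_question = messages[-2]
    if "Thank you so much for your help" in last_question.content:
        return "save_interview"
    return "ask_question"


section_writer_instructions = """Write a markdown report section (Summary, Analysis, Sources)
focused on: {focus}, using only the provided source documents."""


def write_section(state: InterviewState):
    """Turn the interview context into a structured report section."""
    system_message = section_writer_instructions.format(focus=state["analyst"].description)
    section = llm.invoke(
        [SystemMessage(content=system_message),
         HumanMessage(content=f"Use these sources: {state['context']}")]
    )
    return {"sections": [section.content]}
```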

Building the Interview Graph

Here's how to create and configure the interview execution graph:
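A sketch of the assembly, assuming the node functions above:

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

interview_builder = StateGraph(InterviewState)
interview_builder.add_node("ask_question", generate_question)
interview_builder.add_node("search_web", search_web)
interview_builder.add_node("search_arxiv", search_arxiv)
interview_builder.add_node("answer_question", generate_answer)
interview_builder.add_node("save_interview", save_interview)
interview_builder.add_node("write_section", write_section)

# Question -> parallel searches -> answer -> loop back or wrap up
interview_builder.add_edge(START, "ask_question")
interview_builder.add_edge("ask_question", "search_web")
interview_builder.add_edge("ask_question", "search_arxiv")
interview_builder.add_edge("search_web", "answer_question")
interview_builder.add_edge("search_arxiv", "answer_question")
interview_builder.add_conditional_edges(
    "answer_question", route_messages, ["ask_question", "save_interview"]
)
interview_builder.add_edge("save_interview", "write_section")
interview_builder.add_edge("write_section", END)

memory = MemorySaver()
interview_graph = interview_builder.compile(checkpointer=memory)

visualize_graph(interview_graph)
```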

[Graph visualization output]

Graph Structure

The interview process follows this flow:

  1. Question Generation

  2. Parallel Search (Web and ArXiv)

  3. Answer Generation

  4. Conditional Routing

  5. Interview Saving

  6. Section Writing

Key Components

  • State Management: Uses InterviewState for tracking

  • Memory Persistence: Implements MemorySaver

  • Conditional Logic: Routes between questions and interview completion

  • Parallel Processing: Conducts simultaneous web and academic searches

Note: Ensure the langgraph module is installed before running this code.

Executing the Interview Graph

Here's how to execute the graph and display the results:

Then display the completed interview section in markdown:
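A sketch that runs a single interview for the first analyst generated earlier and renders its section; the topic string mirrors the example output below:

```python
from IPython.display import Markdown
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableConfig

config = RunnableConfig(configurable={"thread_id": random_uuid()})
topic = "The differences between Modular RAG and Naive RAG"

interview_input = {
    "analyst": final_state.values["analysts"][0],  # first analyst from the earlier graph run
    "messages": [HumanMessage(content=f"So you said you were researching {topic}?")],
    "max_num_turns": 2,
}

interview_result = interview_graph.invoke(interview_input, config)

# Render the completed report section as markdown
Markdown(interview_result["sections"][0])
```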

Exploring the Architectural Advancements in Modular and Naive RAG

Summary

Retrieval-Augmented Generation (RAG) is a promising technique that enhances the capabilities of large language models (LLMs) by incorporating external data through retrieval mechanisms. This report focuses on the architectural distinctions between Modular RAG and Naive RAG, highlighting their respective computational efficiencies and adaptability in various AI applications. The evolution from Naive RAG to Modular RAG marks a significant shift in how LLMs handle knowledge-intensive tasks, prompting a need for a deeper understanding of their optimization for performance and scalability.

Naive RAG, foundational in its approach, integrates information retrieval with language generation to produce contextually relevant responses. However, it often struggles with inflexibility and inefficiencies when dealing with diverse datasets, primarily due to its reliance on straightforward similarity calculations for retrieval. In contrast, Modular RAG introduces a reconfigurable framework by decomposing complex RAG systems into independent modules, facilitating more sophisticated routing, scheduling, and fusion mechanisms. This modularity allows for more precise control and customization, making Modular RAG systems more adaptable to specific application needs.

The novelty of insights gathered from recent literature reveals a trend towards modular frameworks to overcome the limitations of traditional RAG systems. Notably, the theory of token-level harmonization in RAG presents a novel approach to balancing the benefits and detriments of retrieval, offering a theoretical foundation that could enhance the precision of LLM responses.

Key source documents include:

  1. [1] http://arxiv.org/abs/2406.00944v2

  2. [2] http://arxiv.org/abs/2407.21059v1

  3. [3] https://www.superteams.ai/blog/how-to-implement-naive-rag-advanced-rag-and-modular-rag

  4. [4] https://adasci.org/how-does-modular-rag-improve-upon-naive-rag/

  5. [5] https://zilliz.com/blog/advancing-llms-native-advanced-modular-rag-approaches

Comprehensive Analysis

The evolution of Retrieval-Augmented Generation (RAG) systems has led to significant advancements in how large language models (LLMs) integrate external information to enhance their generative capabilities. This section delves into the architectural and operational differences between Naive RAG and Modular RAG, emphasizing their impact on computational efficiency and adaptability.

Naive RAG: Foundations and Limitations

Naive RAG represents the initial phase of RAG systems, primarily characterized by its "retrieve-then-generate" approach. This model combines document retrieval with language model generation, aiming to produce coherent and contextually relevant responses. However, several challenges undermine its effectiveness:

  • Shallow Understanding of Queries: Naive RAG relies heavily on semantic similarity for retrieval, which can result in inadequate exploration of the query-document relationship. This limitation often leads to a failure in capturing nuanced query intents, affecting the accuracy of the generated responses [2].

  • Retrieval Redundancy and Noise: The process of feeding all retrieved chunks into LLMs can introduce excessive noise, potentially misleading the model and increasing the risk of generating hallucinated responses. This redundancy highlights the need for more refined retrieval mechanisms [5].

  • Inflexibility: The rigid architecture of Naive RAG restricts its adaptability to diverse and dynamic datasets, making it less suitable for specialized tasks or industries [4].

Modular RAG: A Reconfigurable Framework

Modular RAG addresses these limitations by introducing a highly reconfigurable framework that decomposes RAG systems into independent modules and specialized operators. This modular approach offers several advantages:

  • Advanced Design: By integrating routing, scheduling, and fusion mechanisms, Modular RAG transcends the traditional linear architecture, allowing for more sophisticated data handling and processing [2].

  • Customization and Scalability: The modularity of the system enables organizations to tailor RAG systems to specific application needs, enhancing relevance, response times, and customer satisfaction. This adaptability is crucial for scaling RAG systems across various domains [4].

  • Enhanced Retrieval and Generation: The modular framework supports the inclusion of advanced retrievers and complementary technologies, improving the overall performance of RAG systems in handling complex queries and variable data [3].

Token-Level Harmonization Theory

A significant theoretical advancement in RAG systems is the introduction of a token-level harmonization theory. This approach models RAG as a fusion between the distribution of LLM knowledge and retrieved texts. It formalizes the trade-off between the value of external knowledge (benefit) and its potential to mislead LLMs (detriment) in next token prediction. This theoretical framework allows for a more explainable and quantifiable comparison of benefits and detriments, facilitating a balanced integration of external data [1].

Implications for AI Applications

The transition from Naive to Modular RAG represents a paradigm shift in how LLMs are utilized in AI applications. Modular RAG's adaptability and efficiency make it a preferable choice for tasks requiring high scalability and specialization. The ongoing research into token-level harmonization further enhances the precision and reliability of RAG systems, paving the way for more robust and contextually aware AI solutions.

Sources

[1] http://arxiv.org/abs/2406.00944v2
[2] http://arxiv.org/abs/2407.21059v1
[3] https://www.superteams.ai/blog/how-to-implement-naive-rag-advanced-rag-and-modular-rag
[4] https://adasci.org/how-does-modular-rag-improve-upon-naive-rag/
[5] https://zilliz.com/blog/advancing-llms-native-advanced-modular-rag-approaches

Parallel Interviewing with Map-Reduce

Here's how to implement parallel interviews using map-reduce in LangGraph:

The Send() function in LangGraph dispatches each analyst's interview as a separate parallel branch (the map step), as sketched below:
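A sketch of the overall research state and the map step; the field names mirror the components described in this tutorial, and the seed message is illustrative:

```python
import operator
from typing import Annotated, List

from langchain_core.messages import HumanMessage
from langgraph.constants import Send
from typing_extensions import TypedDict


class ResearchGraphState(TypedDict):
    topic: str
    max_analysts: int
    human_analyst_feedback: str
    analysts: List[Analyst]
    sections: Annotated[list, operator.add]  # each parallel interview appends its section
    introduction: str
    content: str
    conclusion: str
    final_report: str


def initiate_all_interviews(state: ResearchGraphState):
    """Map step: fan out one interview subgraph run per analyst with Send()."""
    if state.get("human_analyst_feedback"):
        return "create_analysts"  # feedback provided, regenerate the analyst team instead
    topic = state["topic"]
    return [
        Send(
            "conduct_interview",
            {
                "analyst": analyst,
                "messages": [HumanMessage(content=f"So you said you were writing an article on {topic}?")],
                "max_num_turns": 2,
            },
        )
        for analyst in state["analysts"]
    ]
```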


Report Writing

Next, we will define the guidelines for writing a report based on the interview content and define a function for report writing.

Define Nodes

  • Main Report Content

  • Introduction Generation

  • Final Report Assembly

Each function handles a specific aspect of report generation:

  • Content synthesis from interview sections

  • Introduction creation

  • Conclusion development

  • Final assembly with proper formatting and structure
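A condensed sketch of these report-writing nodes, reusing llm and ResearchGraphState from earlier cells; the prompt texts are illustrative:

```python
from langchain_core.messages import HumanMessage, SystemMessage

report_writer_instructions = """Combine the following interview memos into one cohesive report
body about {topic}. Start with '## Insights' and consolidate a single '## Sources' list:
{context}"""


def write_report(state: ResearchGraphState):
    """Synthesize the main report body from all interview sections (reduce step)."""
    sections = "\n\n".join(state["sections"])
    system_message = report_writer_instructions.format(topic=state["topic"], context=sections)
    report = llm.invoke([SystemMessage(content=system_message),
                         HumanMessage(content="Write the report body.")])
    return {"content": report.content}


def write_introduction(state: ResearchGraphState):
    """Write a title and introduction that preview every section of the report."""
    sections = "\n\n".join(state["sections"])
    intro = llm.invoke([SystemMessage(content=(
        f"Write a crisp '# Title' and '## Introduction' for a report on "
        f"{state['topic']} covering:\n{sections}"))])
    return {"introduction": intro.content}


def write_conclusion(state: ResearchGraphState):
    """Write a conclusion summarizing the report's findings."""
    sections = "\n\n".join(state["sections"])
    conclusion = llm.invoke([SystemMessage(content=(
        f"Write a '## Conclusion' for a report on {state['topic']} covering:\n{sections}"))])
    return {"conclusion": conclusion.content}


def finalize_report(state: ResearchGraphState):
    """Assemble introduction, body, and conclusion into the final report."""
    final_report = (
        state["introduction"] + "\n\n---\n\n" + state["content"] + "\n\n---\n\n" + state["conclusion"]
    )
    return {"final_report": final_report}
```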

Building the Report Writing Graph

Here's the implementation of the research graph that orchestrates the entire workflow:
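A sketch of the full research graph, assuming the nodes above; the interview subgraph is added as a single node:

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

builder = StateGraph(ResearchGraphState)
builder.add_node("create_analysts", create_analysts)
builder.add_node("human_feedback", human_feedback)
builder.add_node("conduct_interview", interview_builder.compile())  # interview subgraph
builder.add_node("write_report", write_report)
builder.add_node("write_introduction", write_introduction)
builder.add_node("write_conclusion", write_conclusion)
builder.add_node("finalize_report", finalize_report)

builder.add_edge(START, "create_analysts")
builder.add_edge("create_analysts", "human_feedback")
builder.add_conditional_edges(
    "human_feedback", initiate_all_interviews, ["create_analysts", "conduct_interview"]
)
# Report components are generated in parallel once all interviews finish
builder.add_edge("conduct_interview", "write_report")
builder.add_edge("conduct_interview", "write_introduction")
builder.add_edge("conduct_interview", "write_conclusion")
builder.add_edge(["write_report", "write_introduction", "write_conclusion"], "finalize_report")
builder.add_edge("finalize_report", END)

memory = MemorySaver()
graph = builder.compile(interrupt_before=["human_feedback"], checkpointer=memory)

visualize_graph(graph)
```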

[Graph visualization output]

The graph structure implements:

Core Workflow Stages

  • Analyst Creation

  • Human Feedback Integration

  • Parallel Interview Execution

  • Report Generation

  • Final Assembly

Key Components

  • State Management using ResearchGraphState

  • Memory persistence with MemorySaver

  • Conditional routing based on human feedback

  • Parallel processing of interviews

  • Synchronized report assembly

Flow Control

  • Starts with analyst creation

  • Allows for human feedback and iteration

  • Conducts parallel interviews

  • Generates report components simultaneously

  • Assembles final report with all components

This implementation creates a robust workflow for automated research with human oversight and parallel processing capabilities.

Executing the Report Writing Graph

Here's how to run the graph with the specified parameters:
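A sketch of the initial run; the topic matches the example report shown later and is otherwise arbitrary:

```python
from langchain_core.runnables import RunnableConfig

config = RunnableConfig(configurable={"thread_id": random_uuid()})
inputs = {
    "topic": "The differences between Modular RAG and Naive RAG, and their production-level benefits",
    "max_analysts": 3,
}

# Runs until the breakpoint before the human_feedback node
for event in graph.stream(inputs, config, stream_mode="values"):
    for analyst in event.get("analysts", []):
        print(analyst.persona)
        print("-" * 50)
```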

Let's add human_feedback to customize the analyst team and continue the graph execution:
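A sketch using the same update_state pattern as before; the feedback string is an example:

```python
graph.update_state(
    config,
    {"human_analyst_feedback": "Add an analyst focused on production-level deployment of Modular RAG."},
    as_node="human_feedback",
)

# Resume from the checkpoint so the analyst team is regenerated with the feedback applied
for event in graph.stream(None, config, stream_mode="values"):
    for analyst in event.get("analysts", []):
        print(analyst.persona)
        print("-" * 50)
```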

Let's complete the human feedback phase and resume the graph execution:
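Clearing the feedback routes the graph into the parallel interviews and report writing:

```python
graph.update_state(config, {"human_analyst_feedback": None}, as_node="human_feedback")

# Runs the interviews and report-writing nodes through to finalize_report
for _ in graph.stream(None, config, stream_mode="updates"):
    pass
```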

Here's how to display the final research report:
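A sketch that pulls the assembled report out of the final state:

```python
from IPython.display import Markdown

final_state = graph.get_state(config)
Markdown(final_state.values["final_report"])
```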

Modular RAG: A Paradigm Shift in AI System Design

Introduction

In an era where the demand for scalable, efficient, and adaptable artificial intelligence (AI) systems continues to rise, Modular Retrieval-Augmented Generation (RAG) systems are emerging as a transformative solution. The landscape of AI, particularly with the evolution of large language models (LLMs), is rapidly advancing, necessitating novel approaches to overcome the limitations of traditional systems. This report delves into the significant advancements brought by Modular RAG, contrasting it with Naive RAG systems and exploring its benefits at the production level.

We begin by examining the foundational role of Naive RAG systems, which integrate document retrieval with language generation but often falter due to their inflexibility and inefficiency in handling diverse datasets. Advanced RAG techniques address these challenges by leveraging dynamic embeddings and vector databases, enhancing semantic understanding and retrieval accuracy.

The heart of our analysis lies in the introduction of Modular RAG frameworks, which decompose complex RAG systems into independent modules. This modularity allows for reconfigurable designs, improving scalability and facilitating the integration of advanced technologies such as routing and scheduling mechanisms. We also explore the FlashRAG toolkit, a modular resource that standardizes RAG research, and the integration of graph-based systems to enhance knowledge-based tasks.

Lastly, we discuss the practical implications and challenges of deploying Modular RAG at scale, highlighting its potential to revolutionize AI applications across various domains. Through this exploration, we aim to illustrate how Modular RAG aligns with the vision of sustainable and adaptable AI systems, paving the way for future innovations.


Main Idea

Background

The emergence of Retrieval-Augmented Generation (RAG) systems has significantly influenced the development of AI, particularly in enhancing the capabilities of large language models (LLMs) for handling complex, knowledge-intensive tasks. Initially, RAG systems were exemplified by Naive RAG, which combined document retrieval with language generation to provide contextually relevant responses. However, these systems often struggled with dynamic datasets and specific semantic requirements, leading to inefficiencies and challenges in scalability [1][2]. The evolution of RAG into Modular RAG represents a groundbreaking shift, introducing a framework that decomposes RAG systems into flexible and independent modules. This modularity is key to enhancing scalability and adaptability, aligning with the vision of sustainable AI systems that can efficiently evolve in production environments [3][4]. The development of toolkits like FlashRAG further supports this evolution by offering a standardized platform for implementing and comparing RAG methods, thereby facilitating research and practical deployment [5].

Prior studies laid the groundwork for RAG systems, with Naive RAG establishing the foundational integration of retrieval and generation. These early models demonstrated the potential of AI systems to draw on external information sources, yet they faced notable limitations, including inflexibility and inefficiency in processing diverse datasets [1]. Advanced RAG models addressed some of these challenges by employing dynamic embedding techniques and vector databases, which improved the semantic understanding and adaptability of the systems [2]. The introduction of graph-based RAG systems further enhanced these capabilities by leveraging graph structures for more accurate information retrieval and generation [5]. The transition to Modular RAG builds on these advancements by offering a reconfigurable framework that supports various RAG patterns, such as linear, conditional, branching, and looping, each with its specific implementation nuances [4].

Problem Definition

The primary challenge addressed by this research is the limitations of traditional Naive RAG systems in adapting to diverse and dynamic datasets. Specifically, these systems struggle with inflexibility, inefficiency, and a lack of semantic relevance, which impact their scalability and effectiveness in real-world applications [1][2][3]. The research aims to explore how Modular RAG can overcome these limitations by providing a flexible and scalable architecture that supports the evolving demands of AI systems. This involves developing modular frameworks that decompose complex RAG systems into independent modules, allowing for more efficient design and maintenance [4][5]. Additionally, the research seeks to address practical challenges related to data management and integration, ensuring that Modular RAG systems can be effectively deployed at the production level [6][7].

Methodology

Modular RAG systems are designed by breaking down traditional RAG processes into independent modules and specialized operators, facilitating greater flexibility and adaptability. This approach enables the integration of advanced design features such as routing, scheduling, and fusion mechanisms, which are crucial for handling complex application scenarios [3][4]. The methodology involves implementing various RAG patterns—linear, conditional, branching, and looping—each with its specific nuances, allowing for tailored solutions to specific task requirements [4][5]. The comprehensive framework provided by toolkits like FlashRAG supports the reproduction of existing RAG methods and the development of new algorithms, ensuring consistency and facilitating comparative studies among researchers [5]. Additionally, theoretical frameworks are explored to understand the trade-offs between the benefits and detriments of retrieved texts, offering a structured approach to optimizing RAG performance [4].

Implementation Details

The practical implementation of Modular RAG involves utilizing software frameworks and computational resources that support modular architecture. FlashRAG, a modular toolkit, provides a standardized platform for implementing and comparing RAG methods, offering a customizable environment for researchers to develop and test RAG algorithms [5]. The toolkit includes 12 advanced RAG methods and 32 benchmark datasets, enabling researchers to optimize their RAG processes and ensuring consistency in evaluations [3][5]. Additionally, the use of vector databases and dynamic embedding techniques enhances the retrieval and generation processes, making them more context-aware and accurate [2]. The modular architecture also supports end-to-end training across its components, marking a significant advancement over traditional RAG systems [3].

Experiments

Experimental protocols for Modular RAG systems involve evaluating their performance across various application scenarios and datasets. The use of comprehensive toolkits like FlashRAG facilitates the implementation of standardized evaluation metrics and procedures, ensuring consistency in testing and comparison [5]. Experiments focus on assessing the scalability, adaptability, and efficiency of Modular RAG systems, particularly in handling diverse and dynamic datasets. Evaluation metrics include measures of retrieval accuracy, response coherence, and system scalability. Additionally, experiments explore the integration of fair ranking mechanisms to ensure equitable exposure of relevant items, thereby promoting fairness and reducing potential biases in RAG systems [3]. Theoretical studies further support experimental findings by modeling the trade-offs between the benefits and detriments of retrieved texts, offering insights into optimizing RAG performance [4].

Results

The results of implementing Modular RAG systems demonstrate significant improvements in flexibility, scalability, and efficiency compared to traditional Naive RAG models. Modular RAG's reconfigurable design allows for seamless integration and optimization of independent modules, resulting in enhanced adaptability to diverse application scenarios [4][5]. The use of dynamic embeddings and vector databases further improves retrieval accuracy and context-awareness, making Modular RAG systems more effective for knowledge-intensive applications [2]. The incorporation of graph-based approaches enhances real-time data integration and contextual understanding, addressing limitations related to static knowledge bases [5]. Moreover, the integration of fair ranking mechanisms ensures equitable exposure of relevant items, promoting fairness and reducing biases [3]. Overall, Modular RAG presents a compelling evolution in AI system design, offering a more adaptable and efficient framework for deploying AI solutions at scale.

Sources

[1] https://www.superteams.ai/blog/how-to-implement-naive-rag-advanced-rag-and-modular-rag
[2] https://zilliz.com/blog/advancing-llms-native-advanced-modular-rag-approaches
[3] http://arxiv.org/abs/2407.21059v1
[4] http://arxiv.org/abs/2406.00944v2
[5] http://arxiv.org/abs/2405.13576v1
[6] https://medium.com/aingineer/a-comprehensive-guide-to-implementing-modular-rag-for-scalable-ai-systems-3fb47c46dc8e
[7] https://medium.com/@sahin.samia/modular-rag-using-llms-what-is-it-and-how-does-it-work-d482ebb3d372


Conclusion

The evolution from Naive to Modular Retrieval-Augmented Generation (RAG) systems represents a significant leap forward in the scalability and adaptability of artificial intelligence. Naive RAG laid the initial groundwork by integrating document retrieval with language generation, but its limitations in handling dynamic datasets and inflexibility necessitated advancements. Advanced RAG techniques addressed these issues by employing dynamic embeddings and vector databases, enhancing semantic understanding and adaptability.

Modular RAG frameworks mark a transformative advancement by decomposing complex systems into independent modules, allowing for a reconfigurable and scalable architecture. This modularity not only supports diverse RAG patterns, such as linear, conditional, branching, and looping, but also facilitates the integration of advanced technologies like routing and scheduling mechanisms. Toolkits like FlashRAG further standardize RAG research, offering a comprehensive environment for algorithm development and comparison.

Graph-based RAG systems and theoretical insights into benefit-detriment trade-offs provide additional layers of sophistication, improving real-time data integration and contextual understanding. However, challenges persist, particularly in data management and ensuring fair ranking.

Overall, Modular RAG aligns with the vision of creating sustainable and adaptable AI systems, promising significant benefits in production environments. Continued research and innovation in this field are essential to fully realizing its potential and overcoming existing challenges.
