Ollama Deep Researcher (Deepseek-R1)


Overview

This tutorial explores how to build a fully local AI-powered research agent using Ollama and Deepseek-R1, an open-source large language model. The research agent is designed based on Iterative Demonstration-Based Retrieval-Augmented Generation (IterDRAG), a methodology that enhances complex query resolution through iterative query decomposition, retrieval, and synthesis. By leveraging this structured approach, we can enable AI to autonomously refine queries, retrieve relevant documents, and synthesize high-quality research outputs, all while running entirely on your local machine.

DISCLAIMER: This tutorial's code is heavily based on the Ollama Deep Researcher project (link).

Key Technologies

  • Ollama: A local runtime for efficiently running open-source LLMs.

  • Deepseek-R1: A powerful open-source model optimized for reasoning and research.

  • IterDRAG (Iterative Demonstration-Based RAG): A retrieval and generation method that improves AI-driven research by breaking down complex queries into manageable sub-queries, retrieving relevant context, and synthesizing iterative answers.


What You’ll Learn

🔹 How to set up Ollama & Deepseek-R1 for local AI research
🔹 How to optimize Deepseek-R1 models
🔹 How to implement an IterDRAG-based research workflow

By the end of this tutorial, you’ll be able to build a fully local, AI-enhanced research agent that applies IterDRAG principles to enable incremental knowledge refinement, retrieval-aware generation, and dynamic query optimization—all while maintaining full privacy, speed, and control over the research process. 🚀

Table of Contents

  • Overview
  • Environment Setup
  • Getting Started with Ollama and DeepSeek-R1
  • Using ChatOllama with DeepSeek-R1
  • Using DeepSeek-R1
  • Ollama Deep Researcher powered by IterDRAG
  • References


Environment Setup

Setting up your environment is the first step. See the Environment Setup guide for more details.

[Note]

The langchain-opentutorial package provides easy-to-use environment setup, useful functions, and utilities for these tutorials. Check out langchain-opentutorial for more details.

You can set API keys in a .env file or set them manually.

[Note] If you’re not using the .env file, no worries! Just enter the keys directly in the cell below, and you’re good to go.
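If you go the .env route, here is a minimal sketch using python-dotenv; the TAVILY_API_KEY variable is shown because Tavily is used later in this tutorial, so adjust it to whichever search API you choose:

```python
# Minimal sketch: load API keys from a .env file, falling back to a manual value.
# Assumes python-dotenv is installed (pip install python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # reads variables such as TAVILY_API_KEY from a local .env file

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = "your-tavily-api-key"  # manual fallback
```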

Getting Started with Ollama and DeepSeek-R1

Ollama allows us to run Deepseek-R1 (or other models) directly on a local machine, removing the need for cloud-based APIs.

  • The model can be accessed via the ollama command-line interface or LangChain's ChatOllama class, enabling structured AI workflows.

  • Supports multiple output formats, including text, JSON, and multimodal outputs.

By using Ollama to serve Deepseek-R1, we can execute this entire workflow locally, ensuring privacy, efficiency, and full control over the research process.


Step 1: Install Ollama

Ollama is available for macOS, Linux, and Windows. You can download and install it from the official website:

🔗 Download Ollama

Once installed, verify that Ollama is working by running the following command in your terminal:
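```bash
ollama --version
```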

If the installation was successful, this command should return the installed Ollama version.


Step 2: Download DeepSeek-R1 Models

After installing Ollama, you can download a DeepSeek-R1 model. These models vary in size, so it’s important to check your GPU memory before selecting one.

To pull a model, use:
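```bash
ollama pull deepseek-r1:8b
```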

This command will download the 8 billion parameter model (8B), which we tested on a MacBook Pro M1 (16GB RAM) and confirmed to be working properly.

💡 If you experience performance issues, consider using a smaller model, depending on your hardware.


Check GPU Memory Requirements

Ollama's DeepSeek-R1 models support Q4_K_M quantization, which reduces the required GPU memory by compressing the model to 4-bit precision.

You can use the table below to determine if your hardware can support a particular model. For most users, 4-bit precision (Q4_K_M) is the recommended setting.

📌 Calculation Formula: Refer to this blog post for details on how GPU memory is estimated.

| Model Parameters | 16-bit Precision | 8-bit Precision | 4-bit Precision |
| --- | --- | --- | --- |
| 1.5 billion | ~3.6 GB | ~1.8 GB | ~0.9 GB |
| 7 billion | ~16.8 GB | ~8.4 GB | ~4.2 GB |
| 8 billion | ~19.2 GB | ~9.6 GB | ~4.8 GB |
| 14 billion | ~33.6 GB | ~16.8 GB | ~8.4 GB |
| 32 billion | ~76.8 GB | ~38.4 GB | ~19.2 GB |
| 70 billion | ~168.0 GB | ~84.0 GB | ~42.0 GB |
| 671 billion | ~1610.4 GB | ~805.2 GB | ~402.6 GB |
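As a rough sanity check, the table values follow a simple rule of thumb: parameter count × bytes per parameter × an overhead factor of roughly 1.2 (actual usage also depends on the runtime and context length). A minimal sketch of that estimate:

```python
def estimate_gpu_memory_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameters * bytes-per-parameter * overhead factor."""
    return params_billions * (bits / 8) * overhead

# Example: an 8B model at 4-bit precision -> ~4.8 GB
print(round(estimate_gpu_memory_gb(8, 4), 1))
```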

Using ChatOllama with DeepSeek-R1

In this section, we'll explore how to utilize the ChatOllama class with the deepseek-r1 model to generate web search queries in JSON format. Additionally, we'll delve into the use of <think> tags in deepseek-r1 to structure the model's reasoning process.

This tutorial does not cover the basics of Ollama and ChatOllama.

If you need more information, please refer to the following tutorial: "/04-MODEL/10-Ollama.ipynb"

Generating Web Search Queries in JSON Format

To generate structured search queries, we can prompt deepseek-r1 to output responses in JSON format. This ensures that the generated queries are well-structured and easy to parse in automated workflows.

When requesting search queries, you can specify the format explicitly, ensuring that the AI returns a properly formatted JSON object. This is useful for integrating AI-generated search queries into research pipelines.
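Below is a minimal sketch of this pattern using ChatOllama's JSON mode; the prompt wording and the query/aspect/rationale keys are illustrative, not the tutorial's exact prompt:

```python
import json

from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama

# format="json" asks Ollama to constrain the model's output to valid JSON
llm_json_mode = ChatOllama(model="deepseek-r1:8b", temperature=0.6, format="json")

prompt = (
    "Generate a web search query about the latest DeepSeek-R1 benchmarks. "
    'Respond only with a JSON object like {"query": "...", "aspect": "...", "rationale": "..."}.'
)

result = llm_json_mode.invoke([HumanMessage(content=prompt)])
search_query = json.loads(result.content)
print(search_query["query"])
```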

Using DeepSeek-R1

This section provides an overview of how to use DeepSeek-R1 effectively and follow the recommended best practices outlined by its authors. We'll also introduce the <think> tags, explain their purpose, and mention a workaround for removing them from agent outputs.

In this tutorial, we will not cover DeepSeek's architecture or training methodology.

If you need more information, please refer to this blog.

To achieve optimal performance with DeepSeek-R1, the model's authors recommend the following configurations:

  • Set temperature between 0.5 and 0.7 (0.6 is recommended) to ensure coherent responses and prevent repetition.

  • Do not use system prompts. Instead, provide all instructions within the user prompt.

  • For mathematical problems, explicitly instruct the model to reason step by step and format the final answer using \boxed{}.

  • Encourage structured reasoning by ensuring the model starts its response with <think>, as it may sometimes omit its reasoning process.

Following these recommendations will help maintain consistency and accuracy when using DeepSeek-R1.


Below are code examples comparing cases without and with the recommended usage guidelines.

Not following the guidelines does not necessarily lead to poor results, but these examples are provided to illustrate the impact of using best practices.
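For instance, a minimal sketch of the "with guidelines" case (temperature 0.6, no system prompt, all instructions in the user message, final answer requested in \boxed{}); the question itself is just an example:

```python
from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama

# Recommended settings: temperature ~0.6 and no SystemMessage;
# every instruction goes directly into the user prompt.
llm = ChatOllama(model="deepseek-r1:8b", temperature=0.6)

question = (
    "Reason step by step and put the final answer in \\boxed{}: "
    "What is 12 * 34?"
)

response = llm.invoke([HumanMessage(content=question)])
print(response.content)  # typically starts with a <think> ... </think> block
```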

What Are <think> Tags?

DeepSeek-R1 utilizes structured reasoning through <think> tags. These tags encapsulate the model’s internal thought process before delivering a final answer.

The <think> tags exist because of the way DeepSeek-R1 was trained:

  • Reinforcement Learning (RL) for Reasoning Tasks: During training, the model was rewarded for explicitly stating its thought process within <think> tags.


Handling <think> Tags in Outputs

While <think> tags enhance interpretability, there are cases where you may want to remove them for cleaner output in applications.

The following code is a hack to remove the <think> tags from the output.
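A minimal regex-based version of that hack might look like this:

```python
import re

def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> blocks (and any stray tags) from model output."""
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    return cleaned.replace("<think>", "").replace("</think>", "").strip()

# Example
raw = "<think>Reasoning about the question...</think>The answer is 408."
print(strip_think_tags(raw))  # -> "The answer is 408."
```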

Ollama Deep Researcher powered by IterDRAG

Why IterDRAG for Research?

Research queries often require multi-hop reasoning, meaning a single AI-generated answer is insufficient for complex topics. IterDRAG addresses this challenge by:

✅ Decomposing complex research topics into structured sub-queries
✅ Iteratively retrieving, summarizing, and refining relevant documents
✅ Generating intermediate answers for each sub-query before final synthesis
✅ Scaling up knowledge extraction by incorporating multiple iterations of retrieval

This tutorial introduces a modular research agent designed to iteratively refine research queries and synthesize structured reports. The research process follows an IterDRAG-based methodology, ensuring thorough information retrieval and refinement through multiple iterations.


How the Ollama Deep Researcher Works

  1. Query Generation → The AI generates an initial web search query based on the research topic.

  2. Web Research → The agent fetches relevant documents using a search API (Tavily or Perplexity).

  3. Summarization → Retrieved documents are summarized into a structured format for further analysis.

  4. Reflection & Knowledge Gap Detection → The agent analyzes knowledge gaps and generates a follow-up query if needed.

  5. Iterative Research Loop → Steps 2–4 are repeated up to a set number of times to refine the research.

  6. Final Report Generation → Once the loop limit is reached, the agent compiles all gathered insights and sources into a structured report.


Configuration & State Management

Before constructing the graph, let's first define the configuration and state; a minimal sketch of both follows the notes below.

For the configuration,

  • Configuration manages customizable settings such as the number of research iterations (max_web_research_loops) and the choice of the local LLM model (e.g., deepseek-r1:8b).

  • The from_runnable_config method allows dynamic configuration loading from environment variables or provided settings.

For state management,

  • SummaryState keeps track of the research process, including the topic, search queries, gathered sources, and the final summary.

  • It also defines SummaryStateInput and SummaryStateOutput to structure input and output data clearly.
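Below is a minimal sketch of both pieces, loosely modeled on the original Ollama Deep Researcher code; the exact field names and defaults are assumptions for illustration:

```python
import operator
import os
from dataclasses import dataclass, field, fields
from typing import Optional

from langchain_core.runnables import RunnableConfig
from typing_extensions import Annotated


@dataclass(kw_only=True)
class Configuration:
    """Customizable settings for the researcher."""
    max_web_research_loops: int = 3        # number of research iterations
    local_llm: str = "deepseek-r1:8b"      # local Ollama model to use

    @classmethod
    def from_runnable_config(cls, config: Optional[RunnableConfig] = None) -> "Configuration":
        # Prefer environment variables, then values passed via config["configurable"]
        configurable = (config or {}).get("configurable", {})
        values = {
            f.name: os.environ.get(f.name.upper(), configurable.get(f.name))
            for f in fields(cls)
            if f.init
        }
        return cls(**{k: v for k, v in values.items() if v is not None})


@dataclass(kw_only=True)
class SummaryState:
    """Running state of the research process."""
    research_topic: Optional[str] = None
    search_query: Optional[str] = None
    web_research_results: Annotated[list, operator.add] = field(default_factory=list)
    sources_gathered: Annotated[list, operator.add] = field(default_factory=list)
    research_loop_count: int = 0
    running_summary: Optional[str] = None


@dataclass(kw_only=True)
class SummaryStateInput:
    research_topic: Optional[str] = None


@dataclass(kw_only=True)
class SummaryStateOutput:
    running_summary: Optional[str] = None
```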

Defining the Deep Researcher Nodes

Now, let’s build the graph.

First, we’ll define the five core nodes that make up the Ollama Deep Researcher. These nodes represent key stages of the research process. Once the nodes are defined, we’ll connect them with edges to form a structured graph workflow that guides the research execution step by step.

Here are the five core nodes:

  • generate_query: Generates an initial web search query based on the research topic.

  • web_research: Searches the web using the generated query and retrieves relevant information.

  • summarize_sources: Summarizes the gathered sources into a structured format.

  • reflect_on_summary: Identifies knowledge gaps and formulates a follow-up query if needed.

  • finalize_summary: Compiles all findings into a well-structured final research report.


Generating the Search Query

  • The agent constructs an optimized web search query using ChatOllama in JSON format.

  • It ensures that the generated query is structured for efficient retrieval.
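A minimal sketch of this node, reusing the Configuration and SummaryState classes sketched earlier; the prompt text is illustrative:

```python
import json

from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableConfig
from langchain_ollama import ChatOllama


def generate_query(state: SummaryState, config: RunnableConfig):
    """Generate an initial web search query for the research topic."""
    configurable = Configuration.from_runnable_config(config)
    llm_json_mode = ChatOllama(model=configurable.local_llm, temperature=0, format="json")

    prompt = (
        f"Generate a web search query for the topic: {state.research_topic}. "
        'Respond only with JSON: {"query": "..."}'
    )
    result = llm_json_mode.invoke([HumanMessage(content=prompt)])
    return {"search_query": json.loads(result.content)["query"]}
```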

Node to conduct web search using Tavily

Conducting Web Research

  • The system fetches information using either Tavily or Perplexity search APIs.

  • The search results are formatted and deduplicated to ensure high-quality input.
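A minimal sketch of this node using Tavily via langchain_community; it assumes TAVILY_API_KEY is set, the original tutorial also supports Perplexity, and the result formatting here is simplified:

```python
from langchain_community.tools.tavily_search import TavilySearchResults


def web_research(state: SummaryState, config: RunnableConfig):
    """Search the web for the current query and record deduplicated, formatted results."""
    tavily = TavilySearchResults(max_results=3)   # reads TAVILY_API_KEY from the environment
    results = tavily.invoke(state.search_query)   # list of {"url": ..., "content": ...}

    seen, formatted, sources = set(), [], []
    for r in results:
        if r["url"] in seen:
            continue  # deduplicate by URL
        seen.add(r["url"])
        formatted.append(f"Source: {r['url']}\n{r['content']}")
        sources.append(r["url"])

    return {
        "web_research_results": ["\n\n".join(formatted)],
        "sources_gathered": sources,
        "research_loop_count": state.research_loop_count + 1,
    }
```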

Node to summarize the web search results

Summarizing the Findings

  • The AI extracts key insights from retrieved documents.

  • It removes redundant information and maintains coherence across iterations.
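A minimal sketch of this node, reusing the imports and the strip_think_tags helper from earlier; it either starts a new summary or extends the existing one, and the prompt wording is illustrative:

```python
def summarize_sources(state: SummaryState, config: RunnableConfig):
    """Summarize the latest web results, extending any existing summary."""
    configurable = Configuration.from_runnable_config(config)
    llm = ChatOllama(model=configurable.local_llm, temperature=0.6)

    latest_results = state.web_research_results[-1]
    if state.running_summary:
        prompt = (
            f"Extend this summary of '{state.research_topic}' with the new search results, "
            f"avoiding repetition.\n\nExisting summary:\n{state.running_summary}\n\n"
            f"New results:\n{latest_results}"
        )
    else:
        prompt = f"Summarize these search results about '{state.research_topic}':\n\n{latest_results}"

    summary = llm.invoke([HumanMessage(content=prompt)]).content
    return {"running_summary": strip_think_tags(summary)}
```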

Node to generate a refined query reflecting an existing summary

Refining Through Reflection

  • The model analyzes knowledge gaps in the existing summary.

  • If necessary, it generates a refined follow-up query to gather missing information.
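A minimal sketch of this node; it asks the model (in JSON mode) to name a knowledge gap and propose a follow-up query, again with illustrative prompt wording:

```python
def reflect_on_summary(state: SummaryState, config: RunnableConfig):
    """Identify a knowledge gap in the running summary and propose a follow-up query."""
    configurable = Configuration.from_runnable_config(config)
    llm_json_mode = ChatOllama(model=configurable.local_llm, temperature=0, format="json")

    prompt = (
        f"Here is a summary about '{state.research_topic}':\n{state.running_summary}\n\n"
        "Identify a knowledge gap and propose one follow-up web search query. "
        'Respond only with JSON: {"knowledge_gap": "...", "follow_up_query": "..."}'
    )
    result = llm_json_mode.invoke([HumanMessage(content=prompt)])
    return {"search_query": json.loads(result.content)["follow_up_query"]}
```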

Node to finalize summary

Finalizing the Summary

  • The node compiles all research findings into a structured report.

  • Sources are formatted into a bulleted list for transparency.
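A minimal sketch of this final node:

```python
def finalize_summary(state: SummaryState):
    """Compile the running summary and append all gathered sources as a bulleted list."""
    sources = "\n".join(f"* {source}" for source in state.sources_gathered)
    final_summary = f"## Summary\n\n{state.running_summary}\n\n### Sources:\n{sources}"
    return {"running_summary": final_summary}
```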

Building the Deep Researcher Graph

Now we'll create the deep researcher graph that orchestrates the research workflow.
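A minimal sketch of the graph wiring with LangGraph, assuming the node functions sketched above; route_research decides whether to loop back for another research iteration or finish:

```python
from langgraph.graph import END, START, StateGraph


def route_research(state: SummaryState, config: RunnableConfig) -> str:
    """Loop back to web_research until the configured iteration limit is reached."""
    configurable = Configuration.from_runnable_config(config)
    if state.research_loop_count < int(configurable.max_web_research_loops):
        return "web_research"
    return "finalize_summary"


builder = StateGraph(
    SummaryState,
    input=SummaryStateInput,
    output=SummaryStateOutput,
    config_schema=Configuration,
)
builder.add_node("generate_query", generate_query)
builder.add_node("web_research", web_research)
builder.add_node("summarize_sources", summarize_sources)
builder.add_node("reflect_on_summary", reflect_on_summary)
builder.add_node("finalize_summary", finalize_summary)

builder.add_edge(START, "generate_query")
builder.add_edge("generate_query", "web_research")
builder.add_edge("web_research", "summarize_sources")
builder.add_edge("summarize_sources", "reflect_on_summary")
builder.add_conditional_edges("reflect_on_summary", route_research)
builder.add_edge("finalize_summary", END)

graph = builder.compile()
```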


Running the Deep Researcher Graph

This code runs a research agent using the invoke_graph function.

It initializes the agent with the DeepSeek-R1 (8B) model, sets a research topic, and configures the agent to perform up to 3 web research iterations. The agent then executes the research workflow, gathering information, summarizing findings, and refining queries automatically. 🚀

[NOTE] This process takes approximately 3 to 5 minutes on an M1 Pro.
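A minimal sketch of invoking the compiled graph directly; the tutorial's invoke_graph helper from langchain-opentutorial additionally streams intermediate node outputs, so plain invoke is shown here as the simplest equivalent (the topic string is just an example):

```python
result = graph.invoke(
    {"research_topic": "DeepSeek-R1 model"},
    config={
        "configurable": {
            "local_llm": "deepseek-r1:8b",
            "max_web_research_loops": 3,
        }
    },
)
```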

Display completed research summary in markdown
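For example, in a notebook (assuming the running_summary output key from the sketches above):

```python
from IPython.display import Markdown, display

display(Markdown(result["running_summary"]))
```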

Summary

DeepSeek-R1 Model Summary

The DeepSeek-R1 model is an open-source large language model developed by the Chinese company DeepSeek, founded in 2023 by Liang Wenfang. Launched in January 2025, R1 has demonstrated remarkable reasoning capabilities and performance comparable to OpenAI's GPT-4 (o1) in AI reasoning tasks.

Architecture and Training Techniques

DeepSeek-R1 is equipped with 671 billion parameters, making it a powerful tool for generating accurate responses. The model employs techniques such as reinforcement learning and chain-of-thought reasoning to enhance its precision and effectiveness. These methods allow R1 to not only match but sometimes surpass the performance of OpenAI's o1 in areas like mathematics, coding, and logical reasoning.

Performance Metrics

In benchmarks, DeepSeek-R1 has shown comparable results to OpenAI o1 across various tasks. Its ability to handle complex problem-solving and generate coherent, contextually appropriate responses positions it as a strong competitor in the generative AI market. The model's performance is particularly notable in mathematical computations and coding challenges.

Cost Advantage

One of R1's significant strengths is its affordability compared to OpenAI's models. This cost-effectiveness makes R1 accessible to a broader range of applications, including education, research, and industry, where budget constraints are often a limiting factor.

Impact on AI Market

The release of R1 underscores the potential of open-source AI models to compete with and even surpass proprietary alternatives like OpenAI's products. This approach not only democratizes access to advanced AI technologies but also challenges traditional monopolies in the AI sector, fostering innovation and competition.

In conclusion, DeepSeek-R1 represents a significant milestone in AI development, offering both powerful performance and accessible solutions. Its impact on the market is expected to be profound, influencing future advancements in generative AI.

Sources:

  • DeepSeek R-1 Model Overview and How it Ranks Against OpenAI's o1 : https://www.prompthub.us/blog/deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1

  • The Mathematics Behind DeepSeek-R1 | by Harjot Kaur | Jan, 2025 ... : https://pub.towardsai.net/the-mathematics-behind-deepseek-r1-954102f9b9c6

  • DeepSeek-R1: Features, Use Cases, and Comparison with OpenAI : https://www.mygreatlearning.com/blog/deepseek-r1-features-use-cases/

  • Can DeepSeek R1 Take On OpenAI o1? Benchmarks Say Yes : https://www.techopedia.com/can-deepseek-r1-take-on-openai-o1
