GPT4ALL
Author: Yoonji Oh
Peer Review: Joseph, Normalist-K
Proofread: frimer
This is a part of the LangChain Open Tutorial.
Overview
In this tutorial, we’re exploring GPT4ALL together! From picking the perfect model for your hardware to running it on your own, we’ll walk you through the process step by step.
Ready? Let’s dive in and have some fun along the way!
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions, and utilities for tutorials. You can check out the langchain-opentutorial repository for more details.
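As a minimal sketch (the helper call and the package list below are assumptions that may differ across tutorial versions; see the Environment Setup tutorial for the authoritative version), a setup cell looks like this:

```python
# Hypothetical setup sketch using the langchain-opentutorial helper;
# the package list here is an assumption for this tutorial.
from langchain_opentutorial import package

package.install(
    [
        "langchain_community",
        "gpt4all",
    ],
    verbose=False,
    upgrade=False,
)
```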
You can also create and use a .env file in the root directory as shown below.
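For example, assuming you manage variables with the python-dotenv package (the keys in the comment are placeholders; GPT4All itself runs fully locally and needs no API key):

```python
# Load environment variables from a .env file in the project root.
# A hypothetical .env might contain, e.g.:
#   LANGCHAIN_API_KEY=your-langsmith-key
#   LANGCHAIN_PROJECT=GPT4ALL
from dotenv import load_dotenv

load_dotenv(override=True)  # returns True if a .env file was found and loaded
```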
Installation
Ready to get started with gpt4all? Let’s make sure you’ve got everything set up! We’ll guide you through installing the package using pip or poetry. Don’t worry, it’s easy and quick.
Install the Python Package
You can install gpt4all using pip or poetry, depending on your preferred package manager. Here’s how:
1. Installation using pip
If you’re using pip, run the following command in your terminal:
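```bash
pip install gpt4all
```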
2. Installation using poetry
Prefer poetry? No problem! Here’s how to install gpt4all using poetry:
Step 1: Add gpt4all to your project
Run this command to add the package to your pyproject.toml:
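```bash
poetry add gpt4all
```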
Step 2: Install dependencies
If the package is already added but not installed, simply run:
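```bash
poetry install
```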
Poetry will sync your environment and install all required dependencies.
What is GPT4ALL
GitHub: nomic-ai/gpt4all is an open-source chatbot ecosystem trained on a large amount of data, including code and chat-form conversations.
In this example, we will explain how to interact with the GPT4All model using LangChain.

Choosing a Model
This is the most important decision point. Before diving into writing code, you need to decide which model to use. Below, we explore popular models and help you choose the right one, based on GPT4All's Python Documentation.
Model Selection Criteria
| Model Name | Filesize | RAM Required | Parameters | Quantization | Developer | License | MD5 Sum (Unique Hash) |
|---|---|---|---|---|---|---|---|
| Meta-Llama-3-8B-Instruct.Q4_0.gguf | 4.66 GB | 8 GB | 8 Billion | q4_0 | Meta | Llama 3 License | c87ad09e1e4c8f9c35a5fcef52b6f1c9 |
| Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf | 4.11 GB | 8 GB | 7 Billion | q4_0 | Mistral & Nous Research | Apache 2.0 | a5f6b4eabd3992da4d7fb7f020f921eb |
| Phi-3-mini-4k-instruct.Q4_0.gguf | 2.18 GB | 4 GB | 3.8 Billion | q4_0 | Microsoft | MIT | f8347badde9bfc2efbe89124d78ddaf5 |
| orca-mini-3b-gguf2-q4_0.gguf | 1.98 GB | 4 GB | 3 Billion | q4_0 | Microsoft | CC-BY-NC-SA-4.0 | 0e769317b90ac30d6e09486d61fefa26 |
| gpt4all-13b-snoozy-q4_0.gguf | 7.37 GB | 16 GB | 13 Billion | q4_0 | Nomic AI | GPL | 40388eb2f8d16bb5d08c96fdfaac6b2c |
Based on Use Case
Choose your model depending on the tasks you plan to perform:
- Lightweight tasks (e.g., simple conversation): orca-mini-3b-gguf2-q4_0.gguf or Phi-3-mini-4k-instruct.Q4_0.gguf
- Moderate tasks (e.g., summarization or grammar correction): Meta-Llama-3-8B-Instruct.Q4_0.gguf or Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf
- Advanced tasks (e.g., long text generation, research): gpt4all-13b-snoozy-q4_0.gguf
Based on System Specifications
Select a model based on your available hardware:
- For 4 GB RAM or less, use orca-mini-3b-gguf2-q4_0.gguf or Phi-3-mini-4k-instruct.Q4_0.gguf.
- For 8 GB RAM or more, use Meta-Llama-3-8B-Instruct.Q4_0.gguf or Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf.
- For 16 GB RAM or more, use gpt4all-13b-snoozy-q4_0.gguf.
[NOTE]
- GGML: CPU-friendly with low memory usage.
- GGUF: Latest format with GPU acceleration support.
- q4_0 Quantization: Efficient for both CPU and GPU workloads, with reduced memory requirements.
Downloading a Model
In this tutorial, we will be using Microsoft's Phi-3-Mini-4K-Instruct model.
Download the Model: Visit HuggingFace to download the required model (2.39 GB).

Load Models in Python: After downloading the model, create a folder named models and place the downloaded file in that folder.

Assign the local file path (e.g., Phi-3-mini-4k-instruct-q4.gguf) to the `local_path` variable. You can replace this path with any local file path you prefer.
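For example, assuming the models folder created above:

```python
# Point this at your downloaded .gguf file; adjust the path as needed
local_path = "./models/Phi-3-mini-4k-instruct-q4.gguf"
```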
Use the Python Documentation to load and run your selected model in your project.
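If you'd like a quick sanity check outside LangChain first, the gpt4all Python package can load the file directly. This is a minimal sketch, assuming the models folder above and current gpt4all releases:

```python
from gpt4all import GPT4All

# Load the local .gguf file instead of downloading from the model registry
model = GPT4All(
    model_name="Phi-3-mini-4k-instruct-q4.gguf",
    model_path="./models",
    allow_download=False,
)

# Generate a short completion to confirm the model runs
print(model.generate("Where is the capital of the United States?", max_tokens=64))
```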
Running GPT4ALL Models
GPT4All is a powerful large-scale language model, similar to GPT-3, designed to support a variety of natural language processing tasks.
This module allows you to easily load the GPT4All model and perform inference seamlessly.
In the following example, we demonstrate how to load the GPT4All model and utilize it to answer a question by leveraging a custom prompt and inference pipeline.
[NOTE]
Due to structural changes, as of version 0.3.13 you need to replace `from langchain.prompts import ChatPromptTemplate` with `from langchain_core.prompts import ChatPromptTemplate`.
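The examples that follow assume these imports (the `GPT4All` LLM wrapper lives in `langchain_community`):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_community.llms import GPT4All
```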
Creating a Prompt and Checking the Result
The `ChatPromptTemplate` is responsible for creating prompt templates in LangChain and dynamically substituting variables. Without using the `invoke()` method, you can use the class's `format()` method to render the template as a plain string. In a nutshell, the `invoke()` method is great for chain-based tasks, while the `format()` method is perfect for returning simple strings.

```python
# Template string used in this section
template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: {question}
Assistant:"""

prompt = ChatPromptTemplate.from_template(template)

# Using format() instead of invoke()
result = prompt.format(question="where is the capital of United States?")
print(result)
```

```
Human: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: where is the capital of United States?
Assistant:
```

You might notice that `Human:` is automatically added to the output. If you'd like to avoid this behavior, you can use LangChain's `PromptTemplate` class instead of `ChatPromptTemplate`. The `PromptTemplate` class doesn't add any extra prefixes like this.

```python
from langchain_core.prompts.prompt import PromptTemplate

prompt = PromptTemplate.from_template(template)
formatted_prompt = prompt.format(question="Where is the capital of the United States?")
print(formatted_prompt)
```

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: Where is the capital of the United States?
Assistant:
```

We'll be using `invoke()` for chain-based tasks, so go ahead and forget about the `format()` method for now! 😉

Using Chains to Display Results in Real-Time

```python
# Prompt
prompt = ChatPromptTemplate.from_template(
    """<s>A chat between a user and an AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.</s>
<s>Human: {question}</s>
<s>Assistant:"""
)

# GPT4All language model initialization
# Specify the path to the GPT4All model file in model
model = GPT4All(
    model=local_path,
    n_threads=8,  # Number of threads to use
    backend="gpu",  # GPU configuration
    streaming=True,  # Streaming configuration
    callbacks=[StreamingStdOutCallbackHandler()],  # Stream tokens to stdout as they are generated
)

# Create the chain
chain = prompt | model | StrOutputParser()

# Execute the query
response = chain.invoke({"question": "where is the capital of United States?"})
```

```
The capital of the United States is Washington, D.C., which stands for District of Columbia. It was established by the Constitution along with a federal district that would serve as the nation's seat of government and be under its exclusive jurisdiction. The city itself lies on the east bank of the Potomac River near its fall point where it empties into Chesapeake Bay, but Washington is not part of any U.S. state; instead, it has a special federal district status as outlined in Article I, Section 8 of the Constitution and further defined by the Residence Act of 1790 signed by President George Washington. Washington D.C.'s location was chosen to be near the nation's capital at that time—Philadelphia, Pennsylvania—and it also holds symbolic significance as a neutral ground for both northern and southern states during their early years in America. The city is home to many iconic landmarks such as the U.S. Capitol Building where Congress meets, the White House (the residence of the President), Supreme Court buildings, numerous museums like the Smithsonian Institution's National Museum of American History or Natural History and Air & Space, among others
```

Summary

Today, we explored GPT4ALL together! We didn't just run models; we took part in the decision-making process, from selecting a model to suit our needs to choosing the right methods based on our desired outcomes or execution direction. Along the way, we compared the performance of popular models and even ran the code ourselves.

Next time, we'll dive into Video Q&A LLM (Gemini). Until then, try running today's code with different models and see how they perform. See you soon! 😊