GPT4ALL
Author: Yoonji Oh
Design:
Peer Review: Joseph, Normalist-K
This is part of the LangChain Open Tutorial.
In this tutorial, we're exploring GPT4ALL together! From picking the right model for your hardware to running it on your own machine, we'll walk you through the process step by step.
Ready? Let’s dive in and have some fun along the way!
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup helpers, useful functions, and utilities for these tutorials. Check out langchain-opentutorial for more details.
You can also create and use a .env file in the root directory, as shown below.
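For instance, a minimal sketch of what that setup might look like, loaded with the python-dotenv package. The variable names below are placeholders rather than requirements of gpt4all itself; use whichever keys your environment actually needs.

```python
# Example .env contents (saved in the project root); placeholder keys only:
#   LANGCHAIN_API_KEY=your-langsmith-api-key
#   LANGCHAIN_TRACING_V2=true
#   LANGCHAIN_PROJECT=GPT4ALL
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory
```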
Ready to get started with gpt4all? Let's make sure you've got everything set up! We'll guide you through installing the package using pip or poetry. Don't worry, it's quick and easy.
You can install gpt4all using pip or poetry, depending on your preferred package manager. Here's how:
If you're using pip, run the following command in your terminal:
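```bash
# gpt4all provides the local model runtime;
# langchain-community provides the LangChain GPT4All wrapper used later in this tutorial
pip install gpt4all langchain-community
```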
Prefer poetry? No problem! Here's how to install gpt4all using poetry:
Step 1: Add gpt4all to your project. Run this command to add the package to your pyproject.toml:
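```bash
# Adds gpt4all as a dependency and installs it into the poetry environment
poetry add gpt4all
```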
Step 2: Install dependencies. If the package is already added but not installed, simply run:
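```bash
# Installs everything declared in pyproject.toml into the current environment
poetry install
```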
Poetry will sync your environment and install all required dependencies.
GitHub: nomic-ai/gpt4all is an open-source chatbot ecosystem trained on a large amount of data, including code and chat-style conversations.
In this example, we will explain how to interact with the GPT4All model using LangChain.
Now comes the most important decision: which model to use. Before diving into the code, we'll look at several popular models and help you choose the right one, based on GPT4All's Python documentation.
| Model Name | Filesize | RAM Required | Parameters | Quantization | Developer | License | MD5 Sum (Unique Hash) |
|---|---|---|---|---|---|---|---|
| Meta-Llama-3-8B-Instruct.Q4_0.gguf | 4.66 GB | 8 GB | 8 Billion | q4_0 | Meta | Llama 3 License | c87ad09e1e4c8f9c35a5fcef52b6f1c9 |
| Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf | 4.11 GB | 8 GB | 7 Billion | q4_0 | Mistral & Nous Research | Apache 2.0 | Coa5f6b4eabd3992da4d7fb7f020f921eb |
| Phi-3-mini-4k-instruct.Q4_0.gguf | 2.18 GB | 4 GB | 3.8 Billion | q4_0 | Microsoft | MIT | f8347badde9bfc2efbe89124d78ddaf5 |
| orca-mini-3b-gguf2-q4_0.gguf | 1.98 GB | 4 GB | 3 Billion | q4_0 | Microsoft | CC-BY-NC-SA-4.0 | 0e769317b90ac30d6e09486d61fefa26 |
| gpt4all-13b-snoozy-q4_0.gguf | 7.37 GB | 16 GB | 13 Billion | q4_0 | Nomic AI | GPL | 40388eb2f8d16bb5d08c96fdfaac6b2c |
Choose your model depending on the tasks you plan to perform:
- Lightweight tasks (e.g., simple conversation): orca-mini-3b-gguf2-q4_0.gguf or Phi-3-mini-4k-instruct.Q4_0.gguf
- Moderate tasks (e.g., summarization or grammar correction): Meta-Llama-3-8B-Instruct.Q4_0.gguf or Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf
- Advanced tasks (e.g., long text generation, research): gpt4all-13b-snoozy-q4_0.gguf
Select a model based on your available hardware (a small sketch that automates this check follows the list):
- 4 GB RAM or less: orca-mini-3b-gguf2-q4_0.gguf or Phi-3-mini-4k-instruct.Q4_0.gguf
- 8 GB RAM or more: Meta-Llama-3-8B-Instruct.Q4_0.gguf or Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf
- 16 GB RAM or more: gpt4all-13b-snoozy-q4_0.gguf
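Here is a minimal sketch that maps the machine's total RAM to one of the models above. It assumes the third-party psutil package is installed, and the thresholds simply mirror the guidance in the list.

```python
import psutil  # third-party; install with `pip install psutil`

# Total system memory in GiB
total_ram_gib = psutil.virtual_memory().total / (1024 ** 3)

# Mirror the RAM guidance above
if total_ram_gib >= 16:
    suggested = "gpt4all-13b-snoozy-q4_0.gguf"
elif total_ram_gib >= 8:
    suggested = "Meta-Llama-3-8B-Instruct.Q4_0.gguf"
else:
    suggested = "orca-mini-3b-gguf2-q4_0.gguf"

print(f"Detected {total_ram_gib:.1f} GiB RAM; suggested model: {suggested}")
```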
[NOTE]
- GGML: CPU-friendly, with low memory usage.
- GGUF: the latest format, with GPU acceleration support.
- q4_0 quantization: efficient for both CPU and GPU workloads, with reduced memory requirements.
In this tutorial, we will be using Microsoft's Phi-3-Mini-4K-Instruct model.
Download the Model: Visit HuggingFace to download the required model (2.39 GB).
Load the model in Python: After downloading the model, create a folder named models and place the downloaded file in it. Assign the local file path (e.g., Phi-3-mini-4k-instruct-q4.gguf) to the local_path variable. You can replace this path with any local file path you prefer.
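For example, assuming the file was saved under a models/ folder in the project root (adjust the path to wherever you actually saved it):

```python
# Path to the downloaded GPT4All model file; adjust to your own location.
local_path = "./models/Phi-3-mini-4k-instruct-q4.gguf"
```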
Use the Python Documentation to load and run your selected model in your project.
GPT4All is a powerful large-scale language model, similar to GPT-3, designed to support a variety of natural language processing tasks. This module allows you to easily load the GPT4All model and perform inference seamlessly.
In the following example, we demonstrate how to load the GPT4All model and use it to answer a question by leveraging a custom prompt and inference pipeline.
[NOTE]
Due to structural changes, as of version 0.3.13 you need to replace `from langchain.prompts import ChatPromptTemplate` with `from langchain_core.prompts import ChatPromptTemplate`.
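In code, the change looks like this:

```python
# Before (older LangChain versions):
# from langchain.prompts import ChatPromptTemplate

# After (LangChain 0.3.13 and later):
from langchain_core.prompts import ChatPromptTemplate
```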
The ChatPromptTemplate is responsible for creating prompt templates in LangChain and dynamically substituting variables. Even without the invoke() method, you can use the class's template methods to generate prompts; in that case, the template is returned as a plain string via the format() method.

```python
# Using format() instead of invoke()
result = prompt.format(question="What is the capital of United States?")
```

In a nutshell, the invoke() method is great for chain-based tasks, while the format() method is perfect for returning simple strings.

```python
result = prompt.format(question="where is the capital of United States?")
print(result)
```

Output:

```
Human: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. Human: where is the capital of United States? Assistant:
```

You might notice that Human: is automatically added to the output. If you'd like to avoid this behavior, you can use LangChain's PromptTemplate class instead of ChatPromptTemplate. The PromptTemplate class doesn't add any extra prefixes like this.

```python
from langchain_core.prompts.prompt import PromptTemplate

# `template` is the same prompt string defined earlier in the tutorial.
prompt = PromptTemplate.from_template(template)
formatted_prompt = prompt.format(question="Where is the capital of the United States?")
print(formatted_prompt)
```

Output:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. Human: Where is the capital of the United States? Assistant:
```

We'll be using invoke() for chain-based tasks, so go ahead and forget about the format() method for now! 😉

Using Chains to Display Results in Real-Time

```python
from langchain_community.llms import GPT4All
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Prompt
prompt = ChatPromptTemplate.from_template(
    """
<s>A chat between a user and an AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.</s>
<s>Human: {question}</s>
<s>Assistant:"""
)

# GPT4All language model initialization
# Specify the path to the GPT4All model file in `model`
model = GPT4All(
    model=local_path,
    n_threads=8,  # Number of threads to use
    backend="gpu",  # GPU configuration
    streaming=True,  # Streaming configuration
    callbacks=[StreamingStdOutCallbackHandler()],  # Callback configuration
)

# Create the chain
chain = prompt | model | StrOutputParser()

# Execute the query
response = chain.invoke({"question": "where is the capital of United States?"})
```

Output:

```
The capital of the United States is Washington, D.C., which stands for District of Columbia. It was established by the Constitution along with a federal district that would serve as the nation's seat of government and be under its exclusive jurisdiction. The city itself lies on the east bank of the Potomac River near its fall point where it empties into Chesapeake Bay, but Washington is not part of any U.S. state; instead, it has a special federal district status as outlined in Article I, Section 8 of the Constitution and further defined by the Residence Act of 1790 signed by President George Washington. Washington D.C.'s location was chosen to be near the nation's capital at that time—Philadelphia, Pennsylvania—and it also holds symbolic significance as a neutral ground for both northern and southern states during their early years in America. The city is home to many iconic landmarks such as the U.S. Capitol Building where Congress meets, the White House (the residence of the President), Supreme Court buildings, numerous museums like the Smithsonian Institution's National Museum of American History or Natural History and Air & Space, among others
```

Summary

Today, we explored GPT4ALL together! We didn't just run models; we took part in the decision-making process, from selecting a model to suit our needs to choosing the right methods based on our desired outcomes and execution direction. Along the way, we compared popular models and ran the code ourselves.

Next time, we'll dive into Video Q&A LLM (Gemini). Until then, try running today's code with different models and see how they perform. See you soon! 😊