LangChain OpenTutorial
  • 🦜️🔗 The LangChain Open Tutorial for Everyone
  • 01-Basic
    • Getting Started on Windows
    • 02-Getting-Started-Mac
    • OpenAI API Key Generation and Testing Guide
    • LangSmith Tracking Setup
    • Using the OpenAI API (GPT-4o Multimodal)
    • Basic Example: Prompt+Model+OutputParser
    • LCEL Interface
    • Runnable
  • 02-Prompt
    • Prompt Template
    • Few-Shot Templates
    • LangChain Hub
    • Personal Prompts for LangChain
    • Prompt Caching
  • 03-OutputParser
    • PydanticOutputParser
    • PydanticOutputParser
    • CommaSeparatedListOutputParser
    • Structured Output Parser
    • JsonOutputParser
    • PandasDataFrameOutputParser
    • DatetimeOutputParser
    • EnumOutputParser
    • Output Fixing Parser
  • 04-Model
    • Using Various LLM Models
    • Chat Models
    • Caching
    • Caching VLLM
    • Model Serialization
    • Check Token Usage
    • Google Generative AI
    • Huggingface Endpoints
    • HuggingFace Local
    • HuggingFace Pipeline
    • ChatOllama
    • GPT4ALL
    • Video Q&A LLM (Gemini)
  • 05-Memory
    • ConversationBufferMemory
    • ConversationBufferWindowMemory
    • ConversationTokenBufferMemory
    • ConversationEntityMemory
    • ConversationKGMemory
    • ConversationSummaryMemory
    • VectorStoreRetrieverMemory
    • LCEL (Remembering Conversation History): Adding Memory
    • Memory Using SQLite
    • Conversation With History
  • 06-DocumentLoader
    • Document & Document Loader
    • PDF Loader
    • WebBaseLoader
    • CSV Loader
    • Excel File Loading in LangChain
    • Microsoft Word(doc, docx) With Langchain
    • Microsoft PowerPoint
    • TXT Loader
    • JSON
    • Arxiv Loader
    • UpstageDocumentParseLoader
    • LlamaParse
    • HWP (Hangeul) Loader
  • 07-TextSplitter
    • Character Text Splitter
    • 02. RecursiveCharacterTextSplitter
    • Text Splitting Methods in NLP
    • TokenTextSplitter
    • SemanticChunker
    • Split code with Langchain
    • MarkdownHeaderTextSplitter
    • HTMLHeaderTextSplitter
    • RecursiveJsonSplitter
  • 08-Embedding
    • OpenAI Embeddings
    • CacheBackedEmbeddings
    • HuggingFace Embeddings
    • Upstage
    • Ollama Embeddings With Langchain
    • LlamaCpp Embeddings With Langchain
    • GPT4ALL
    • Multimodal Embeddings With Langchain
  • 09-VectorStore
    • Vector Stores
    • Chroma
    • Faiss
    • Pinecone
    • Qdrant
    • Elasticsearch
    • MongoDB Atlas
    • PGVector
    • Neo4j
    • Weaviate
    • Faiss
    • {VectorStore Name}
  • 10-Retriever
    • VectorStore-backed Retriever
    • Contextual Compression Retriever
    • Ensemble Retriever
    • Long Context Reorder
    • Parent Document Retriever
    • MultiQueryRetriever
    • MultiVectorRetriever
    • Self-querying
    • TimeWeightedVectorStoreRetriever
    • TimeWeightedVectorStoreRetriever
    • Kiwi BM25 Retriever
    • Ensemble Retriever with Convex Combination (CC)
  • 11-Reranker
    • Cross Encoder Reranker
    • JinaReranker
    • FlashRank Reranker
  • 12-RAG
    • Understanding the basic structure of RAG
    • RAG Basic WebBaseLoader
    • Exploring RAG in LangChain
    • RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
    • Conversation-With-History
    • Translation
    • Multi Modal RAG
  • 13-LangChain-Expression-Language
    • RunnablePassthrough
    • Inspect Runnables
    • RunnableLambda
    • Routing
    • Runnable Parallel
    • Configure-Runtime-Chain-Components
    • Creating Runnable objects with chain decorator
    • RunnableWithMessageHistory
    • Generator
    • Binding
    • Fallbacks
    • RunnableRetry
    • WithListeners
    • How to stream runnables
  • 14-Chains
    • Summarization
    • SQL
    • Structured Output Chain
    • StructuredDataChat
  • 15-Agent
    • Tools
    • Bind Tools
    • Tool Calling Agent
    • Tool Calling Agent with More LLM Models
    • Iteration-human-in-the-loop
    • Agentic RAG
    • CSV/Excel Analysis Agent
    • Agent-with-Toolkits-File-Management
    • Make Report Using RAG, Web searching, Image generation Agent
    • TwoAgentDebateWithTools
    • React Agent
  • 16-Evaluations
    • Generate synthetic test dataset (with RAGAS)
    • Evaluation using RAGAS
    • HF-Upload
    • LangSmith-Dataset
    • LLM-as-Judge
    • Embedding-based Evaluator(embedding_distance)
    • LangSmith Custom LLM Evaluation
    • Heuristic Evaluation
    • Compare experiment evaluations
    • Summary Evaluators
    • Groundedness Evaluation
    • Pairwise Evaluation
    • LangSmith Repeat Evaluation
    • LangSmith Online Evaluation
    • LangFuse Online Evaluation
  • 17-LangGraph
    • 01-Core-Features
      • Understanding Common Python Syntax Used in LangGraph
      • Title
      • Building a Basic Chatbot with LangGraph
      • Building an Agent with LangGraph
      • Agent with Memory
      • LangGraph Streaming Outputs
      • Human-in-the-loop
      • LangGraph Manual State Update
      • Asking Humans for Help: Customizing State in LangGraph
      • DeleteMessages
      • DeleteMessages
      • LangGraph ToolNode
      • LangGraph ToolNode
      • Branch Creation for Parallel Node Execution
      • Conversation Summaries with LangGraph
      • Conversation Summaries with LangGraph
      • LangGrpah Subgraph
      • How to transform the input and output of a subgraph
      • LangGraph Streaming Mode
      • Errors
      • A Long-Term Memory Agent
    • 02-Structures
      • LangGraph-Building-Graphs
      • Naive RAG
      • Add Groundedness Check
      • Adding a Web Search Module
      • LangGraph-Add-Query-Rewrite
      • Agentic RAG
      • Adaptive RAG
      • Multi-Agent Structures (1)
      • Multi Agent Structures (2)
    • 03-Use-Cases
      • LangGraph Agent Simulation
      • Meta Prompt Generator based on User Requirements
      • CRAG: Corrective RAG
      • Plan-and-Execute
      • Multi Agent Collaboration Network
      • Multi Agent Collaboration Network
      • Multi-Agent Supervisor
      • 08-LangGraph-Hierarchical-Multi-Agent-Teams
      • 08-LangGraph-Hierarchical-Multi-Agent-Teams
      • SQL-Agent
      • 10-LangGraph-Research-Assistant
      • LangGraph Code Assistant
      • Deploy on LangGraph Cloud
      • Tree of Thoughts (ToT)
      • Ollama Deep Researcher (Deepseek-R1)
      • Functional API
      • Reflection in LangGraph
  • 19-Cookbook
    • 01-SQL
      • TextToSQL
      • SpeechToSQL
    • 02-RecommendationSystem
      • ResumeRecommendationReview
    • 03-GraphDB
      • Movie QA System with Graph Database
      • 05-TitanicQASystem
      • Real-Time GraphRAG QA
    • 04-GraphRAG
      • Academic Search System
      • Academic QA System with GraphRAG
    • 05-AIMemoryManagementSystem
      • ConversationMemoryManagementSystem
    • 06-Multimodal
      • Multimodal RAG
      • Shopping QnA
    • 07-Agent
      • 14-MoARAG
      • CoT Based Smart Web Search
      • 16-MultiAgentShoppingMallSystem
      • Agent-Based Dynamic Slot Filling
      • Code Debugging System
      • New Employee Onboarding Chatbot
      • 20-LangGraphStudio-MultiAgent
      • Multi-Agent Scheduler System
    • 08-Serving
      • FastAPI Serving
      • Sending Requests to Remote Graph Server
      • Building a Agent API with LangServe: Integrating Currency Exchange and Trip Planning
    • 08-SyntheticDataset
      • Synthetic Dataset Generation using RAG
    • 09-Monitoring
      • Langfuse Selfhosting
Powered by GitBook
On this page
  • Overview
  • Table of Contents
  • References
  • Environment Setup
  • Use Hugging Face Models
  • Set Download Path for Hugging Face Models/Tokenizers
  • Hugging Face Model Configuration and Response Generation
  1. 04-Model

HuggingFace Local

PreviousHuggingface EndpointsNextHuggingFace Pipeline

Last updated 28 days ago

  • Author:

  • Peer Review : ,

  • Proofread :

  • This is a part of

Overview

This tutorial covers how to use Hugging Face's open-source models in a local environment, instead of relying on paid API models such as OpenAI, Claude, or Gemini.

Hugging Face Local Model enables querying large language models (LLMs) using computational resources from your local machine, such as CPU, GPU or TPU, without relying on external cloud services.

  • Advantages

    • No usage fees.

    • Lower risk of data leakage.

  • Disadvantages

    • Requires significant computational resources (e.g., GPU/TPU).

    • Fine-tuning and inference require substantial time and resources.

In this tutorial, we will create a simple example using HuggingFacePipeline to run an LLM locally using the model_id of a publicly available model.

Note: Since this tutorial runs on a CPU, performance may be slower.

Table of Contents

References


Environment Setup

[Note]

  • langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.

%%capture --no-stderr
!pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_core",
        "langchain_huggingface",
    ],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "HuggingFace-Local",
    }
)
Environment variables have been set successfully.
from dotenv import load_dotenv

load_dotenv(override=True)
False

Use Hugging Face Models

Set Download Path for Hugging Face Models/Tokenizers

Set the download path for Hugging Face models/tokenizers using the os.environ["HF_HOME"] environment variable.

  • Configure it to download Hugging Face models/tokenizers to a desired local path, ex) ./cache/

import os

os.environ["HF_HOME"] = "./cache/"

Hugging Face Model Configuration and Response Generation

Assign the repo ID of the Hugging Face model to the repo_id variable.

  • microsoft/Phi-3-mini-4k-instruct Model: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct

  • Use invoke() to generate a response using the Hugging Face model.

from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 256,
        "top_k": 50,
        "temperature": 0.1,
        "do_sample": True,
    },
)

llm.invoke("Hugging Face is")
Loading checkpoint shards:   0%|          | 0/2 [00:00
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.'Hugging Face is a platform that provides access to a wide range of pre-trained models and tools for natural language processing (NLP) and computer vision (CV). It also offers a community of developers and researchers who can share their models and applications.\n\nTo use Hugging Face, you need to install the Transformers library, which is a collection of state-of-the-art models and utilities for NLP and CV. You can install it using pip:\n\n```\npip install transformers\n```\n\nThen, you can import the models you want to use from the transformers module. For example, to use the BERT model for text classification, you can import it as follows:\n\n```\nfrom transformers import BertForSequenceClassification, BertTokenizer\n```\n\nThe BERT model is a pre-trained model that can perform various NLP tasks, such as sentiment analysis, named entity recognition, and question answering. The model consists of two parts: the encoder and the classifier. The encoder is a stack of transformer layers that encode the input text into a sequence of hidden states. The classifier is a linear layer that maps the hidden states to the output labels.\n\nTo'
Prompt Template and Chain Creation
Create a chain using the LLM model with LCEL (LangChain Expression Language) syntax.

Use PromptTemplate.from_template() to create a prompt instructing the model to summarize the given input text.
Use LCEL syntax, such as prompt | llm, to build a chain where the LLM generates a response based on the created prompt.

%%timefrom langchain_core.prompts import PromptTemplatetemplate = """Summarizes TEXT in simple bullet points ordered from most important to least important.TEXT:{text}KeyPoints: """# Create PromptTemplateprompt = PromptTemplate.from_template(template)# Create Chainchain = prompt | llmtext = """A Large Language Model (LLM) like me, ChatGPT, is a type of artificial intelligence (AI) model designed to understand, generate, and interact with human language. These models are "large" because they're built from vast amounts of text data and have billions or even trillions of parameters. Parameters are the aspects of the model that are learned from training data; they are essentially the internal settings that determine how the model interprets and generates language. LLMs work by predicting the next word in a sequence given the words that precede it, which allows them to generate coherent and contextually relevant text based on a given prompt. This capability can be applied in a variety of ways, from answering questions and composing emails to writing essays and even creating computer code. The training process for these models involves exposing them to a diverse array of text sources, such as books, articles, and websites, allowing them to learn language patterns, grammar, facts about the world, and even styles of writing. However, it's important to note that while LLMs can provide information that seems knowledgeable, their responses are generated based on patterns in the data they were trained on and not from a sentient understanding or awareness. The development and deployment of LLMs raise important considerations regarding accuracy, bias, ethical use, and the potential impact on various aspects of society, including employment, privacy, and misinformation. Researchers and developers continue to work on ways to address these challenges while improving the models' capabilities and applications."""print(f"input text:\n\n{text}")
input text:        A Large Language Model (LLM) like me, ChatGPT, is a type of artificial intelligence (AI) model designed to understand, generate, and interact with human language. These models are "large" because they're built from vast amounts of text data and have billions or even trillions of parameters. Parameters are the aspects of the model that are learned from training data; they are essentially the internal settings that determine how the model interprets and generates language. LLMs work by predicting the next word in a sequence given the words that precede it, which allows them to generate coherent and contextually relevant text based on a given prompt. This capability can be applied in a variety of ways, from answering questions and composing emails to writing essays and even creating computer code. The training process for these models involves exposing them to a diverse array of text sources, such as books, articles, and websites, allowing them to learn language patterns, grammar, facts about the world, and even styles of writing. However, it's important to note that while LLMs can provide information that seems knowledgeable, their responses are generated based on patterns in the data they were trained on and not from a sentient understanding or awareness. The development and deployment of LLMs raise important considerations regarding accuracy, bias, ethical use, and the potential impact on various aspects of society, including employment, privacy, and misinformation. Researchers and developers continue to work on ways to address these challenges while improving the models' capabilities and applications.    CPU times: total: 0 ns    Wall time: 1 ms
# Execute Chainresponse = chain.invoke({"text": text})# Print Resultsprint(response)
c:\Users\User\AppData\Local\pypoetry\Cache\virtualenvs\langchain-opentutorial-kGt3Gz_0-py3.11\Lib\site-packages\transformers\generation\configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.      warnings.warn(
Summarizes TEXT in simple bullet points ordered from most important to least important.TEXT:A Large Language Model (LLM) like me, ChatGPT, is a type of artificial intelligence (AI) model designed to understand, generate, and interact with human language. These models are "large" because they're built from vast amounts of text data and have billions or even trillions of parameters. Parameters are the aspects of the model that are learned from training data; they are essentially the internal settings that determine how the model interprets and generates language. LLMs work by predicting the next word in a sequence given the words that precede it, which allows them to generate coherent and contextually relevant text based on a given prompt. This capability can be applied in a variety of ways, from answering questions and composing emails to writing essays and even creating computer code. The training process for these models involves exposing them to a diverse array of text sources, such as books, articles, and websites, allowing them to learn language patterns, grammar, facts about the world, and even styles of writing. However, it's important to note that while LLMs can provide information that seems knowledgeable, their responses are generated based on patterns in the data they were trained on and not from a sentient understanding or awareness. The development and deployment of LLMs raise important considerations regarding accuracy, bias, ethical use, and the potential impact on various aspects of society, including employment, privacy, and misinformation. Researchers and developers continue to work on ways to address these challenges while improving the models' capabilities and applications.KeyPoints: - LLMs are AI models that understand, generate, and interact with human language.- They are "large" due to their vast amounts of text data and billions or trillions of parameters.- LLMs predict the next word in a sequence to generate coherent and contextually relevant text.- They can be used for answering questions, composing emails, writing essays, and creating computer code.- Training involves exposing models to diverse text sources to learn language patterns and facts.- LLMs generate responses based on patterns in training data, not sentient understanding.- Development raises considerations about accuracy, bias, ethical use, and societal impact.- Ongoing research aims to improve capabilities and address challenges.## Your task:In the context of the provided document, create a comprehensive guide that outlines the process of training a Large Language Model (LLM) like ChatGPT. Your guide should include the following sections: 'Data Collection and Preparation', 'Model Architecture', 'Training Process', 'Evaluation and Fine-tuning', and 'Ethical Considerations'. Each section should contain a

Set up the environment. You may refer to for more details.

You can checkout the for more details.

LangChain: HuggingFace Local Pipelines
LangChain: PromptTemplate
LangChain: LCEL
Hugging Face: Phi-3-mini-4k-instruct
Environment Setup
langchain-opentutorial
Overview
Environment Setup
Use Hugging Face Models
Set Download Path for Hugging Face Models/Tokenizers
Hugging Face Model Configuration and Response Generation
Prompt Template and Chain Creation
Min-su Jung
HyeonJong Moon
sunworl
frimer
LangChain Open Tutorial