HuggingFace Local


Overview

This tutorial covers how to use Hugging Face's open-source models in a local environment, instead of relying on paid API models such as OpenAI, Claude, or Gemini.

Running Hugging Face models locally lets you query large language models (LLMs) with your own machine's computational resources, such as a CPU, GPU, or TPU, without relying on external cloud services.

  • Advantages

    • No usage fees.

    • Lower risk of data leakage.

  • Disadvantages

    • Requires significant computational resources (e.g., GPU/TPU).

    • Fine-tuning and inference require substantial time and resources.

In this tutorial, we will build a simple example that uses HuggingFacePipeline to run an LLM locally from the model_id of a publicly available model.

Note: Since this tutorial runs on a CPU, inference may be slower than on a GPU.

Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup helpers, useful functions, and utilities for these tutorials (a minimal installation sketch follows this note).

  • You can check out langchain-opentutorial for more details.
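
As a rough sketch of what the setup involves (the package list below is an assumption, not taken from this tutorial), the dependencies can be installed with pip:

```python
# Minimal installation sketch (package names are assumptions; adjust to your setup).
import subprocess
import sys

packages = [
    "langchain-opentutorial",  # tutorial helper utilities mentioned above
    "langchain-huggingface",   # provides HuggingFacePipeline for LangChain
    "transformers",            # Hugging Face model and tokenizer loading
    "torch",                   # local backend used to run the model
]
subprocess.check_call([sys.executable, "-m", "pip", "install", "-qU", *packages])
```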

Use Hugging Face Models

Set Download Path for Hugging Face Models/Tokenizers

Set the download path for Hugging Face models and tokenizers via the HF_HOME environment variable (accessible in Python as os.environ["HF_HOME"]).

  • Configure it so that Hugging Face models/tokenizers are downloaded to a desired local path (e.g., ./cache/), as in the sketch below.
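
A minimal sketch (the ./cache/ path follows the example above; adjust it to your environment):

```python
import os

# Download Hugging Face models/tokenizers to a local directory instead of the default cache.
# Set this before any model or tokenizer is loaded.
os.environ["HF_HOME"] = "./cache/"
```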

Hugging Face Model Configuration and Response Generation

Assign the repo ID of the Hugging Face model to the repo_id variable.

  • microsoft/Phi-3-mini-4k-instruct Model: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct

  • Use invoke() to generate a response from the Hugging Face model; a minimal sketch follows this list.
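
A minimal sketch of these two steps, assuming the langchain-huggingface integration is installed (the prompt text and the max_new_tokens value are illustrative assumptions):

```python
from langchain_huggingface import HuggingFacePipeline

# Repository ID of the publicly available model referenced above.
repo_id = "microsoft/Phi-3-mini-4k-instruct"

# Download the model locally and wrap it in a text-generation pipeline.
llm = HuggingFacePipeline.from_model_id(
    model_id=repo_id,
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},  # illustrative value
)

# invoke() runs generation on the local model and returns the generated text.
response = llm.invoke("Hugging Face is")
print(response)
```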
