HuggingFace Pipeline


Overview

This tutorial covers how to run Hugging Face models locally through the HuggingFacePipeline class.

It explains how to load a model either by specifying model parameters with the from_model_id method or by passing an existing transformers pipeline directly.

Using the resulting hf object, the tutorial demonstrates text generation for a given prompt.

By setting device parameters, it also demonstrates GPU execution and batched inference (sketched after the list below).

  • Advantages

    • No usage fees.

    • Lower risk of data leakage.

  • Disadvantages

    • Requires significant computational resources.
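
A minimal sketch of the device and batching options, assuming the langchain-huggingface integration package and a CUDA-capable GPU; the device=0, batch_size=2, and prompt values are illustrative assumptions:

```python
from langchain_huggingface import HuggingFacePipeline

# device=0 pins the pipeline to the first GPU; the default runs on CPU.
# batch_size controls how many prompts are processed together by .batch().
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    device=0,
    batch_size=2,
    pipeline_kwargs={"max_new_tokens": 64},
)

answers = gpu_llm.batch(["What is deep learning?", "What is a GPU?"])
```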


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for these tutorials.

  • You can check out langchain-opentutorial for more details.

Hugging Face Local Pipelines

Hugging Face models can be run locally through the HuggingFacePipeline class.

The Hugging Face Model Hub hosts over 120k models, 20k datasets, and 50k demo apps (Spaces), all open-source and publicly available, allowing people to easily collaborate and build ML together.

These can be used in LangChain either by calling them through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class. For more information on hosted pipelines, please refer to the HuggingFaceHub notebook.

To use it, you should have the transformers Python package installed, as well as PyTorch.

Additionally, you may install xformers for a more memory-efficient attention implementation.
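
A minimal sketch of the installation, assuming a Jupyter-style notebook and the langchain-huggingface integration package; the xformers line is optional:

```python
%pip install -qU langchain-huggingface transformers torch
# Optional, for memory-efficient attention:
# %pip install xformers
```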

Set the path where models will be downloaded and cached.
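
One way to do this, as a sketch, is to point the Hugging Face cache at a local directory via the HF_HOME environment variable before loading any model; the ./cache path is an assumption, so choose a directory with enough disk space:

```python
import os

# Download models into a local ./cache directory instead of the default cache.
os.environ["HF_HOME"] = "./cache"
```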

Model Loading

Models can be loaded by specifying model parameters using the from_model_id method.

  • The HuggingFacePipeline class is used to load a pre-trained model from Hugging Face.

  • The from_model_id method is used to specify the microsoft/Phi-3-mini-4k-instruct model and set the task to "text-generation".

  • The pipeline_kwargs parameter is used to limit the maximum number of tokens to be generated to 64.

  • The loaded model is assigned to the hf variable, which can be used to perform text generation tasks, as in the sketch below.
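
A minimal sketch of this loading step, assuming the langchain-huggingface integration package; the prompt passed to invoke is illustrative:

```python
from langchain_huggingface import HuggingFacePipeline

# Load the model as a local text-generation pipeline,
# capping generation at 64 new tokens via pipeline_kwargs.
hf = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)

print(hf.invoke("Hugging Face is"))
```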

The model used: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
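
As mentioned in the overview, an existing transformers pipeline can also be passed in directly instead of using from_model_id. A sketch under the same assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Wrap a hand-built transformers pipeline in HuggingFacePipeline.
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=64
)
hf = HuggingFacePipeline(pipeline=pipe)
```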
