HuggingFace Pipeline

Overview

This tutorial covers how to run Hugging Face models locally through the HuggingFacePipeline class.

It explains how to load a model either by specifying model parameters with the from_model_id method or by passing a transformers pipeline directly.

Using the resulting hf object, the tutorial performs text generation for a given prompt.

By setting device-related parameters, it also demonstrates GPU execution and batched inference.

  • Advantages

    • No usage fees.

    • Lower risk of data leakage.

  • Disadvantages

    • Requires significant computational resources.


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for these tutorials; a minimal install sketch follows below.

  • You can check out langchain-opentutorial for more details.
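
As a minimal install sketch (the package name comes from this tutorial; pinned versions and any additional dependencies are left to the Environment Setup guide):

```python
# Install the tutorial helper package. This is a minimal sketch;
# see the Environment Setup guide for the full, recommended procedure.
%pip install -qU langchain-opentutorial
```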

Hugging Face Local Pipelines

Hugging Face models can be run locally through the HuggingFacePipeline class.

The Hugging Face Model Hub hosts over 120k models, 20k datasets, and 50k demo apps (Spaces) on its online platform, all of which are open-source and publicly available, allowing people to easily collaborate and build ML together.

These can be used in LangChain either by calling them through this local pipeline wrapper or by calling hosted inference endpoints through the HuggingFaceHub class. For more information on hosted pipelines, please refer to the HuggingFaceHub notebook.

To use this, you should have the transformers Python package installed, as well as PyTorch.

Additionally, you may install xformers for a more memory-efficient attention implementation.
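
For example, a minimal install sketch (version pins are omitted; adjust them for your environment):

```python
# Core dependencies for running local Hugging Face pipelines.
%pip install -qU transformers torch

# Optional: xformers for a more memory-efficient attention implementation.
%pip install -qU xformers
```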

Set the path where the model will be downloaded.
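
For example, a minimal sketch that caches downloads under a local ./cache/ directory (the directory name is an assumption; any writable path works):

```python
import os

# HF_HOME controls where Hugging Face libraries cache downloaded
# models and tokenizers. "./cache/" here is just an example path.
os.environ["HF_HOME"] = "./cache/"
```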

Model Loading

Models can be loaded by specifying model parameters using the method from_model_id.

  • The HuggingFacePipeline class is used to load a pre-trained model from Hugging Face.

  • The from_model_id method is used to specify the microsoft/Phi-3-mini-4k-instruct model and set the task to "text-generation".

  • The pipeline_kwargs parameter is used to limit the maximum number of tokens to be generated to 64.

  • The loaded model is assigned to the hf variable, which can be used to perform text generation tasks.

The model used: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
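
Putting the steps above together, a minimal loading sketch might look like this (class and method names follow the langchain-huggingface package; the prompt at the end is purely illustrative):

```python
from langchain_huggingface import HuggingFacePipeline

# Load microsoft/Phi-3-mini-4k-instruct as a text-generation pipeline,
# capping each call at 64 newly generated tokens.
hf = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)

# hf is a standard LangChain runnable, so it can be invoked directly.
print(hf.invoke("Hugging Face is"))
```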
