This module provides tools to evaluate and track the performance of language models using LangSmith's online evaluation capabilities.
By setting up chains and custom run configurations, users can assess model outputs for hallucination and context recall, ensuring robust performance in various scenarios.
[Note] If you are using a .env file, proceed as follows.
from dotenv import load_dotenv

load_dotenv(override=True)  # returns True when the .env file is found and loaded
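If you are not using a .env file, a common alternative is to set the same variables directly in the environment. The keys and placeholder values below are assumptions based on a typical LangSmith + OpenAI setup; adjust them to your own configuration.

import os

# Placeholder values; replace them with your own keys.
os.environ["OPENAI_API_KEY"] = "sk-..."                # used by ChatOpenAI and OpenAIEmbeddings
os.environ["LANGCHAIN_TRACING_V2"] = "true"            # enable LangSmith tracing
os.environ["LANGCHAIN_API_KEY"] = "lsv2_..."           # LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "Online-Evaluation"  # example project name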
Build a Pipeline for Online Evaluations
The following Python code defines a PDFRAG class and related functionality to set up a retrieval-augmented generation (RAG) pipeline for online evaluation of language models.
Explanation of 'PDFRAG'
The PDFRAG class is a modular framework for:
Document Loading: Ingesting a PDF document.
Document Splitting: Dividing the content into manageable chunks for processing.
Vectorstore Creation: Converting chunks into vector representations using embeddings.
Retriever Setup: Enabling retrieval of the most relevant chunks for a given query.
Chain Construction: Creating a question-answering (QA) chain with prompt templates.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_core.runnables import RunnablePassthrough
class PDFRAG:
    def __init__(self, file_path: str, llm):
        self.file_path = file_path
        self.llm = llm

    def load_documents(self):
        # Load Documents
        loader = PyMuPDFLoader(self.file_path)
        docs = loader.load()
        return docs

    def split_documents(self, docs):
        # Split Documents
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
        split_documents = text_splitter.split_documents(docs)
        return split_documents

    def create_vectorstore(self, split_documents):
        # Embedding
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        # Create DB
        vectorstore = FAISS.from_documents(
            documents=split_documents, embedding=embeddings
        )
        return vectorstore

    def create_retriever(self):
        vectorstore = self.create_vectorstore(
            self.split_documents(self.load_documents())
        )
        # Retriever
        retriever = vectorstore.as_retriever()
        return retriever

    def create_chain(self, retriever):
        # Create Prompt
        prompt = PromptTemplate.from_template(
            """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
#Context:
{context}
#Question:
{question}
#Answer:"""
        )
        # Chain
        chain = (
            {
                "context": retriever,
                "question": RunnablePassthrough(),
            }
            | prompt
            | self.llm
            | StrOutputParser()
        )
        return chain
Set Up the RAG System with PDFRAG
The following code demonstrates how to instantiate and use the PDFRAG class to set up a retrieval-augmented generation (RAG) pipeline with a specific PDF document and a GPT-based model.
from langchain_openai import ChatOpenAI

# Create a PDFRAG object
rag = PDFRAG(
    "data/Newwhitepaper_Agents2.pdf",
    ChatOpenAI(model="gpt-4o-mini", temperature=0),
)

# Create a retriever
retriever = rag.create_retriever()

# Create a chain
rag_chain = rag.create_chain(retriever)
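As a quick, optional check, you can invoke the chain directly before wrapping it for evaluation; the question string here simply mirrors the one used in the evaluation examples below.

# Ask the chain a question and print the generated answer
answer = rag_chain.invoke("How do agents differ from standalone language models?")
print(answer)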
Create a Parallel Evaluation Runnable
The following code demonstrates how to create a RunnableParallel object to evaluate multiple aspects of the retrieval-augmented generation (RAG) pipeline concurrently.
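The definition of evaluation_runnable is not shown above. A minimal sketch, assuming the parallel output should expose "context" and "answer" keys (matching the output.context and output.answer fields referenced in the evaluator setup below) plus a pass-through key for the original question (the key name "query" is an assumption):

from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Run the retriever and the RAG chain concurrently and collect their results
# under named keys so the online evaluator can reference them.
evaluation_runnable = RunnableParallel(
    {
        "context": retriever,            # retrieved documents -> output.context
        "answer": rag_chain,             # generated answer    -> output.answer
        "query": RunnablePassthrough(),  # original question (assumed key name)
    }
)

The call below invokes it once so that every branch runs and the trace appears in LangSmith.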
_ = evaluation_runnable.invoke("How do agents differ from standalone language models?")
Create an Online LLM-as-Judge
1. Click Add Rule
2. Create Evaluator
3. Set Secrets & API Keys
4. Set Provider, Model, Prompt
5. Select Hallucination
6. Map facts to output.context
7. Map answer to output.answer
8. Check the data in Preview
Caution
View the preview, then turn preview mode off again before proceeding to the next step. You must also fill in the "Name" field to continue.
9. Save and Continue
10. Create a "Tag"
Instead of evaluating every run, you can set a "Tag" so that the evaluator runs only on runs carrying that tag.
11. Set the "Tag" you want
12. Run evaluations only for specific tags (hallucination)
Run Evaluations
The following code demonstrates how to run evaluations on the retrieval-augmented generation (RAG) pipeline, including hallucination detection, context recall assessment, and a combined evaluation.
from langchain_core.runnables import RunnableConfig

# Set a tag for each evaluation scenario
hallucination_config = RunnableConfig(tags=["hallucination_eval"])
context_recall_config = RunnableConfig(tags=["context_recall_eval"])
all_eval_config = RunnableConfig(tags=["hallucination_eval", "context_recall_eval"])

# Run the chain without any evaluation tag
_ = evaluation_runnable.invoke("How do agents differ from standalone language models?")

# Request a hallucination evaluation
_ = evaluation_runnable.invoke(
    "How do agents differ from standalone language models?",
    config=hallucination_config,
)

# Request a context recall assessment
_ = evaluation_runnable.invoke(
    "How do agents differ from standalone language models?",
    config=context_recall_config,
)

# Request all evaluations
_ = evaluation_runnable.invoke(
    "How do agents differ from standalone language models?", config=all_eval_config
)