This module provides tools to evaluate and track the performance of language models using LangSmith's online evaluation capabilities.
By setting up chains and tagging runs with custom configurations, users can assess model outputs for issues such as hallucination and weak context recall, helping ensure robust performance across scenarios.
[Note] If you are using a .env file, proceed as follows.
```python
from dotenv import load_dotenv

load_dotenv(override=True)
```
True
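If you are not using a .env file, you can set the required keys directly in the environment instead. The sketch below is illustrative and assumes the standard LangSmith tracing variables (LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY, LANGCHAIN_PROJECT) plus an OpenAI key; the project name is a placeholder.

```python
import os

# Set keys directly if you are not loading them from a .env file.
# Replace the placeholder values with your own keys and project name.
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable tracing so runs appear in LangSmith
os.environ["LANGCHAIN_PROJECT"] = "Online-Evaluation"  # hypothetical project name
```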
Build a Pipeline for Online Evaluations
The following Python code defines a PDFRAG class and related functionality to set up a retrieval-augmented generation (RAG) pipeline for online evaluation of language models.
Explanation of PDFRAG
The PDFRAG class is a modular framework for:
Document Loading: Ingesting a PDF document.
Document Splitting: Dividing the content into manageable chunks for processing.
Vectorstore Creation: Converting chunks into vector representations using embeddings.
Retriever Setup: Enabling retrieval of the most relevant chunks for a given query.
Chain Construction: Creating a question-answering (QA) chain with prompt templates.
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_core.runnables import RunnablePassthrough


class PDFRAG:
    def __init__(self, file_path: str, llm):
        self.file_path = file_path
        self.llm = llm

    def load_documents(self):
        # Load Documents
        loader = PyMuPDFLoader(self.file_path)
        docs = loader.load()
        return docs

    def split_documents(self, docs):
        # Split Documents
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
        split_documents = text_splitter.split_documents(docs)
        return split_documents

    def create_vectorstore(self, split_documents):
        # Embedding
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        # Create DB
        vectorstore = FAISS.from_documents(
            documents=split_documents, embedding=embeddings
        )
        return vectorstore

    def create_retriever(self):
        vectorstore = self.create_vectorstore(
            self.split_documents(self.load_documents())
        )
        # Retriever
        retriever = vectorstore.as_retriever()
        return retriever

    def create_chain(self, retriever):
        # Create Prompt
        prompt = PromptTemplate.from_template(
            """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.

#Context:
{context}

#Question:
{question}

#Answer:"""
        )

        # Chain
        chain = (
            {
                "context": retriever,
                "question": RunnablePassthrough(),
            }
            | prompt
            | self.llm
            | StrOutputParser()
        )
        return chain
```
Set Up the RAG System with PDFRAG
The following code demonstrates how to instantiate and use the PDFRAG class to set up a retrieval-augmented generation (RAG) pipeline using a specific PDF document and a GPT-based model.
```python
from langchain_openai import ChatOpenAI

# Create a PDFRAG object
rag = PDFRAG(
    "data/Newwhitepaper_Agents2.pdf",
    ChatOpenAI(model="gpt-4o-mini", temperature=0),
)

# Create a retriever
retriever = rag.create_retriever()

# Create a chain
rag_chain = rag.create_chain(retriever)
```
Create a Parallel Evaluation Runnable
The following code demonstrates how to create a RunnableParallel object to evaluate multiple aspects of the retrieval-augmented generation (RAG) pipeline concurrently.
```python
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Create a RunnableParallel object.
evaluation_runnable = RunnableParallel(
    {
        "context": retriever,
        "answer": rag_chain,
        "question": RunnablePassthrough(),
    }
)
```
```python
_ = evaluation_runnable.invoke("How do agents differ from standalone language models?")
```
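The RunnableParallel returns a dictionary with one entry per branch. As a quick sanity check (illustrative; the `result` variable is introduced here for inspection only), you can confirm the structure like this:

```python
result = evaluation_runnable.invoke("How do agents differ from standalone language models?")

print(result.keys())           # expect the keys 'context', 'answer', 'question'
print(len(result["context"]))  # number of retrieved Document chunks
print(result["answer"][:200])  # first part of the generated answer string
```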
Create an Online LLM-as-Judge
1. Click Add Rule
2. Create Evaluator
3. Set Secrets & API Keys
4. Set Provider, Model, Prompt
5. Select Hallucination (see the prompt sketch after this list)
6. Set facts for output.context
7. Set answer for output.answer
8. Check Preview for Data
Caution
After checking the preview, you must turn preview mode off again before proceeding to the next step. You must also fill in the "Name" field to continue.
9. Save and Continue
10. Make "Tag"
Instead of evaluating every run, you can set a "Tag" so that the evaluator runs only on runs carrying specific tags.
11. Set "Tag" that you want
12. Run evaluations only for specific tags (hallucination)
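The online evaluator itself runs inside LangSmith, so there is no client-side code for it. As a rough illustration of what the hallucination judge does with the facts and answer variables mapped in steps 6 and 7, here is a minimal local sketch using the same LLM; the grading prompt and the YES/NO convention are assumptions for illustration, not LangSmith's built-in prompt.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Illustrative grading prompt; the built-in Hallucination prompt in LangSmith may differ.
grading_prompt = ChatPromptTemplate.from_template(
    """You are grading whether an answer is supported by the given facts.

#Facts:
{facts}

#Answer:
{answer}

Reply with YES if the answer is fully grounded in the facts, otherwise NO."""
)

hallucination_grader = (
    grading_prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | StrOutputParser()
)

# Example usage with the output of the evaluation runnable defined earlier:
# result = evaluation_runnable.invoke("How do agents differ from standalone language models?")
# grade = hallucination_grader.invoke({"facts": result["context"], "answer": result["answer"]})
```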
Run Evaluations
The following code demonstrates how to perform evaluations on the retrieval-augmented generation (RAG) pipeline, including hallucination detection, context recall assessment, and combined evaluations.
```python
from langchain_core.runnables import RunnableConfig

# Set tags for each evaluation type
hallucination_config = RunnableConfig(tags=["hallucination_eval"])
context_recall_config = RunnableConfig(tags=["context_recall_eval"])
all_eval_config = RunnableConfig(tags=["hallucination_eval", "context_recall_eval"])
```
```python
# Run the chain
_ = evaluation_runnable.invoke("How do agents differ from standalone language models?")

# Request a hallucination evaluation
_ = evaluation_runnable.invoke(
    "How do agents differ from standalone language models?",
    config=hallucination_config,
)

# Request a context recall assessment
_ = evaluation_runnable.invoke(
    "How do agents differ from standalone language models?",
    config=context_recall_config,
)

# Request all evaluations
_ = evaluation_runnable.invoke(
    "How do agents differ from standalone language models?",
    config=all_eval_config,
)
```