Groundedness Evaluation


Overview

The Groundedness Evaluator is an evaluator that assesses whether an answer is grounded in (accurately supported by) the given context. It can be used to detect hallucinations in a RAG system's responses. In this tutorial, we will look at how to evaluate groundedness using the Upstage Groundedness Checker (UpstageGroundednessCheck) and a custom-built Groundedness Checker.

Table of Contents

Overview
Environment Setup
Define a function for RAG performance testing
Set Groundedness Checkers
Evaluate Groundedness using Upstage's and Custom Groundedness Checkers
Comprehensive evaluation of dataset using summary evaluators


Environment Setup

Setting up your environment is the first step. See the Environment Setup guide for more details.

[Note]

The langchain-opentutorial package provides easy-to-use environment setup guidance, along with useful functions and utilities for these tutorials. Check out the langchain-opentutorial repository for more details.
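For reference, here is a minimal setup sketch using the package's install and set_env helpers. The helper names follow the conventions of this tutorial series, but check the linked guide for the exact API; the package list and project name are assumptions.

```python
# A minimal setup sketch; `package.install` and `set_env` are helpers from
# langchain-opentutorial (see the Environment Setup guide for the exact API).
from langchain_opentutorial import package, set_env

# Install the packages this tutorial relies on (list is an assumption).
package.install(
    [
        "langsmith",
        "langchain_openai",
        "langchain_upstage",
        "langchain_community",
    ],
    verbose=False,
)

# Register API keys and LangSmith tracing settings as environment variables.
set_env(
    {
        "OPENAI_API_KEY": "",
        "UPSTAGE_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_PROJECT": "Groundedness-Evaluation",  # hypothetical name
    }
)
```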

You can alternatively set API keys such as OPENAI_API_KEY in a .env file and load them.

[Note] This is not necessary if you've already set the required API keys in previous steps.
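For example, with the python-dotenv package:

```python
# Alternative: load API keys from a .env file using python-dotenv.
from dotenv import load_dotenv

# Reads the .env file in the working directory and exports its entries
# (e.g., OPENAI_API_KEY, UPSTAGE_API_KEY, LANGCHAIN_API_KEY) as env vars.
load_dotenv(override=True)
```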

Define a function for RAG performance testing

Let's create a RAG system that will be used for testing.
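A minimal sketch of such a system is shown below. Its target function returns the retrieved context together with the answer so that groundedness checkers can compare the two; the PDF path, chunking parameters, and model choice are placeholder assumptions.

```python
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Build a small vector store over the source document (hypothetical path).
docs = PyMuPDFLoader("data/ai-agents-whitepaper.pdf").load()
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
retriever = FAISS.from_documents(splits, OpenAIEmbeddings()).as_retriever()

# Answer strictly from the retrieved context.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
chain = prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | StrOutputParser()

def ask_question(inputs: dict) -> dict:
    """Target function for LangSmith evaluation: returns question,
    retrieved context, and the generated answer."""
    retrieved = retriever.invoke(inputs["question"])
    context = "\n".join(doc.page_content for doc in retrieved)
    answer = chain.invoke({"context": context, "question": inputs["question"]})
    return {"question": inputs["question"], "context": context, "answer": answer}
```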

Set Groundedness Checkers

To evaluate groundedness, both UpstageGroundednessCheck and a custom Groundedness Checker will be used.

Set UpstageGroundednessCheck

To use Upstage's Groundedness Checker (UpstageGroundednessCheck), you need an Upstage API key, which you can obtain from the Upstage console and set as the UPSTAGE_API_KEY environment variable.

Define the UpstageGroundednessCheck evaluator; it will be used in the evaluate function later.
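A sketch of what that evaluator might look like, assuming the target function above returns 'context' and 'answer' keys:

```python
from langchain_upstage import UpstageGroundednessCheck

# Reads UPSTAGE_API_KEY from the environment.
upstage_groundedness_check = UpstageGroundednessCheck()

def upstage_groundedness_evaluator(run, example) -> dict:
    """LangSmith evaluator: is the answer grounded in the retrieved context?
    Assumes the target function returns 'context' and 'answer' keys."""
    result = upstage_groundedness_check.invoke(
        {
            "context": run.outputs.get("context", ""),
            "answer": run.outputs.get("answer", ""),
        }
    )
    # UpstageGroundednessCheck returns "grounded", "notGrounded", or "notSure".
    return {"key": "upstage_groundedness", "score": int(result == "grounded")}
```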

Set Custom Groundedness Checker

Create a custom Groundedness Checker using an OpenAI model. For this tutorial, we'll use the 'retrieval-answer' target. If you want to use another target ('question-answer' or 'question-retrieval'), change the description in GroundednessScore and the prompt template in GroundednessChecker.create() accordingly, as shown in the sketch below.
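The sketch below illustrates one way to build such a checker with OpenAI structured output. The GroundednessScore and GroundednessChecker names mirror the tutorial's, but the implementation details are assumptions, not the tutorial's exact code.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class GroundednessScore(BaseModel):
    """Verdict for the 'retrieval-answer' target. Change this description
    when scoring 'question-answer' or 'question-retrieval' instead."""

    score: str = Field(
        description="'yes' if the answer is supported by the retrieved "
        "context, otherwise 'no'"
    )

class GroundednessChecker:
    """Sketch of a custom checker built on OpenAI structured output."""

    def __init__(self, llm: ChatOpenAI, target: str = "retrieval-answer"):
        self.llm = llm
        self.target = target

    def create(self):
        # Adjust this prompt when using a different target.
        prompt = ChatPromptTemplate.from_template(
            "You are a grader assessing groundedness.\n"
            "Context:\n{context}\n\nAnswer:\n{answer}\n\n"
            "Reply 'yes' if every claim in the answer is supported by the "
            "context, otherwise 'no'."
        )
        return prompt | self.llm.with_structured_output(GroundednessScore)

custom_checker = GroundednessChecker(
    ChatOpenAI(model="gpt-4o-mini", temperature=0)
).create()

def custom_groundedness_evaluator(run, example) -> dict:
    """LangSmith evaluator wrapping the custom checker."""
    result = custom_checker.invoke(
        {
            "context": run.outputs.get("context", ""),
            "answer": run.outputs.get("answer", ""),
        }
    )
    return {"key": "custom_groundedness", "score": int(result.score == "yes")}
```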

Evaluate Groundedness using Upstage's and Custom Groundedness Checkers

Now evaluate groundedness using both checkers. Before doing so, check that the dataset created earlier exists. If you don't have the dataset, create one by referring to 04-LangSmith-Dataset. In this tutorial, we'll use a custom Q&A dataset based on the Google whitepaper on AI Agents.
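A sketch of running the evaluation with the LangSmith SDK, assuming the target function and evaluators defined above; the dataset name is an assumption, so use whatever name you gave your dataset in 04-LangSmith-Dataset.

```python
from langsmith.evaluation import evaluate

dataset_name = "RAG_EVAL_DATASET"  # assumed name from 04-LangSmith-Dataset

experiment_results = evaluate(
    ask_question,  # target function defined above
    data=dataset_name,
    evaluators=[upstage_groundedness_evaluator, custom_groundedness_evaluator],
    experiment_prefix="GROUNDEDNESS-EVAL",
    metadata={"variant": "Upstage and custom groundedness checkers"},
)
```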

[Image: langsmith-groundedness-evaluation-01 (LangSmith experiment results)]

Comprehensive evaluation of dataset using summary evaluators

Summary evaluators are useful when you want to compute a single metric over the entire dataset, whereas the evaluators in the previous step scored each example individually.
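As a sketch, a summary evaluator receives all runs and examples at once and returns one aggregate score; here we reuse the Upstage checker and target function defined above (the signature follows the LangSmith SDK).

```python
from typing import List

from langsmith.evaluation import evaluate
from langsmith.schemas import Example, Run

def groundedness_summary_evaluator(
    runs: List[Run], examples: List[Example]
) -> dict:
    """Fraction of answers judged grounded across the whole dataset,
    reusing the Upstage checker defined above."""
    grounded = sum(
        upstage_groundedness_check.invoke(
            {
                "context": run.outputs.get("context", ""),
                "answer": run.outputs.get("answer", ""),
            }
        )
        == "grounded"
        for run in runs
    )
    return {"key": "groundedness_ratio", "score": grounded / len(runs)}

# Summary evaluators are passed via `summary_evaluators`.
evaluate(
    ask_question,
    data=dataset_name,
    summary_evaluators=[groundedness_summary_evaluator],
    experiment_prefix="GROUNDEDNESS-SUMMARY-EVAL",
)
```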

[Image: langsmith-groundedness-evaluation-02 (LangSmith summary evaluation results)]
