LangSmith Repeat Evaluation
Repeat evaluation is a method for measuring the performance of a model more accurately by performing multiple evaluations on the same dataset.
This notebook demonstrates how to use LangSmith for repeatable evaluations of language models. It covers setting up evaluation workflows, running evaluations on different datasets, and analyzing results to ensure consistency. The focus is on leveraging LangSmith's tools for reproducible and scalable model evaluation.
You can add repetition to an experiment so that the evaluation is repeated multiple times. This is useful in the following cases:
For larger evaluation sets
For chains that can generate variable responses
For evaluations that can produce variable scores (e.g., llm-as-judge)
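To illustrate why repetition helps when scores vary, here is a minimal simulation. It does not call LangSmith: `noisy_judge` is a hypothetical stand-in for an evaluator that passes an answer with some fixed probability, and the pass rate, example count, and seed are assumptions chosen only for illustration. The sketch compares how much the experiment-level score fluctuates with one repetition versus ten.

```python
import random
from statistics import mean, pstdev


def noisy_judge(rng: random.Random, pass_rate: float = 0.7) -> int:
    """Hypothetical evaluator: returns 1 (pass) with probability pass_rate."""
    return 1 if rng.random() < pass_rate else 0


def experiment_score(rng: random.Random, num_examples: int, num_repetitions: int) -> float:
    """Average score of one experiment: every example is judged num_repetitions times."""
    scores = [noisy_judge(rng) for _ in range(num_examples * num_repetitions)]
    return mean(scores)


def spread_across_experiments(num_repetitions: int, num_experiments: int = 200,
                              num_examples: int = 6, seed: int = 0) -> float:
    """Standard deviation of the experiment score over many simulated experiments."""
    rng = random.Random(seed)
    results = [
        experiment_score(rng, num_examples, num_repetitions)
        for _ in range(num_experiments)
    ]
    return pstdev(results)


single = spread_across_experiments(num_repetitions=1)
repeated = spread_across_experiments(num_repetitions=10)
print(f"score spread with 1 repetition:   {single:.3f}")
print(f"score spread with 10 repetitions: {repeated:.3f}")
```

With more repetitions, the averaged score should fluctuate less from one experiment to the next, which is the motivation for repeating an evaluation whose responses or scores are not deterministic.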
You can learn how to run an evaluation from .
[Note] langchain-opentutorial is a package that provides easy-to-use environment setup helpers, functions, and utilities for tutorials.
You can alternatively set OPENAI_API_KEY in .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
num_repetitions
LangSmith offers a simple way to perform repetitive evaluations using the num_repetitions parameter in the evaluate function. This parameter specifies how many times each example in your dataset should be evaluated.
When you set num_repetitions=N, LangSmith will:
Run each example in your dataset N times.
Aggregate the results to provide a more accurate measure of your model's performance.
For example:
If your dataset has 10 examples and you set num_repetitions=5, each example will be evaluated 5 times, resulting in a total of 50 runs.
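The call described above might look like the following sketch. The wrapper `run_repeat_evaluation`, the helper `expected_run_count`, and the experiment prefix are hypothetical names introduced here for illustration; the target function, dataset name, and evaluators are whatever your own setup provides. Running the wrapper requires the langsmith package and a configured LangSmith API key, so the import is done lazily inside it.

```python
def expected_run_count(num_examples: int, num_repetitions: int) -> int:
    """Each example is run once per repetition."""
    return num_examples * num_repetitions


def run_repeat_evaluation(target, dataset_name, evaluators, num_repetitions=3):
    """Evaluate every example in a LangSmith dataset num_repetitions times.

    `target` is the function under test; `evaluators` is a list of evaluators.
    Requires the langsmith package and LangSmith credentials in the environment.
    """
    # Imported here so the helper above can be used without langsmith installed.
    from langsmith.evaluation import evaluate

    return evaluate(
        target,
        data=dataset_name,
        evaluators=evaluators,
        experiment_prefix="repeat-eval-sketch",  # hypothetical prefix
        num_repetitions=num_repetitions,
    )


# The arithmetic from the example above: 10 examples, 5 repetitions.
print(expected_run_count(10, 5))
```

Passing num_repetitions through a small wrapper like this keeps the repetition count in one place when the same evaluation is run against several models.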
Create a RAG system to use for performance testing.
Below is an example of loading and invoking the model:
This section demonstrates the process of conducting repetitive evaluations of a RAG system using GPT models. It focuses on setting up and executing repeated tests to assess the consistency and performance of the RAG system across various scenarios, helping to identify potential areas for improvement and ensure reliable outputs.
| | Question (input) | Question (output) | Context | Answer | Error | Reference answer | Score | Execution time (s) | Example ID | Run ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The three targeted learnings to enhance model ... | None | The three targeted learning approaches to enha... | 1 | 13.151277 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 510240bb-4c28-4440-a769-929be7edb98f |
| 1 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The key functions of an agent's orchestration ... | None | The key functions of an agent's orchestration ... | 1 | 4.226702 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 60c42896-89fe-4a57-b8e3-e5cdacabae30 |
| 2 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The authors of the document are Julia Wiesinge... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 2.524669 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | d9a3335b-06d6-46a0-bcb1-3a84d3d56c66 |
| 3 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 2.944406 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 0d8cc590-0518-4098-b006-b0613d5e7cb8 |
| 4 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | The framework used for reasoning and planning ... | None | The frameworks used for reasoning and planning... | 1 | 2.452457 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 155ef405-4754-441f-a178-177922122d63 |
| 5 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Agents differ from standalone language models ... | None | Agents can use tools to access real-time data ... | 1 | 2.868793 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | e0d61836-a440-463d-82c0-c32053b6337b |
| 6 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The three targeted learnings to enhance model ... | None | The three targeted learning approaches to enha... | 1 | 3.615821 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 65fb7cdf-4545-4330-b4b4-055fdfe710cb |
| 7 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The key functions of an agent's orchestration ... | None | The key functions of an agent's orchestration ... | 1 | 2.201849 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 9d587a12-e035-45d6-9a8b-64c58ae4dd67 |
| 8 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The authors listed are Julia Wiesinger, Patric... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 1.720297 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | eaff2aba-0e70-4a7c-b47f-912ac6318016 |
| 9 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 2.107871 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 7029baaf-2e66-4d71-98c5-443577b5c430 |
| 10 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | The frameworks used for reasoning and planning... | None | The frameworks used for reasoning and planning... | 1 | 2.265368 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 04b223a3-5ae5-4180-a0c0-db818a9e28af |
| 11 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Agents differ from standalone language models ... | None | Agents can use tools to access real-time data ... | 1 | 2.088294 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | 676c6265-8cc1-41ac-828c-e294ac3f4a10 |
| 12 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The three targeted learning approaches mention... | None | The three targeted learning approaches to enha... | 1 | 3.550540 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 1b92081d-ca19-4679-906e-187dea30a5dc |
| 13 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The key functions of an agent's orchestration ... | None | The key functions of an agent's orchestration ... | 1 | 4.070889 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 07b70cac-203f-4d39-998d-befef6bc0bd8 |
| 14 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The authors are Julia Wiesinger, Patrick Marlo... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 1.588084 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | 0f6ccf7a-f79f-4fdb-ab00-4831930e6e98 |
| 15 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 2.138192 | be18ec98-ab18-4f30-9205-e75f1cb70844 | bd0f5f68-215e-4756-b87b-0aef5e4f01ab |
| 16 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | The frameworks used for reasoning and planning... | None | The frameworks used for reasoning and planning... | 1 | 2.071085 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 826d6013-987c-4095-80dd-612591271c2f |
| 17 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Agents differ from standalone language models ... | None | Agents can use tools to access real-time data ... | 1 | 2.863684 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | 5b172bbf-abe0-4a71-8a32-d2f05e4039bb |
This part focuses on performing repetitive evaluations of the RAG system using Ollama. It illustrates the process of setting up and running multiple tests with Ollama, allowing for a comprehensive evaluation of the RAG system's performance with these specific models.
| | Question (input) | Question (output) | Context | Answer | Error | Reference answer | Score | Execution time (s) | Example ID | Run ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | In-context learning, Fine-tuning based learning. | None | The three targeted learning approaches to enha... | 0.0 | 2.527441 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 96233779-b37d-484f-85a8-22a7320ff72b |
| 1 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | Based on the retrieved context, it appears tha... | None | The key functions of an agent's orchestration ... | 0.0 | 7.891397 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 5f761c37-3bf0-4b64-91bf-0b1167165184 |
| 2 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The names of the authors are:\n\n1. Julia Wies... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1.0 | 3.461620 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | 5e56e10f-9220-4107-b0da-cfd206e4cd27 |
| 3 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts is a prompt engineering frame... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1.0 | 3.017406 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 4f0f23af-2cf3-4de2-923f-d8cbbd184a47 |
| 4 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | Based on the provided context, it appears that... | None | The frameworks used for reasoning and planning... | 0.0 | 8.636841 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | f729da06-0b0e-42ff-88f1-64676e19d1b0 |
| 5 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | According to the context, agents differ from s... | None | Agents can use tools to access real-time data ... | 1.0 | 6.293883 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | 045cdaba-4dc0-46ad-a955-4d00944bfabd |
| 6 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The two methods mentioned for enhancing model ... | None | The three targeted learning approaches to enha... | 0.0 | 3.524431 | 0e661de4-636b-425d-8f6e-0a52b8070576 | e1f26ba7-cd91-4e4f-8684-4af4262b8c17 |
| 7 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | Based on the retrieved context, the key functi... | None | The key functions of an agent's orchestration ... | NaN | 5.473330 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 10df33b1-8936-454f-9c13-9baedb8d557a |
| 8 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The names of the authors are:\n\n1. Julia Wies... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1.0 | 2.525374 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | 77e497f6-3f3e-400d-a385-72063096f879 |
| 9 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1.0 | 2.907534 | be18ec98-ab18-4f30-9205-e75f1cb70844 | a6b767b3-b831-4cbb-a62f-2e351a948a01 |
| 10 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | Based on the retrieved context, it appears tha... | None | The frameworks used for reasoning and planning... | 0.0 | 6.760531 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | c00fd2ce-4108-45e8-8b0d-0e2419e883f3 |
| 11 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Based on the provided context, it appears that... | None | Agents can use tools to access real-time data ... | 1.0 | 6.969271 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | 239706b7-f82c-49dd-a4ba-15d845d40f3e |
| 12 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | In-context learning and Fine-tuning based lear... | None | The three targeted learning approaches to enha... | 0.0 | 2.515873 | 0e661de4-636b-425d-8f6e-0a52b8070576 | bad8da17-774d-43e4-b0f1-9436f4a6f516 |
| 13 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The key functions of an agent's orchestration ... | None | The key functions of an agent's orchestration ... | 0.0 | 6.819861 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | a08170c2-8953-450f-9e49-1b431f87f506 |
| 14 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The names of the authors are:\n\n1. Julia Wies... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1.0 | 2.512632 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | e7b1221e-23fe-4715-8315-daa7375dd73f |
| 15 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-Thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1.0 | 3.005581 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 9c043533-e24e-498d-a27c-02b5499fd27e |
| 16 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | Based on the provided context, it seems that t... | None | The frameworks used for reasoning and planning... | 0.0 | 4.558945 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 8875837e-fca5-4bf8-bf94-2fc733ae7387 |
| 17 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | According to the retrieved context, agents dif... | None | Agents can use tools to access real-time data ... | 0.0 | 5.888388 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | 1889177c-ea36-488d-9327-26147f4e83ee |
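One way to read repeated results like the Ollama output above is to group the runs by example and compare the scores across repetitions. The sketch below does this in plain Python. The (example ID prefix, score) pairs are copied from that output, with IDs shortened to their first eight characters; the `summarize` helper and its field names are hypothetical, and skipping NaN scores is an assumption about how a missing score should be treated.

```python
import math
from collections import defaultdict
from statistics import mean

# (example ID prefix, score) pairs taken from the Ollama results above.
rows = [
    ("0e661de4", 0.0), ("3561c6fe", 0.0), ("b03e98d1", 1.0),
    ("be18ec98", 1.0), ("eb4b29a7", 0.0), ("f4a5a0cf", 1.0),
    ("0e661de4", 0.0), ("3561c6fe", float("nan")), ("b03e98d1", 1.0),
    ("be18ec98", 1.0), ("eb4b29a7", 0.0), ("f4a5a0cf", 1.0),
    ("0e661de4", 0.0), ("3561c6fe", 0.0), ("b03e98d1", 1.0),
    ("be18ec98", 1.0), ("eb4b29a7", 0.0), ("f4a5a0cf", 0.0),
]


def summarize(rows):
    """Group scores by example and report the mean and agreement across repetitions."""
    grouped = defaultdict(list)
    for example_id, score in rows:
        grouped[example_id].append(score)

    summary = {}
    for example_id, scores in grouped.items():
        # Assumption: a NaN score means the run was not scored, so it is skipped.
        valid = [s for s in scores if not math.isnan(s)]
        summary[example_id] = {
            "runs": len(scores),
            "scored": len(valid),
            "mean": mean(valid) if valid else None,
            "consistent": len(set(valid)) <= 1,
        }
    return summary


summary = summarize(rows)
for example_id, stats in summary.items():
    print(example_id, stats)
```

Examples whose scores disagree across repetitions, or that have an unscored run, are the ones worth inspecting by hand, since a single evaluation pass would have reported only one of those outcomes.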
Set up the environment. You may refer to for more details.
You can checkout the for more details.
In this tutorial, we use the llama3.2 model for repetitive evaluations. Make sure to install Ollama on your local machine and run ollama pull llama3.2 to download the model before proceeding with this tutorial.