LangSmith Repeat Evaluation
Author: Hwayoung Cha
Peer Review:
This is a part of LangChain Open Tutorial
Repeat evaluation measures a model's performance more reliably by running the evaluation several times over the same dataset.
This notebook demonstrates how to add repetitions to a LangSmith experiment. It covers setting up evaluation workflows, running evaluations on different datasets, and analyzing results to ensure consistency. The focus is on leveraging LangSmith's tools for reproducible and scalable model assessments.
Repeating the evaluation is useful in the following cases:
For larger evaluation sets
For chains that can generate variable responses
For evaluations that can produce variable scores (e.g., llm-as-judge)
You can learn how to run an evaluation from this site.
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials.
You can check out the langchain-opentutorial package for more details.
Alternatively, you can set OPENAI_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
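Before running any evaluation, it can help to confirm that the required keys are actually visible to the Python process. The following is a minimal sketch using only the standard library. The helper name `missing_env_vars` is hypothetical, and `LANGCHAIN_API_KEY` is an assumed variable name for the LangSmith key (only `OPENAI_API_KEY` is named in this tutorial), so adjust the list to match your own setup.

```python
import os

# Assumption: OPENAI_API_KEY comes from this tutorial; LANGCHAIN_API_KEY is an
# assumed name for the LangSmith key and may differ in your environment.
REQUIRED_VARS = ["OPENAI_API_KEY", "LANGCHAIN_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.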
num_repetitions
LangSmith provides a simple way to perform repetitive evaluations using the num_repetitions parameter in the evaluate function. This parameter specifies how many times each example in your dataset should be evaluated.
When you set num_repetitions=N, LangSmith will:
Run each example in your dataset N times.
Aggregate the results to provide a more accurate measure of your model's performance.
For example:
If your dataset has 10 examples and you set num_repetitions=5, each example will be evaluated 5 times, resulting in a total of 50 runs.
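A call with num_repetitions might look like the sketch below. It is not the notebook's original code. The evaluator `exact_match`, the wrapper `run_repeated_evaluation`, the `"answer"` output key, and the experiment prefix are all hypothetical names chosen for illustration, and a real run requires a LangSmith API key and an existing dataset.

```python
def exact_match(run, example):
    """Hypothetical evaluator: score 1 if the prediction equals the reference.

    Assumes both the run outputs and the example outputs store their text
    under an "answer" key.
    """
    predicted = (run.outputs or {}).get("answer", "")
    expected = (example.outputs or {}).get("answer", "")
    return {"key": "exact_match", "score": int(predicted.strip() == expected.strip())}


def run_repeated_evaluation(target, dataset_name, repetitions=3):
    """Evaluate `target` on every example in the dataset `repetitions` times."""
    # Imported lazily so this sketch can be defined without langsmith installed.
    from langsmith.evaluation import evaluate

    return evaluate(
        target,                           # callable mapping example inputs to outputs
        data=dataset_name,                # name of an existing LangSmith dataset
        evaluators=[exact_match],
        experiment_prefix="repeat-eval",  # hypothetical prefix
        num_repetitions=repetitions,      # each example is run this many times
    )
```

With a 10-example dataset and `repetitions=5`, this call would produce the 50 runs described above. An LLM-as-judge evaluator can be substituted for `exact_match` when answers are free-form.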
Create a RAG system to use for performance testing.
In this tutorial, we use the llama3.2 model for repetitive evaluations. Make sure to install Ollama on your local machine and run ollama pull llama3.2 to download the model before proceeding with this tutorial.
Below is an example of loading and invoking the model:
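As a minimal sketch, assuming the langchain-ollama package is installed and a local Ollama server is running with llama3.2 already pulled, the model can be loaded and invoked as follows. The helper name `ask_ollama` is hypothetical and is not taken from the original notebook.

```python
def ask_ollama(prompt, model="llama3.2"):
    """Send a single prompt to a local Ollama model and return the reply text."""
    # Imported lazily so the helper can be defined without langchain-ollama installed.
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model=model)
    # invoke() returns a message object; its text is in the .content attribute.
    return llm.invoke(prompt).content
```

Calling `ask_ollama("What is Tree-of-thoughts?")` sends one request to the local server. Responses may differ between calls, which is one reason to repeat the evaluation.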
This section demonstrates the process of conducting multiple evaluations of a RAG system using GPT models. It focuses on setting up and executing repeated tests to assess the consistency and performance of the RAG system across various scenarios, helping to identify potential areas for improvement and ensure reliable outputs.
| # | Input question | Output question | Retrieved context | Generated answer | Error | Reference answer | Score | Execution time | Example ID | Run ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The three targeted learning approaches to enha... | None | The three targeted learning approaches to enha... | 0 | 4.314925 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 3dd0330a-6fac-49cd-bc32-98fc8b2bc009 |
| 1 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The orchestration layer of an agent is respons... | None | The key functions of an agent's orchestration ... | 1 | 4.272081 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 210a2398-530f-4a7b-9c52-767396f73139 |
| 2 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The authors listed are Julia Wiesinger, Patric... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 2.029024 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | 06e580a5-5120-456a-91a5-d1b69a9a0868 |
| 3 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 3.765071 | be18ec98-ab18-4f30-9205-e75f1cb70844 | cd4a92d8-f2ea-447c-a18f-a0db533cb8cc |
| 4 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | The frameworks used for reasoning and planning... | None | The frameworks used for reasoning and planning... | 1 | 3.013066 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | fec108d9-97d5-4b2d-b0d3-c8e77158a999 |
| 5 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Agents differ from standalone language models ... | None | Agents can use tools to access real-time data ... | 1 | 3.274887 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | 80bc2b98-2026-416b-a588-d40a0b56770c |
| 6 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The three targeted learnings to enhance model ... | None | The three targeted learning approaches to enha... | 0 | 4.848947 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 91caf834-e66c-4538-95d0-1f3009d19c74 |
| 7 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The key functions of an agent's orchestration ... | None | The key functions of an agent's orchestration ... | 1 | 5.022591 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | ee18ccde-7acc-4afe-a1a8-06c7d3f258ff |
| 8 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The authors are Julia Wiesinger, Patrick Marlo... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 3.086064 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | eb8223b6-668f-4873-9234-50a09a514555 |
| 9 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 12.533168 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 2bc00521-a12a-4c0d-bacc-28b2f2fe8873 |
| 10 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | The frameworks used for reasoning and planning... | None | The frameworks used for reasoning and planning... | 1 | 3.769949 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 33540ddf-876b-45f6-b78e-5c7db014bf3f |
| 11 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Agents differ from standalone language models ... | None | Agents can use tools to access real-time data ... | 1 | 3.677065 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | db404f5c-889c-4e68-9d76-7dc250506862 |
| 12 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The three targeted learnings to enhance model ... | None | The three targeted learning approaches to enha... | 1 | 9.244867 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 9729b15c-156c-4753-83b3-37a72eb090e7 |
| 13 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The key functions of an agent's orchestration ... | None | The key functions of an agent's orchestration ... | 1 | 7.975982 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 75e6d19c-4532-4839-9947-2270b32b03d6 |
| 14 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The authors are Julia Wiesinger, Patrick Marlo... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 12.666265 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | a46059e8-f848-4406-b332-2eab00171033 |
| 15 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 4.710261 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 4e3ce81f-f838-4614-bc5e-d32dbbb7bb23 |
| 16 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | The frameworks used for reasoning and planning... | None | The frameworks used for reasoning and planning... | 1 | 4.156800 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 2a679b30-7588-44ed-bb0d-31cce4f91663 |
| 17 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Agents differ from standalone language models ... | None | Agents can use tools to access real-time data ... | 1 | 2.865889 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | 56826347-db40-4a16-a47f-d96d2abad4b2 |
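The 18 runs above correspond to 6 examples evaluated 3 times each, since each example ID appears three times. One way to read such results is to average the score per example across repetitions. The sketch below does this with the standard library, assuming that the 0/1 value in each run is the evaluator score. The example IDs are shortened to their first eight characters, and the helper name `mean_score_by_example` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# First eight characters of each example ID, in the order the runs appear above.
example_ids = ["0e661de4", "3561c6fe", "b03e98d1", "be18ec98", "eb4b29a7", "f4a5a0cf"]

# Scores copied from the runs above, one inner list per repetition
# (assumption: the 0/1 value is the evaluator score).
scores_by_repetition = [
    [0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]

runs = [
    (example_id, score)
    for repetition in scores_by_repetition
    for example_id, score in zip(example_ids, repetition)
]


def mean_score_by_example(runs):
    """Group (example_id, score) pairs by example and average the scores."""
    grouped = defaultdict(list)
    for example_id, score in runs:
        grouped[example_id].append(score)
    return {example_id: mean(scores) for example_id, scores in grouped.items()}


per_example = mean_score_by_example(runs)
overall = mean(score for _, score in runs)

for example_id, average in per_example.items():
    print(f"{example_id}: {average:.2f}")
print(f"overall: {overall:.3f}")
```

An example whose average is neither 0 nor 1 received different scores across repetitions, which a single evaluation pass would not reveal.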
This part focuses on performing repetitive evaluations of the RAG system using Ollama models. It illustrates the process of setting up and running multiple tests with Ollama, allowing for a comprehensive assessment of the RAG system's performance with these specific models.
| # | Input question | Output question | Retrieved context | Generated answer | Error | Reference answer | Score | Execution time | Example ID | Run ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The three targeted learnings to enhance model ... | None | The three targeted learning approaches to enha... | 0 | 48.045735 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 16073b43-be8c-4ac3-8ab8-1fcea5881e37 |
| 1 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | Based on the provided context, it appears that... | None | The key functions of an agent's orchestration ... | 1 | 44.844708 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 36ba9035-a266-43bd-8317-2e5d716eaa5e |
| 2 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The names of the authors are:\n\n1. Julia Wies... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 42.542528 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | 878fbb3e-c01f-47d7-aa6c-4d32804b81de |
| 3 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 44.415462 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 312cf847-908c-4612-b3e3-86288c3757ea |
| 4 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | Based on the provided context, it appears that... | None | The frameworks used for reasoning and planning... | 1 | 49.577862 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 7dd6ec03-95b4-45a0-bb14-2630250018d8 |
| 5 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | According to the retrieved context, agents and... | None | Agents can use tools to access real-time data ... | 1 | 53.767911 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | d7d09ab0-a8f2-42ad-9842-a99758df77e0 |
| 6 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | In-context learning and fine-tuning-based lear... | None | The three targeted learning approaches to enha... | 0 | 43.936210 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 820d770a-c690-472e-8749-c453e761084e |
| 7 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | The key functions of an agent's orchestration ... | None | The key functions of an agent's orchestration ... | 1 | 50.533822 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 54a701fa-b9ad-4a5f-bdb9-1fad1251e0a8 |
| 8 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The names of the authors are:\n\n1. Julia Wies... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 44.877717 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | 77fa15e6-774a-44cd-a60f-f4b27e1da713 |
| 9 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 49.692480 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 9f228641-1476-4e17-84f9-0d2c3de33fb6 |
| 10 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | The answer to the question "What is the framew... | None | The frameworks used for reasoning and planning... | 1 | 57.079942 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | bf4f9953-6eaa-467d-86ba-9c94f529e6d2 |
| 11 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | According to the retrieved context, agents dif... | None | Agents can use tools to access real-time data ... | 1 | 48.946233 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | cbfe2610-a4b7-4137-84ca-45dd42f83b48 |
| 12 | What are the three targeted learnings to enhan... | What are the three targeted learnings to enhan... | Agents\n33\nSeptember 2024\nEnhancing model pe... | The text doesn't explicitly mention "targeted ... | None | The three targeted learning approaches to enha... | 1 | 48.183349 | 0e661de4-636b-425d-8f6e-0a52b8070576 | 2672a1f0-b0af-43b8-891a-eae188cde04f |
| 13 | What are the key functions of an agent's orche... | What are the key functions of an agent's orche... | implementation of the agent orchestration laye... | Based on the provided context, the orchestrati... | None | The key functions of an agent's orchestration ... | 1 | 54.076100 | 3561c6fe-6ed4-4182-989a-270dcd635f32 | 4302a894-cb5c-4e29-8844-daa3d6a9ba94 |
| 14 | List up the name of the authors | List up the name of the authors | Agents\nAuthors: Julia Wiesinger, Patrick Marl... | The names of the authors are:\n\n1. Julia Wies... | None | The authors are Julia Wiesinger, Patrick Marlo... | 1 | 45.883568 | b03e98d1-44ad-4142-8dfa-7b0a31a57096 | f03fd939-0d5d-4386-b1e0-ad6e77e9985f |
| 15 | What is Tree-of-thoughts? | What is Tree-of-thoughts? | weaknesses depending on the specific applicati... | Tree-of-thoughts (ToT) is a prompt engineering... | None | Tree-of-thoughts (ToT) is a prompt engineering... | 1 | 52.200453 | be18ec98-ab18-4f30-9205-e75f1cb70844 | 5cc65ad6-865f-4781-8054-e9159fb46d1b |
| 16 | What is the framework used for reasoning and p... | What is the framework used for reasoning and p... | reasoning frameworks (CoT, ReAct, etc.) to \nf... | Based on the provided context, it appears that... | None | The frameworks used for reasoning and planning... | 0 | 57.564192 | eb4b29a7-511c-4f78-a08f-2d5afeb84320 | 72b6ef7e-fe17-4d47-aaf4-4ea37299b2b4 |
| 17 | How do agents differ from standalone language ... | How do agents differ from standalone language ... | 1.\t Agents extend the capabilities of languag... | Based on the provided context, according to th... | None | Agents can use tools to access real-time data ... | 1 | 52.182042 | f4a5a0cf-2d2e-4e15-838a-bc8296eb708b | c3167606-9f4a-4971-a1e2-5fadc56e2afb |
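To compare the two experiments side by side, the scores and execution times from both sets of runs can be summarized with the standard library. This is a sketch under stated assumptions: the first set of 18 runs is taken to be the GPT experiment and the second the Ollama experiment, the 0/1 value is read as the evaluator score, and the decimal value as execution time in seconds. The helper name `summarize` is hypothetical.

```python
from statistics import mean

# Scores and execution times copied from the two sets of runs in this notebook
# (assumptions: 0/1 is the evaluator score, the decimal value is seconds).
gpt_scores = [0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
gpt_seconds = [
    4.314925, 4.272081, 2.029024, 3.765071, 3.013066, 3.274887,
    4.848947, 5.022591, 3.086064, 12.533168, 3.769949, 3.677065,
    9.244867, 7.975982, 12.666265, 4.710261, 4.156800, 2.865889,
]
ollama_scores = [0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
ollama_seconds = [
    48.045735, 44.844708, 42.542528, 44.415462, 49.577862, 53.767911,
    43.936210, 50.533822, 44.877717, 49.692480, 57.079942, 48.946233,
    48.183349, 54.076100, 45.883568, 52.200453, 57.564192, 52.182042,
]


def summarize(scores, seconds):
    """Return the run count, mean score, and mean execution time for one experiment."""
    return {
        "runs": len(scores),
        "mean_score": mean(scores),
        "mean_seconds": mean(seconds),
    }


gpt_summary = summarize(gpt_scores, gpt_seconds)
ollama_summary = summarize(ollama_scores, ollama_seconds)

for name, summary in [("GPT", gpt_summary), ("Ollama", ollama_summary)]:
    print(
        f"{name}: runs={summary['runs']}, "
        f"mean score={summary['mean_score']:.3f}, "
        f"mean time={summary['mean_seconds']:.1f}"
    )
```

Because both experiments repeat the same six examples three times, the mean scores are directly comparable. The execution times depend heavily on the hardware running the local model.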