Agent Annotation Benchmark
Annotation benchmarks are benchmarks created by annotating (providing feedback to) runs of agents. They can be used directly in agent optimization, i.e., to optimize an agent's configs and structure. For a detailed example of how to run agents in a simulated environment and how to use annotation benchmarks in agent optimization, see summarization-agent (simulate→annotate→optimize)-part-1.py and summarization-agent (simulate→annotate→optimize)-part-2.py.
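Conceptually, each sample in an annotation benchmark pairs a recorded agent run with the feedback attached to it (the Like/Dislike, Desired Output, and Feedback fields described below). The following is only a minimal sketch of that shape in plain Python; the class and field names (AnnotatedRunSample, AnnotationBenchmark, run_id, and so on) are hypothetical, and the actual objects and calls are those used in the part-1 and part-2 example scripts.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical illustration only: the real sample schema and API are defined by
# the RELAI platform and used in the part-1/part-2 example scripts.
@dataclass
class AnnotatedRunSample:
    run_id: str                    # the agent run being annotated
    liked: bool                    # the Like/Dislike field
    desired_output: Optional[str]  # the Desired Output field, if one was provided
    feedback: str = ""             # free-form Feedback on the run


@dataclass
class AnnotationBenchmark:
    benchmark_id: str  # id used to reference the benchmark during optimization
    samples: list[AnnotatedRunSample] = field(default_factory=list)

    def add(self, sample: AnnotatedRunSample) -> None:
        # Mirrors the "Add to Benchmark" action in the platform UI.
        self.samples.append(sample)
```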
Create Annotation Benchmark
- To create an annotation benchmark, first go to the RELAI platform and find Run under Results.
- Click on individual runs to inspect any agent you executed in a simulated environment.
- Annotate the runs with the Like/Dislike, Desired Output, and Feedback fields, and save your changes.
- Use the "Add to Benchmark" button at the bottom to add the annotated run as a sample to the benchmark of your choice. (Use the "Create a new annotation benchmark" function if you have not created any benchmark yet.)
- Continue to annotate and add other runs to the benchmark. The benchmark is then ready to use via its benchmark id. See summarization-agent (simulate→annotate→optimize)-part-1.py and summarization-agent (simulate→annotate→optimize)-part-2.py for how to use annotation benchmarks in agent optimization; a minimal sketch of that step follows this list.
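As a rough, non-authoritative sketch of the optimization step, the snippet below shows how the benchmark id could be handed to an optimization routine. The client object and its methods (get_benchmark, optimize_agent), as well as the parameter names, are hypothetical placeholders rather than the RELAI SDK; the actual calls are in summarization-agent (simulate→annotate→optimize)-part-2.py.

```python
# Hypothetical sketch: the client and its methods are placeholders, not the
# actual RELAI SDK. See summarization-agent (simulate→annotate→optimize)-part-2.py
# for the real calls.

BENCHMARK_ID = "your-benchmark-id"  # copy the id of your annotation benchmark from the platform


def optimize_with_annotation_benchmark(client, agent_config: dict, benchmark_id: str) -> dict:
    """Use annotated runs as the feedback signal while searching over agent configs/structure."""
    benchmark = client.get_benchmark(benchmark_id)  # assumed method: fetch the annotated samples by id
    best_config = client.optimize_agent(            # assumed method: run the config/structure search
        config=agent_config,
        benchmark=benchmark,                        # liked runs, desired outputs, and feedback guide the search
    )
    return best_config
```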