Agent Annotation Benchmark

Annotation benchmarks are benchmarks created by annotating (providing feedback on) agent runs. They can be used directly in agent optimization, for both agent configs and agent structure. For a detailed example of how to run agents in a simulated environment and how to use annotation benchmarks in agent optimization, see summarization-agent (simulate→annotate→optimize)-part-1.py and summarization-agent (simulate→annotate→optimize)-part-2.py.
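
Conceptually, each sample in an annotation benchmark pairs one agent run with the feedback attached to it. The sketch below is only an illustration under assumed names: `AnnotatedRun` and its fields are hypothetical and do not reflect the RELAI data model; the fields simply mirror the annotation step described in the walkthrough that follows.

```python
# Conceptual sketch only: this record is hypothetical, not the RELAI schema.
from dataclasses import dataclass


@dataclass
class AnnotatedRun:
    """One annotated agent run (fields mirror the annotation UI)."""
    agent_input: str       # what the agent was asked to do in the simulated run
    agent_output: str      # what the agent actually produced
    liked: bool            # Like/Dislike
    desired_output: str    # Desired Output (may be empty if none was given)
    feedback: str          # free-form Feedback


# An annotation benchmark is, conceptually, a collection of such samples,
# identified on the RELAI platform by a benchmark id.
example_benchmark = [
    AnnotatedRun(
        agent_input="Summarize the attached report.",
        agent_output="A three-paragraph summary ...",
        liked=False,
        desired_output="A two-sentence summary ...",
        feedback="Too long; keep it to two sentences.",
    ),
]
```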

Create Annotation Benchmark

  1. To create an annotation benchmark, first go to the RELAI platform and find Run under Results.

    RELAI platform → Results → Run

  2. Click on an individual run to inspect any agent you executed in a simulated environment.

    Inspect agent runs.

  3. Annotate the run with the Like/Dislike, Desired Output, and Feedback fields, then save your changes.

    Annotate agent runs.

  4. Use the "Add to Benchmark" button at the bottom to add the annotated run as a sample to the benchmark of your choice. (Use the "Create a new annotation benchmark" option if you have not created a benchmark yet.)

    Add the annotated run to a benchmark.

  5. Continue annotating and adding runs to the benchmark. Once it contains samples, the benchmark is ready to use via its benchmark id. See summarization-agent (simulate→annotate→optimize)-part-1.py and summarization-agent (simulate→annotate→optimize)-part-2.py for how to use annotation benchmarks in agent optimization; a schematic sketch of this usage also follows below.
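
As a rough sketch of how an annotation benchmark can drive optimization, the toy loop below scores candidate agent configurations against the annotations, preferring liked outputs and, where given, the desired outputs. All names here are hypothetical and none of this is the RELAI API; in the real workflow you reference the benchmark by its id, and the concrete RELAI calls are shown in the two summarization-agent example scripts referenced above.

```python
# Toy illustration only; these helpers are hypothetical, not the RELAI API.
# In practice you pass the benchmark id to the optimization step shown in the
# summarization-agent part-1/part-2 example scripts.
from typing import Callable, Dict, List

BENCHMARK_ID = "YOUR_BENCHMARK_ID"  # copied from the RELAI platform

# A benchmark sample, flattened into a dict for brevity.
Sample = Dict[str, str]
toy_benchmark: List[Sample] = [
    {
        "input": "Summarize the attached report.",
        "agent_output": "A three-paragraph summary ...",
        "liked": "no",
        "desired_output": "A two-sentence summary ...",
    },
]


def score_config(run_agent: Callable[[str], str], benchmark: List[Sample]) -> float:
    """Fraction of samples whose new output matches the annotated target.

    `run_agent` maps an input to an output for one candidate config; a real
    optimizer would use a learned judge or the RELAI service instead of
    exact string matching.
    """
    hits = 0
    for sample in benchmark:
        target = (
            sample["agent_output"] if sample["liked"] == "yes"
            else sample["desired_output"]
        )
        hits += int(run_agent(sample["input"]).strip() == target.strip())
    return hits / len(benchmark)


# Example: compare two candidate "configs" (trivial stand-in functions here).
def short_config(prompt: str) -> str:
    return "A two-sentence summary ..."


def long_config(prompt: str) -> str:
    return "A three-paragraph summary ..."


print(score_config(short_config, toy_benchmark))  # 1.0
print(score_config(long_config, toy_benchmark))   # 0.0
```

The exact-match scoring above is only a stand-in to keep the sketch self-contained; the referenced example scripts show how the benchmark id is actually consumed during agent optimization on RELAI.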