A repository for benchmarking the performance of agents, regardless of how they are set up or how they work internally.