3 datasets found

Groups: Natural Language Processing
Organizations: No Organization

  • APIBank

    APIBank is a comprehensive benchmark for tool-augmented LLMs, focusing on API calling, retrieving, and planning abilities.
  • APIBench

    APIBench is a comprehensive benchmark for tool-augmented LLMs, focusing on API calling, retrieving, and planning abilities.
  • GTA: A Benchmark for General Tool Agents

    GTA is a benchmark for General Tool Agents, featuring three main aspects: real user queries, real deployed tools, and real multimodal inputs.