3 datasets found

Tags: Natural Language Models

  • APIBank

    APIBank is a comprehensive benchmark for tool-augmented LLMs, focusing on API calling, retrieving, and planning abilities.
  • APIBench

    APIBench is a comprehensive benchmark for tool-augmented LLMs, focusing on API calling, retrieving, and planning abilities.
  • GTA: A Benchmark for General Tool Agents

    GTA is a benchmark for General Tool Agents, featuring three main aspects: real user queries, real deployed tools, and real multimodal inputs.
You can also access this registry using the API (see API Docs).