ORKG Properties and LLM-Generated Research Dimensions Evaluation Dataset

This dataset contains 103 research comparisons from the Open Research Knowledge Graph (ORKG), each with human-annotated properties and corresponding research dimensions generated by three Large Language Models (LLMs). It covers 1,317 papers from 35 research fields, addressing 153 distinct research problems. Each paper is associated with its human-annotated ORKG properties and with research dimensions generated by GPT-3.5, Llama 2, and Mistral. The dataset is intended as an evaluation benchmark for comparing how closely the research dimensions generated by different LLMs align with human-annotated properties.

Dataset columns:

  • comparison_id: Unique identifier of the research comparison in the Open Research Knowledge Graph (ORKG)
  • contribution_id: Identifier of the individual research contribution (paper) within a comparison
  • paper_id: Unique identifier of the research paper
  • paper_title: Title of the research paper
  • research_field: Field of research associated with the paper
  • research_problem: Specific research problem addressed by the paper
  • orkg_properties: Human-annotated properties of the paper in the ORKG, representing specific attributes or characteristics of the research contribution
  • gpt_dimensions: Research dimensions generated by the GPT-3.5 LLM for the paper
  • mistral_dimensions: Research dimensions generated by the Mistral LLM for the paper
  • llama2_dimensions: Research dimensions generated by the Llama 2 LLM for the paper
  • mappings: Mapping of ORKG properties to LLM-generated research dimensions
  • alignments: Alignment scores between ORKG properties and LLM-generated research dimensions
  • deviations: Deviation scores between ORKG properties and LLM-generated research dimensions
  • orkg_gpt_similarity: Cosine similarity score between the embeddings of ORKG properties and GPT-generated research dimensions
  • orkg_llama2_similarity: Cosine similarity score between the embeddings of ORKG properties and Llama 2-generated research dimensions
  • orkg_mistral_similarity: Cosine similarity score between the embeddings of ORKG properties and Mistral-generated research dimensions
  • gpt_llama2_similarity: Cosine similarity score between the embeddings of GPT-generated and Llama 2-generated research dimensions
  • gpt_mistral_similarity: Cosine similarity score between the embeddings of GPT-generated and Mistral-generated research dimensions
  • llama2_mistral_similarity: Cosine similarity score between the embeddings of Llama 2-generated and Mistral-generated research dimensions
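
The six similarity columns above are described as cosine similarities between embeddings. As a minimal sketch of that measure (not the dataset author's actual pipeline), the snippet below computes cosine similarity for two plain Python vectors. The embedding model used for the dataset is not stated here, so the short vectors and the variable names orkg_embedding and gpt_embedding are made-up stand-ins chosen purely for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for real embeddings (assumption:
# real embeddings would have hundreds of dimensions, not three).
orkg_embedding = [1.0, 0.0, 1.0]
gpt_embedding = [1.0, 1.0, 0.0]

# The dot product is 1 and each norm is sqrt(2), so the score is about 0.5.
score = cosine_similarity(orkg_embedding, gpt_embedding)
print(f"cosine similarity: {score:.3f}")
```

A score near 1 means the two vectors point in nearly the same direction, a score near 0 means they are close to orthogonal, and the measure ignores vector length, which is why it is a common choice for comparing text embeddings.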

Cite this as

Nechakhin, Vladyslav (2024). Dataset: ORKG Properties and LLM-Generated Research Dimensions Evaluation Dataset. https://doi.org/10.25835/6oyn9d1n

DOI retrieved: April 29, 2024

Additional Info

Imported on: November 28, 2024
Last update: November 28, 2024
License: CC-BY-SA-3.0
Source: https://data.uni-hannover.de/dataset/orkg-properties-and-llm-generated-research-dimensions-evaluation-dataset
Author: Nechakhin, Vladyslav
Given Name: Vladyslav
Family Name: Nechakhin
Maintainer: Vladyslav Nechakhin
Source Creation: 29 April, 2024, 09:36 AM (UTC+0000)
Source Modified: 29 April, 2024, 09:58 AM (UTC+0000)