No Organization - Organizations

AQUAINT

The AQUAINT dataset is used for evaluating named entity disambiguation performance.

Dataset
JSON

MSNBC

The MSNBC dataset is used for evaluating named entity disambiguation performance.

Dataset
JSON

AIDA-CoNLL

The AIDA-CoNLL dataset consists of annotated entities in a large corpus for named entity disambiguation tasks.

Dataset
JSON

FQuAD: French Question Answering Dataset

The French Question Answering Dataset (FQuAD) is a native Reading Comprehension dataset comprising questions and answers extracted from Wikipedia articles. It aims to provide a...

Dataset
JSON

English Web Treebank

The English Web Treebank is part of the Universal Dependencies framework and serves as a syntactically and semantically annotated corpus for training and evaluating dependency...

Dataset
JSON

WIKIHOP

WIKIHOP is a dataset constructed to require multi-hop reasoning over multiple Wikipedia paragraphs while answering entity-relation questions.

Dataset
JSON

CWQ

CWQ (Complex Web Questions) is a dataset involving complex web-based questions requiring multiple steps to answer.

Dataset
JSON

CQ

CQ (Complex Questions) consists of complex queries from Google and is designed for answering questions from a knowledge base.

Dataset
JSON

SEARCHQA

SEARCHQA is a dataset designed for reading comprehension, containing trivia questions and web snippets retrieved through Google.

Dataset
JSON

CMU-MOSEI

CMU-MOSEI is a dataset for multimodal sentiment analysis with sentiment annotations at the sentence level, featuring a blend of audio-visual and textual data.

Dataset
JSON

CMU-MOSI

CMU-MOSI (CMU Multimodal Opinion Sentiment Intensity) is a multimodal language dataset for sentiment analysis, containing 2199 video segments from 93...

Dataset
JSON

EmoContext Dataset

The EmoContext task dataset consists of conversations extracted from social media, intended for emotion detection, annotated with four main emotions: Happy, Sad, Angry, and...

Dataset
JSON

ClueWeb09-B

ClueWeb includes documents from ClueWeb09-B and queries from the TREC Web Track ad hoc retrieval task 2009-2012. The dataset consists of 200 queries with relevance judgements...

Dataset
JSON

LINNAEUS Dataset

The LINNAEUS dataset is a corpus for species name identification in biomedical literature.

Dataset
JSON

Species-800 Corpus

The Species-800 corpus is used for species name recognition in text.

Dataset
JSON

JNLPBA Corpus

The JNLPBA corpus serves as a benchmark for bio-entity recognition tasks.

Dataset
JSON

BioCreative V CDR Corpus

The BioCreative V CDR task corpus is a resource for chemical disease relation extraction.

Dataset
JSON

English-Finnish and English-Estonian Datasets

Monolingual English datasets consisting of backtranslated and parallel data used for training the translation models between English, Finnish, and Estonian.

Dataset
JSON

Finnish-Estonian Parallel Data

A bilingual corpus created by triangulating English–Finnish and English–Estonian parallel data, resulting in a set of 679,252 sentence pairs used to extract cognates and improve...

Dataset
JSON

WMT 2014 English-German Translation Dataset

The WMT 2014 English-German translation dataset consists of parallel sentences in English and German used to evaluate machine translation models.

Dataset
JSON
