Natural Language Processing - Groups

Mixtral of Experts

The dataset used in the paper for the instruction-following task.
- Dataset
- JSON
speechocean762

speechocean762: An open-source non-native English speech corpus for pronunciation assessment.
- Dataset
- JSON
Automatic Pronunciation Assessment

A hierarchical context-aware modeling approach for multi-aspect and multi-granular pronunciation assessment
- Dataset
- JSON
Experimental Results

The authors evaluate the performance of their proposed conformal prediction methods for multistep feedback covariate shift (MFCS) on synthetic black-box optimization and active...
- Dataset
- JSON
The Online Pivot: Lessons Learned from Teaching a Text and Data Mining Course...

A text and data mining course on Natural Language Processing, adapted for online teaching during the COVID-19 pandemic.
- Dataset
- JSON
WikiSQL

Semantic parsing maps a user-issued natural language (NL) utterance to a machine-executable meaning representation (MR), such as λ-calculus (Zettlemoyer and Collins, 2005), SQL...
- Dataset
- JSON
Hearst

The dataset used in this paper is the Hearst dataset, which is a collection of text documents.
- Dataset
- JSON
WordNet Noun

The dataset used in this paper is the WordNet Noun dataset, which is a collection of nouns with their semantic relationships.
- Dataset
- JSON
Universal Conceptual Cognitive Annotation (UCCA)

The Universal Conceptual Cognitive Annotation (UCCA) dataset is annotated with UCCA, a graph-based semantic annotation scheme based on typological linguistic principles.
- Dataset
- JSON
Russian Noun Dataset

The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus.
- Dataset
- JSON
Spanish Noun Dataset

The dataset used for clustering contains the 2000 most frequent nouns in the Spanish Gigaword corpus.
- Dataset
- JSON
English Noun Dataset

The dataset used for clustering contains the 2000 most frequent nouns in the British National Corpus (BNC) and the English Gigaword corpus.
- Dataset
- JSON
Toward an Architecture for Never-ending Language Learning

Toward an Architecture for Never-ending Language Learning.
- Dataset
- JSON
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

Fine-grained analysis of sentence embeddings using auxiliary prediction tasks.
- Dataset
- JSON
LIMP Dataset

The dataset used in the paper is a set of 35 complex and ambiguous object goal navigation and mobile pick-and-place instructions.
- Dataset
- JSON
NAVER Open Podium and NAVER Encyclopedia

A large dataset of Korean text.
- Dataset
- JSON
Sanskrit ASR dataset

A dataset for Sanskrit automatic speech recognition (ASR).
- Dataset
- JSON
वाक् सञ्चयः (/Vāksañcayaḥ/)

A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit.
- Dataset
- JSON
DuReader

The DuReader dataset is a Chinese machine reading comprehension dataset focusing on real-world web data.
- Dataset
- JSON
HONEST

HONEST is a fairness dataset specifically designed to assess the hurtfulness of language model (LM) outputs.
- Dataset
- JSON

420 datasets found