-
ChatGPT model data
ChatGPT model data, used to generate text -
Adding A Filter Based on The Discriminator to Improve Unconditional Text Gene...
The dataset is used for unconditional text generation, and the authors propose a novel mechanism to improve the generator by adding a filter which has the same input as the... -
Textual Sports Commentary Dataset
The textual dataset is a collection of live sports commentaries scraped from various sources, including live score websites and YouTube. -
Sports Commentary Dataset
The dataset is a collection of live sports commentaries, including audio and textual data, used to train and evaluate machine learning models for event recognition and... -
CLEVR-Robot Environment
A benchmark for evaluating task compositionality and long-horizon tasks through object manipulation, with language serving as the mechanism for goal specification. -
PersonaChat dataset
The PersonaChat dataset is a large, persona-conditioned dialogue dataset in a chit-chat style. -
OPT-66B and Llama2-70B
The paper uses two large language models, OPT-66B and Llama2-70B. -
Mixtral of Experts
The dataset used in the paper for the instruction-following task. -
speechocean762
speechocean762: An open-source non-native English speech corpus for pronunciation assessment. -
Automatic Pronunciation Assessment
A hierarchical, context-aware modeling approach for multi-aspect and multi-granular pronunciation assessment. -
Experimental Results
The authors evaluate the performance of their proposed conformal prediction methods for multistep feedback covariate shift (MFCS) on synthetic black-box optimization and active... -
The Online Pivot: Lessons Learned from Teaching a Text and Data Mining Course...
A text and data mining course on Natural Language Processing, adapted for online teaching during the COVID-19 pandemic. -
WordNet Noun
The dataset used in this paper is WordNet Noun, a collection of nouns and their semantic relationships. -
Universal Conceptual Cognitive Annotation (UCCA)
The Universal Conceptual Cognitive Annotation (UCCA) dataset follows UCCA, a graph-based semantic annotation scheme grounded in typological linguistic principles. -
Russian Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus.