LLM dataset
The dataset used in this paper is not explicitly described; the authors state only that they used it to train and evaluate their large language model (LLM). -
ÆTHEL: Automatically Extracted Typelogical Derivations for Dutch
A semantic compositionality dataset for written Dutch, consisting of a lexicon of supertags for about 900,000 words in context and 72,192 validated derivations. -
Utilizing Prolog for converting between active and passive sentence with thre...
This work introduces a simple but efficient method for one critical aspect of English grammar: the relationship between active and passive sentences. -
Universal Dependencies (UD) treebanks
The paper does not name a bespoke dataset; it states that the authors used the Universal Dependencies (UD) treebanks. -
Reddit Comments dataset
The Reddit Comments dataset is constructed from publicly available user comments on submissions on the Reddit website. -
Open Subtitles dataset
The Open Subtitles dataset consists of transcriptions of spoken dialog in movies and television shows. -
Attacker and Defender Counting Approach for Abstract Argumentation
Arguments are evaluated by counting the number of attackers and defenders for each argument. -
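The counting idea above can be sketched as follows. This is an illustrative sketch, not the paper's code: it assumes an abstract argumentation framework given as directed attack edges (attacker, target), and takes a defender of A to be any argument that attacks one of A's attackers.

```python
from collections import defaultdict

def count_attackers_defenders(attacks):
    """attacks: iterable of (attacker, target) pairs.
    Returns {argument: (n_attackers, n_defenders)}, where a defender of A
    is an argument that attacks one of A's attackers."""
    attackers_of = defaultdict(set)
    args = set()
    for a, t in attacks:
        attackers_of[t].add(a)
        args.update((a, t))
    result = {}
    for arg in args:
        atk = attackers_of[arg]          # direct attackers of arg
        dfn = set()
        for b in atk:                    # attackers of those attackers
            dfn |= attackers_of[b]
        result[arg] = (len(atk), len(dfn))
    return result

# Example chain: C attacks B, B attacks A, so C defends A.
counts = count_attackers_defenders([("B", "A"), ("C", "B")])
```

Here `counts["A"]` is `(1, 1)`: one attacker (B) and one defender (C).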
ANY dataset
ANY dataset combines natural and synthetic data, used to probe polarity via negative polarity items (NPIs) in two pre-trained Transformer-based models (BERT and GPT-2). -
English-Hindi Parallel Corpus
A parallel corpus used for training and testing English-Hindi machine translation systems. -
English-Hindi Outputs Quality Estimation using Naive Bayes Classifier
The dataset used to train and test a Naive Bayes classifier for quality estimation of English-Hindi machine translation outputs. -
Gemma: Open models based on gemini research and technology
This dataset contains a large corpus of text for training and evaluating large language models. -
Llama 2: Open foundation and fine-tuned chat models
This dataset contains a large corpus of text for training and evaluating large language models. -
harmless/harmful anchor datasets
This dataset contains 100 harmless and 100 harmful anchor prompts for evaluating the performance of large language models. -
Decimal Addition Dataset
A collection of decimal addition tasks with input lengths ranging from 1 to 40 digits, used to evaluate the ability of... -
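A dataset of this shape can be generated procedurally. The sketch below is a hypothetical generator, not the paper's released code: the operand-sampling scheme and the input/output string format are assumptions, matching only the stated 1-to-40-digit length range.

```python
import random

def make_addition_example(n_digits, rng=random):
    """Sample two n-digit operands and return (input_string, answer_string)."""
    lo = 10 ** (n_digits - 1) if n_digits > 1 else 0  # smallest n-digit number
    hi = 10 ** n_digits - 1                           # largest n-digit number
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    return f"{a}+{b}", str(a + b)

# One example per operand length from 1 to 40 digits.
examples = [make_addition_example(n) for n in range(1, 41)]
```

A real benchmark would likely sample many examples per length and fix a random seed for reproducibility.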
UzSyllable dataset
A comprehensive dataset for training and evaluating machine learning algorithms on syllable prediction.