ShiftT: Simulation-to-Human Instruction Following via Transfer from Text

The dataset used in the paper is a collection of natural human instructions for a 3D room containing everyday objects. The instructions are used to train an agent to follow...

Dataset
JSON

NLSC

NLSC uses two pre-trained models: one with 700 dimensions trained on 19.7B words of English tweets, and the other with 700 dimensions trained on 1.7B words from Wikipedia...

Dataset
JSON

Phi-2: A Dataset for Language Model Evaluation

The Phi-2 dataset is a collection of language models used to evaluate the performance of language models.

Dataset
JSON

STAMP 4 NLP

STAMP 4 NLP is an instantiable, iterative, and incremental process model for developing natural language processing applications with a focus on quality, business value, and...

Dataset
JSON

CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset

A comprehensive dataset for post-OCR parsing and receipt understanding, specifically designed to enhance OCR and information extraction from receipts in multilingual contexts...

Dataset
JSON

Privacy-Preserving In-Context Learning for Large Language Models

In-context learning (ICL) is an important capability of Large Language Models (LLMs), enabling these models to dynamically adapt based on specific, in-context exemplars, thereby...

Dataset
JSON

CIFAR-10, FEMNIST, and IMDB

The datasets used in the paper are CIFAR-10, FEMNIST, and IMDB. The authors used them to evaluate the performance of the EmbracingFL framework.

Dataset
JSON

Virtual Language Observatory (VLO)

The Virtual Language Observatory (VLO) is a web application equipped with easy-to-use Natural Language Processing tools.

Dataset
JSON

MNLI: Multi-Genre Natural Language Inference

Proposes a method for evaluating gender bias in contextualised word embeddings.

Dataset
JSON

SEAT: Sentence Encoder Association Test

Proposes a method for evaluating gender bias in contextualised word embeddings.

Dataset
JSON

ARAGPT2

ARAGPT2 is a stacked transformer-decoder model trained using the causal language modeling objective. The model is trained on 77GB of Arabic text.

Dataset
JSON

Reducing Retraining by Recycling Parameter-Efficient Prompts

Parameter-efficient methods are able to use a single frozen pre-trained large language model to perform many tasks by learning task-specific soft prompts that modulate model...

Dataset
JSON

COMMUNITY-CROSS-INSTRUCT

COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities

Dataset
JSON

Global Neural CCG Parsing with Optimality Guarantees

Dataset
JSON

Grounded response generation task at DSTC7

Dataset
JSON

Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish T...

The Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts

Dataset
JSON

SQuAD: 100,000+ Questions for Machine Comprehension of Text

The SQuAD dataset is a benchmark for natural language understanding tasks, including question answering and text classification.

Dataset
JSON

FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language

FarFetched is a modular framework that enables people to verify any kind of textual claim based on the incorporated evidence from textual news sources.

Dataset
JSON

Stock Movement and Volatility Prediction from Tweets, Macroeconomic Factors a...

This is the dataset used in the paper for stock movement and volatility prediction from tweets, macroeconomic factors, and historical prices.

Dataset
JSON

Web2Text: Deep Structured Boilerplate Removal

Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...

Dataset
JSON

530 datasets found