Language learning - Groups

Expert pairs for communication game

The dataset used in the paper is a set of 30 Expert pairs trained on a communication game, with inputs and outputs of pairs of agents trained to convergence on a reconstruction...
- Dataset
- JSON
User Adaptive Language Learning Chatbot Dataset

The dataset used in the paper is a collection of conversations between a user-adaptive language learning chatbot and 155 8th-grade Chinese students. The chatbot is designed to...
- Dataset
- JSON
Lang-84

The dataset used in this paper is a collection of parallel sentence pairs from 96 different native languages, with at least 10,000 sentence pairs per language.
- Dataset
- JSON
Readers

The dataset is used to optimize the learning order of Chinese characters.
- Dataset
- JSON
HSK

The dataset is used to optimize the learning order of Chinese characters.
- Dataset
- JSON
UNIHAN

The dataset is used to optimize the learning order of Chinese characters.
- Dataset
- JSON
SUBTLEX-CH

The dataset is used to optimize the learning order of Chinese characters.
- Dataset
- JSON
ARC database

The ARC database contains 358,534 monosyllabic non-words.
- Dataset
- JSON
L1-specific Swedish non-words

This dataset contains Swedish non-words generated using a 4-gram Swedish language model.
- Dataset
- JSON
Cambridge First Certificate in English (FCE) dataset

The Cambridge First Certificate in English (FCE) dataset is used as the source of ESL data. The corpus is a subset of the Cambridge Learner Corpus (CLC) and contains English...
- Dataset
- JSON
Spanish

The Spanish dataset records 182 middle-school students practicing 409 Spanish exercises.
- Dataset
- JSON
sentencesBooks dataset

A collection of sentences from literature books (sentencesBooks), containing 56,557 labels and 2,400 total words.
- Dataset
- JSON
sentencesInternet dataset

A collection of sentences collected from the Internet (sentencesInternet), containing 85,941 labels and 4,800 total words.
- Dataset
- JSON
Literature de jeunesse libre (LjL) dataset

The Literature de jeunesse libre (LjL) dataset contains 334,026 labels and 2,060 total words.
- Dataset
- JSON
Experiments with GPT-3-based difficulty estimation

The dataset used for the experiments, containing three datasets: Literature de jeunesse libre (LjL), a collection of sentences collected from the Internet (sentencesInternet),...
- Dataset
- JSON
Mandarin tone learning dataset for Japanese learners

The dataset used in the paper is a Mandarin tone learning dataset for Japanese learners.
- Dataset
- JSON

16 datasets found