Dataset - LDM

SemEval

A dataset for stance detection on social media that incorporates moral foundations.
- Dataset
- JSON
MSR

The MSR dataset is a widely used vulnerability detection dataset, consisting of 10,900 vulnerable examples and 177,736 non-vulnerable examples.
- Dataset
- JSON
LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction

LNMap: Departures from isomorphic assumption in bilingual lexicon induction through non-linear mapping in latent space.
- Dataset
- JSON
Learning Principled Bilingual Word Embeddings

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance.
- Dataset
- JSON
RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction

Bilingual lexicon induction induces word translations by aligning independently trained word embeddings in two languages.
- Dataset
- JSON
Exponential Family Embeddings

Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of...
- Dataset
- JSON
Intrinsic evaluations of word embeddings: What can we do better?

This dataset has no description
- Dataset
- JSON
Problems with evaluation of word embeddings using word similarity tasks

This dataset has no description
- Dataset
- JSON
Improving zero-shot learning by mitigating the hubness problem

This dataset has no description
- Dataset
- JSON
Distributed representations of words and phrases and their compositionality

The word2vec dataset is a word embedding dataset that contains 3 million words.
- Dataset
- JSON
Improving distributional similarity with lessons learned from word embeddings

This dataset has no description
- Dataset
- JSON
Learning Word Embeddings from the Portuguese Twitter Stream: A Study of some ...

This paper describes a preliminary study for producing and distributing a large-scale database of embeddings from the Portuguese Twitter stream.
- Dataset
- JSON
Word2Vec Dataset

The Word2Vec dataset.
- Dataset
- JSON
Massively Multilingual Word Embeddings

Massively multilingual word embeddings.
- Dataset
- JSON
Learning Sentiment-Specific Word Embeddings from Distant Supervision

Sentiment-specific word embeddings dataset
- Dataset
- JSON
Wikipedia2Vec dataset

The dataset used in the paper is the Wikipedia2Vec dataset, which contains word embeddings.
- Dataset
- JSON
Scientific Articles Corpus

The dataset used in this research is a large-scale academic corpus containing titles and abstracts of approximately 70 million scientific articles.
- Dataset
- JSON
FastText

FastText is a subword token embedding model. It builds the vector representation of a word by composing the embeddings of that word's character n-grams.
- Dataset
- JSON
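
As a rough illustration of the FastText entry above, the sketch below builds a word vector by averaging the embeddings of the word's character n-grams. It is a simplified sketch under stated assumptions, not the fastText library's implementation: the function names `char_ngrams` and `word_vector`, the CRC32 hashing, the table size, the n-gram range of 3 to 6, and the random embedding table are all illustrative choices that do not come from this listing.

```python
import random
import zlib


def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in '<' and '>' markers."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    ]


def word_vector(word, table, min_n=3, max_n=6):
    """Average the embedding rows selected by hashing each n-gram.

    `table` is a list of equal-length float lists (hypothetical embedding table).
    CRC32 is an assumed stand-in hash; it is used only because it is deterministic.
    """
    dim = len(table[0])
    grams = char_ngrams(word, min_n, max_n)
    if not grams:
        return [0.0] * dim
    total = [0.0] * dim
    for gram in grams:
        row = table[zlib.crc32(gram.encode("utf-8")) % len(table)]
        for k in range(dim):
            total[k] += row[k]
    return [value / len(grams) for value in total]


# Hypothetical embedding table: 1000 hash buckets, 8 dimensions, random values.
rng = random.Random(0)
table = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]

vector = word_vector("where", table)
```

Because the vector is assembled from n-grams, a word that never appeared in training still receives a representation as long as it yields at least one n-gram.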
Polyglot Wikipedia

A dataset used for training and testing the MVLSA model.
- Dataset
- JSON
Word2Vec

Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

21 datasets found