-
Linguistically Conditioned Semantic Textual Similarity
Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. In order to reduce the inherent ambiguity posed... -
Divar Dataset
A dataset for measuring the domain similarity of Persian texts, generated from a dataset of advertisements posted on the Divar application. -
SemEval-2024 task 1: Semantic textual relatedness for African and Asian langu...
A collection of semantic textual relatedness datasets for African and Asian languages. -
SemRel2024
A collection of semantic textual relatedness datasets for 14 languages. -
Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn...
Self-StrAE is a model that processes a given sentence to generate both multi-level embeddings and a structure over the input. -
STS12, STS13, STS14, STS15, STS16, STSb, SICK-R
The STS12, STS13, STS14, STS15, STS16, STSb, SICK-R datasets contain sentence pairs from various sources. -
SICK-Relatedness
The SICK-Relatedness dataset contains 1,000 sentence pairs from the categories of captions, news, and forums. -
STS benchmark
The STS benchmark dataset contains 8,628 sentence pairs from the categories of captions, news, and forums.