Semantic Textual Similarity - Groups

Linguistically Conditioned Semantic Textual Similarity

Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. In order to reduce the inherent ambiguity posed...

Dataset
JSON
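Here is what an STS system outputs in the simplest case. The following is a minimal sketch of a lexical baseline that scores a sentence pair by the cosine similarity of their bag-of-words count vectors. The function names `tokenize` and `bow_cosine` are hypothetical, and the regex tokenizer is a simplifying assumption. Such a baseline ignores word order and synonymy, so it scores paraphrases that use different wording poorly.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of non-word characters (a deliberately crude tokenizer).
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two sentences."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both sentences contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(round(bow_cosine("A man is playing a guitar.", "A man plays the guitar."), 3))  # → 0.632
```

In the example pair, the shared tokens are `a`, `man`, and `guitar`, which gives a score of about 0.632 on a 0 to 1 scale.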

Divar Dataset

A dataset for measuring the domain similarity of Persian texts, generated from a dataset of advertisements posted on the Divar application.

Dataset
JSON

SemEval-2024 task 1: Semantic textual relatedness for African and Asian langu...

A collection of semantic textual relatedness datasets for African and Asian languages.

Dataset
JSON

SemRel2024

A collection of semantic textual relatedness datasets for 14 languages.

Dataset
JSON
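Systems for relatedness and similarity benchmarks like the ones above are commonly scored by the Spearman rank correlation between predicted and gold scores. The sketch below assumes that metric and implements it in plain Python, using average ranks for ties. `average_ranks`, `pearson`, and `spearman` are illustrative names. In practice, `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(predicted, gold):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of at least two items")
    return pearson(average_ranks(predicted), average_ranks(gold))


if __name__ == "__main__":
    gold = [4.8, 0.4, 2.5, 3.6]          # hypothetical gold relatedness scores
    predicted = [0.91, 0.12, 0.40, 0.55]  # hypothetical model similarities
    print(spearman(predicted, gold))
```

Here the predictions rank the four pairs in the same order as the gold scores, so the correlation is 1 up to floating-point rounding. Because only ranks matter, the model's scores do not need to be on the same scale as the gold labels.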

Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn...

Self-StrAE is a model that processes a given sentence to generate both multi-level embeddings and a structure over the input.

Dataset
JSON
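The following hypothetical sketch makes the idea of "a structure over the input" concrete by showing one simple way to induce such a structure. It repeatedly merges the adjacent pair of nodes whose embeddings are most similar, which builds a binary tree bottom-up. This is not claimed to be Self-StrAE's algorithm. The cosine merge criterion, the mean-of-children composition, and the name `greedy_merge` are all assumptions for illustration, and the toy 2-dimensional vectors stand in for learned embeddings.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_merge(tokens, vectors):
    """Bracket a sentence by repeatedly merging the most similar adjacent pair.

    Assumption: a merged node's embedding is the element-wise mean of its children.
    Returns the tree as nested tuples plus the node embeddings after each merge.
    """
    if not tokens or len(tokens) != len(vectors):
        raise ValueError("need one vector per token and at least one token")
    nodes = list(zip(tokens, vectors))
    levels = [list(vectors)]
    while len(nodes) > 1:
        # Index of the adjacent pair with the highest cosine similarity.
        best = max(range(len(nodes) - 1), key=lambda i: cosine(nodes[i][1], nodes[i + 1][1]))
        (lt, lv), (rt, rv) = nodes[best], nodes[best + 1]
        merged = ((lt, rt), [(a + b) / 2 for a, b in zip(lv, rv)])
        nodes[best:best + 2] = [merged]
        levels.append([v for _, v in nodes])
    return nodes[0][0], levels


if __name__ == "__main__":
    tree, levels = greedy_merge(["the", "cat", "sat"], [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
    print(tree)
```

With these vectors, `the` and `cat` are merged first, so the returned bracketing is `(("the", "cat"), "sat")`. The list `levels` holds the node embeddings after each merge, giving one embedding per node at every level of the tree.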

STS12, STS13, STS14, STS15, STS16, STSb, SICK-R

The STS12, STS13, STS14, STS15, STS16, STSb, and SICK-R datasets contain sentence pairs from various sources.

Dataset
JSON

SICK-Relatedness

The SICK-Relatedness dataset contains about 10,000 English sentence pairs, drawn from image and video descriptions, each annotated with a relatedness score.

Dataset
JSON
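Gold scores in these datasets are not all on the same scale. To our knowledge, SICK relatedness scores range from 1 to 5 and STS benchmark scores from 0 to 5. When gold scores are used as regression targets for a model that outputs cosine similarity, they are often mapped linearly onto [0, 1] first. Correlation metrics themselves are unaffected by such a linear rescale. The minimal sketch below performs the mapping. The range constants reflect our understanding of the published scales and should be checked against the data, and `rescale` is an illustrative name.

```python
SICK_R_RANGE = (1.0, 5.0)  # assumed relatedness scale for SICK-R
STSB_RANGE = (0.0, 5.0)    # assumed similarity scale for the STS benchmark


def rescale(score, score_range):
    """Linearly map a gold score from [lo, hi] onto [0, 1]."""
    lo, hi = score_range
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside [{lo}, {hi}]")
    return (score - lo) / (hi - lo)


if __name__ == "__main__":
    print(rescale(3.0, SICK_R_RANGE))  # → 0.5
    print(rescale(3.0, STSB_RANGE))    # → 0.6
```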

STS benchmark

The STS benchmark dataset contains 8,628 sentence pairs from the categories of captions, news, and forums.

Dataset
JSON
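The sketch below is a minimal loader that also tallies pairs per genre. It assumes the tab-separated layout of the original STS benchmark release: genre, source file, year, pair id, score, sentence 1, sentence 2, sometimes followed by extra fields. The layout, the sample rows, and the names `STSPair` and `parse_sts_lines` are assumptions for illustration. Splitting on tabs directly, rather than using a CSV reader, avoids misinterpreting quote characters inside sentences.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class STSPair:
    genre: str
    score: float
    sentence1: str
    sentence2: str


def parse_sts_lines(lines):
    """Parse tab-separated rows; assumed columns: genre, file, year, id, score, s1, s2."""
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed rows
        # Keep only the first seven columns; any trailing fields are ignored.
        genre, _source, _year, _pair_id, score, s1, s2 = fields[:7]
        pairs.append(STSPair(genre, float(score), s1, s2))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample rows in the assumed layout.
    sample = [
        "main-captions\tMSRvid\t2012test\t0001\t5.000\tA plane is taking off.\tAn air plane is taking off.\n",
        "main-news\theadlines\t2013\t0002\t1.200\tMarkets fall in Asia.\tRain expected this weekend.\n",
    ]
    pairs = parse_sts_lines(sample)
    print(Counter(p.genre for p in pairs))
```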

8 datasets found