SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles -
YouTube Clickbait Detection Dataset
The dataset is a collection of online videos from YouTube, with comments and metadata. It is used to evaluate the performance of the Online Video Clickbait Protector (OVCP) scheme. -
A citation-based method for automatic indexing of Chinese academic literatures
The dataset used in this paper to evaluate a citation-based method for automatically indexing Chinese academic literature. -
WINGNUS: Keyphrase extraction utilizing document logical structure
The dataset used in this paper for keyphrase extraction that utilizes a document's logical structure. -
SemEval-2010 Task 5 dataset
The dataset used in this paper for keyphrase extraction from academic articles. -
Leveraging Passage Embeddings for Efficient Listwise Reranking
Passage ranking aims to rank each passage in a large corpus according to its relevance to the user's information need, expressed as a short query. -
WikipassageQA, InsuranceQA v2, and MS-MARCO
The collection comprises three passage-ranking datasets: WikipassageQA, InsuranceQA v2, and MS-MARCO. -
Passage Ranking with Weak Supervision
In this paper, we propose a weak supervision framework for neural ranking tasks based on the data programming paradigm (Ratner et al., 2016), which enables us to leverage... -
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is... -
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,... -
Wikipedia dataset
The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries... -
GLOW: Global Weighted Self-Attention Network for Web Search
GLOW is a novel Global Weighted Self-Attention Network for web document search. It incorporates global corpus statistics into the deep matching model. -
KPEVAL: Towards Fine-Grained Semantic-Based Keyphrase Evaluation
A comprehensive evaluation framework for keyphrase systems, including reference agreement, faithfulness, diversity, and utility.