- EC AI Platform
  The dataset used in the paper is not explicitly described, but it is mentioned that the authors evaluated GPT-4 against three applications built with the EC AI platform for...
- FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algori...
  Six-bit quantization can effectively reduce the size of large language models and preserve the model quality consistently across varied applications.
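For intuition, here is a minimal sketch of symmetric 6-bit integer quantization of a weight matrix. It only illustrates the bit-width/accuracy trade-off; the paper's FP6 format is a 6-bit floating-point layout served with custom GPU kernels, which this toy example does not reproduce.

```python
import numpy as np

def quantize_6bit(w: np.ndarray):
    """Symmetric uniform 6-bit quantization: map floats to integers in [-31, 31]."""
    scale = np.abs(w).max() / 31.0 + 1e-12   # signed 6-bit range is [-31, 31]
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def dequantize_6bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the 6-bit integers."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_6bit(w)
print("max abs error:", np.abs(w - dequantize_6bit(q, scale)).max())
```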
- LDC2014T12
  The dataset used in the paper is the Linguistic Data Consortium AMR corpus release 1.0 (LDC2014T12), consisting of 13,050 AMR/English sentence pairs.
- Exponential Family Embeddings
  Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of...
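As a rough illustration of the idea, the sketch below computes the log-likelihood of one observation under a Bernoulli embedding, where the natural parameter is the inner product of the target item's embedding with the sum of its context vectors. The arrays `rho` (embeddings) and `alpha` (context vectors) are hypothetical, and this is not the paper's full objective or inference procedure.

```python
import numpy as np

def bernoulli_emb_loglik(rho, alpha, target, context, observed=1):
    """Exponential-family (Bernoulli) embedding sketch: eta = rho[target] . sum(alpha[context])."""
    eta = rho[target] @ alpha[context].sum(axis=0)
    p = 1.0 / (1.0 + np.exp(-eta))               # mean parameter via the sigmoid link
    return np.log(p) if observed else np.log(1.0 - p)

rho = np.random.randn(1000, 50) * 0.1            # hypothetical embedding vectors
alpha = np.random.randn(1000, 50) * 0.1          # hypothetical context vectors
print(bernoulli_emb_loglik(rho, alpha, target=7, context=[3, 42, 99]))
```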
- Towards Improving Selective Prediction Ability of NLP Systems
  Datasets: SNLI, MNLI (matched and mismatched), and the Stress Test sets (Competence, Distraction, and Noise).
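Selective prediction lets a model abstain when it is unlikely to be correct. Below is a minimal sketch of the standard MaxProb baseline (answer only when the top softmax probability clears a threshold); the threshold value and array shapes are illustrative assumptions, not the paper's proposed method.

```python
import numpy as np

def selective_predict(probs: np.ndarray, threshold: float = 0.9):
    """MaxProb selective prediction: answer when confident, otherwise abstain (-1)."""
    confidence = probs.max(axis=-1)
    prediction = probs.argmax(axis=-1)
    answered = confidence >= threshold
    return np.where(answered, prediction, -1), answered

probs = np.array([[0.97, 0.02, 0.01],    # confident -> answered
                  [0.40, 0.35, 0.25]])   # uncertain -> abstain
print(selective_predict(probs))
```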
- Neural Language Correction with Character-Based Attention
  Applies a character-level encoder-decoder model with attention to correcting language errors in text.
- Stanford Neural Machine Translation Systems for Spoken Language Domain
  Describes Stanford's attention-based neural machine translation systems for spoken language translation (IWSLT).
- Corpora Generation for Grammatical Error Correction
  Two approaches for generating large parallel datasets for Grammatical Error Correction (GEC) using publicly available Wikipedia data.
- MNLI, QQP, and SST-2
  The paper evaluates on three tasks: Multi-Genre Natural Language Inference (MNLI), Quora Question Pairs (QQP), and the Stanford Sentiment Treebank (SST-2).
- Are Larger Pretrained Language Models Uniformly Better? Comparing Performance...
  Larger language models have higher accuracy on average, but are they better on every single instance (datapoint)?
- Learning to summarize with human feedback
  The paper collects human comparisons between summaries, trains a reward model on them, and fine-tunes a policy with reinforcement learning against that reward model, producing summaries that humans prefer to the Reddit TL;DR references.
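The reward model in this line of work is typically trained with a pairwise comparison (Bradley-Terry style) loss: the preferred summary should score higher than the rejected one. A minimal numpy sketch is below; `r_chosen` and `r_rejected` stand in for reward-model scores and are illustrative assumptions.

```python
import numpy as np

def pairwise_reward_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Maximize log sigma(r_chosen - r_rejected) over human comparison pairs."""
    margin = r_chosen - r_rejected
    return float(-np.mean(np.log(1.0 / (1.0 + np.exp(-margin)))))

print(pairwise_reward_loss(np.array([2.1, 0.3]), np.array([1.0, 0.9])))
```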
- Reward Model Ensembles
  The authors used three datasets: TL;DR, HELPFULNESS, and XSUM/NLI.
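One way an ensemble of reward models can be used is to aggregate the members' scores, optionally penalizing disagreement so the combined reward is harder to game. The sketch below shows two plausible aggregation rules; it is an illustration under those assumptions, not necessarily the exact objectives studied in the paper.

```python
import numpy as np

def ensemble_reward(scores: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Aggregate reward scores of shape [n_members, n_candidates]."""
    if mode == "mean":
        return scores.mean(axis=0)                       # simple average
    if mode == "conservative":
        return scores.mean(axis=0) - scores.std(axis=0)  # penalize disagreement
    raise ValueError(f"unknown mode: {mode}")

scores = np.array([[1.2, 0.4], [0.9, 1.1], [1.0, 0.2]])  # 3 reward models, 2 candidates
print(ensemble_reward(scores, "conservative"))
```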
- STAMP 4 NLP
  STAMP 4 NLP is an instantiable, iterative, and incremental process model for developing natural language processing applications with a focus on quality, business value, and...
- Detecting Hallucinated Content in Conditional Neural Sequence Generation
  Neural sequence models can generate highly fluent sentences, but recent studies have also shown that they are prone to hallucinate additional content not supported by the...
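For a concrete, if crude, notion of "content not supported by the input", the toy heuristic below flags target tokens that never occur in the source. The paper instead trains a model to predict token-level hallucination labels; this sketch is only a naive baseline for intuition.

```python
def naive_hallucination_flags(source_tokens, target_tokens):
    """Flag target tokens absent from the source as possible hallucinations."""
    source_vocab = set(source_tokens)
    return [tok not in source_vocab for tok in target_tokens]

src = "the cat sat on the mat".split()
hyp = "the black cat sat on the mat".split()
print(list(zip(hyp, naive_hallucination_flags(src, hyp))))  # 'black' gets flagged
```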
- A general theoretical paradigm to understand learning from human preferences
  The paper proposes a general theoretical objective for learning from human preferences, of which RLHF and direct preference optimization are special cases, and derives a reward-model-free preference optimization method (IPO) from it.
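A minimal sketch of an IPO-style pairwise loss, as one reading of this family of reward-free objectives: the gap between the policy-vs-reference log-ratios of the preferred and dispreferred responses is regressed toward 1/(2*tau). The scalar sequence log-probabilities and the value of `tau` are illustrative assumptions.

```python
import numpy as np

def ipo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, tau=0.1):
    """Regress the preference log-ratio gap toward 1 / (2 * tau)."""
    h = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return float(np.mean((h - 1.0 / (2.0 * tau)) ** 2))

# toy sequence log-probs under the policy and a frozen reference model
print(ipo_loss(np.array([-12.0]), np.array([-15.0]),
               np.array([-13.0]), np.array([-14.0])))
```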
- Llama: Open and efficient foundation language models
  LLaMA is a family of open and efficient foundation language models, ranging from 7B to 65B parameters, trained on publicly available data.
- Mixtral of Experts
  Mixtral 8x7B is a sparse mixture-of-experts language model in which a router selects two of eight feed-forward experts for every token at each layer; an instruction-following variant is fine-tuned from it.
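The core mechanism is sparse routing: per token, a gating network scores the experts, the top two are selected, and their outputs are mixed with renormalized gate weights. The sketch below is a plain-numpy illustration under those assumptions (dense toy experts, no load balancing, none of the released kernels).

```python
import numpy as np

def top2_moe_layer(x, gate_w, experts):
    """Route each token to its top-2 experts and mix outputs with renormalized gates."""
    logits = x @ gate_w                                   # [n_tokens, n_experts]
    top2 = np.argsort(logits, axis=-1)[:, -2:]            # indices of the two best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top2[t]
        gate = np.exp(logits[t, sel])
        gate /= gate.sum()                                # softmax over the selected pair
        out[t] = sum(g * experts[e](x[t]) for g, e in zip(gate, sel))
    return out

d, n_experts, n_tokens = 16, 8, 4
experts = [(lambda W: (lambda h: h @ W))(np.random.randn(d, d) * 0.1) for _ in range(n_experts)]
x = np.random.randn(n_tokens, d)
print(top2_moe_layer(x, np.random.randn(d, n_experts), experts).shape)
```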
- Toward an Architecture for Never-ending Language Learning
  Describes NELL, a system designed to run continuously, extracting structured facts from the web and improving its own reading competence over time.
- MISMATCH: Fine-grained Evaluation of Machine-generated Text
  The dataset is used in the paper for fine-grained evaluation of machine-generated text with mismatch error types.
- BERT: Pre-training of deep bidirectional transformers for language understanding
  This paper proposes BERT, a pre-trained deep bidirectional transformer for language understanding.
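As a quick usage illustration (not part of the paper itself), the pretrained model can be loaded through the Hugging Face transformers library to obtain contextual token embeddings:

```python
# pip install transformers torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Bidirectional pre-training captures context from both sides.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768) contextual embeddings
```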