-
Latent Distance Guided Alignment Training for Large Language Models
Ensuring alignment with human preferences is a crucial characteristic of large language models (LLMs). Presently, the primary alignment methods, RLHF and DPO, require extensive... -
VisualBERT
VisualBERT is a pre-trained model for vision-and-language tasks, built on top of PyTorch. -
Task Driven Image Understanding Challenge (TDIUC)
The Task Driven Image Understanding Challenge (TDIUC) dataset is a large VQA dataset with 12 more fine-grained categories proposed to compensate for the bias in distribution of... -
ZESHEL dataset
The ZESHEL dataset was constructed by Logeswaran et al. (2019) from Wikia. The task of zero-shot entity linking involves linking entity mentions in text to an entity from a list... -
A general theoretical paradigm to understand learning from human preferences
The paper proposes a novel approach to aligning language models with human preferences, focusing on the use of preference optimization in reward-free RLHF. -
Simplifying graph convolutional networks
Simplifying graph convolutional networks. -
StackLLaMA: An RL fine-tuned LLaMA model for Stack Exchange question and answ...
The dataset used in the paper is the StackExchange dataset. -
ACE 2005, WebNLG, CoNLL, NYT, and FB15k-237
The datasets used in the paper are ACE 2005, WebNLG, CoNLL, NYT, and FB15k-237. The ACE 2005 dataset is a collection of news articles, while WebNLG is a corpus used for natural... -
Multimodal Visual Patterns (MMVP) Benchmark
The Multimodal Visual Patterns (MMVP) benchmark is a dataset used to evaluate the visual question answering capabilities of multimodal large language models (MLLMs). -
InstructBLIP
InstructBLIP is a vision-language model for comprehensive scene understanding and textual descriptions. -
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of ...
Dysca is a dynamic and scalable benchmark for evaluating the perception ability of Large Vision-Language Models (LVLMs) via various subtasks and scenarios. -
Symbolic, Language Agnostic and Ontologically Grounded Large Language Models
The dataset is used in the paper to demonstrate the limitations of large language models (LLMs) in capturing inferential aspects of natural language. -
A general language assistant as a laboratory for alignment
A general language assistant for aligning language models with human users. -
SimpleQuestion
The SimpleQuestion dataset is a question answering dataset consisting of 100,000 questions and 1,000,000 answers.