-
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples. While these diffusion models trained on large... -
StarCraft II with Human Expertise in Subgoals Selection
StarCraft II minigames dataset used for hierarchical reinforcement learning with human expertise in subgoal selection -
Atari Learning Environment
The dataset used in this paper is the Atari Learning Environment (ALE) dataset, which consists of 15 Atari video games. -
OpenAI Gym
The paper does not describe its dataset explicitly; it mentions only that the authors used several continuous control environments from OpenAI Gym. -
Arcade Learning Environment (ALE)
The dataset used in the paper is the Arcade Learning Environment (ALE) dataset, which includes an Atari 2600 emulator and about 50 games. -
Gridworld domain
The dataset used in the paper is a simple gridworld domain with pixel-based states. -
Event-based Visuomotor Policies
Event-based camera data used for learning event-based visuomotor policies -
Distributional Multivariate Policy Evaluation and Exploration with the Bellma...
The dataset is used to evaluate the distributional approach to reinforcement learning (DiRL) and its equivalence to Generative Adversarial Networks (GANs). -
Off-Policy Deep Reinforcement Learning without Exploration
The dataset used in the paper is a fixed batch of data that has already been gathered, with no possibility of further data collection. -
Learning to Charge RF-Energy Harvesting Devices in WiFi Networks
The dataset used in this paper is a simulation dataset for RF-energy harvesting devices in WiFi networks. -
SimpleQuestion dataset for Wikidata
The dataset used in this paper is a reinforcement learning dataset, specifically the SimpleQuestion dataset, which contains questions answerable using Wikidata as the knowledge... -
Toxic-DPO Dataset
The dataset used in the paper is the Toxic-DPO dataset, which is used for reinforcement learning from human feedback. -
Anthropic-HH-RLHF Dataset
The dataset used in the paper is the Anthropic-HH-RLHF dataset, which is used for reinforcement learning from human feedback. -
3-Dots Dataset
The 3-dots dataset is a variation of the moving dot dataset with three dots on the three channels of the image. -
Moving Dot Dataset
The dataset is a simple environment with a moving dot inside a square. The dot cannot leave the square, and is always visible on the screen. The goal is to learn a...