
MEREQ: Sample-Efficient Alignment from Human Intervention

Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEREQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention.

Data and Resources

Cite this as

Yuxin Chen, Chen Tang, Chenran Li, Ran Tian, Peter Stone, Masayoshi Tomizuka, Wei Zhan (2024). Dataset: MEREQ: Sample-Efficient Alignment from Human Intervention. https://doi.org/10.57702/80jk4f4b

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Author: Yuxin Chen
More Authors: Chen Tang, Chenran Li, Ran Tian, Peter Stone, Masayoshi Tomizuka, Wei Zhan
Homepage: https://sites.google.com/view/mereq/home