MEREQ: Sample-Efficient Alignment from Human Intervention

Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy’s execution and intervenes to provide feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEREQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention.
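To make the intervention-based protocol concrete, here is a minimal sketch of the data-collection loop it describes: the policy acts, and an expert overrides it whenever its action deviates too far from the expert's own. Everything here (the 1-D dynamics, the linear policies, the deviation threshold) is a hypothetical stand-in for illustration, not MEREQ's actual implementation.

```python
import random

random.seed(0)

def policy_action(state):
    # Prior policy: a noisy step toward the origin (stands in for a pretrained policy).
    return -0.5 * state + random.uniform(-0.4, 0.4)

def expert_action(state):
    # Human expert: a clean step toward the origin.
    return -0.5 * state

def expert_intervenes(state, action):
    # The expert steps in when the policy's action deviates too far from theirs.
    return abs(action - expert_action(state)) > 0.25

def rollout(steps=50):
    """Collect one episode, logging which transitions came from interventions."""
    state = 1.0
    data = []
    for _ in range(steps):
        action = policy_action(state)
        intervened = expert_intervenes(state, action)
        if intervened:
            action = expert_action(state)  # expert overrides the policy's action
        data.append((state, action, intervened))
        state = state + action  # trivial 1-D dynamics for illustration
    return data

data = rollout()
interventions = [d for d in data if d[2]]
print(f"{len(interventions)} of {len(data)} steps were expert interventions")
```

In the paper's setting, the intervention-labeled transitions are the feedback signal the learner uses; this sketch only shows how such labels arise during execution.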

BibTeX: