The SIMMC 2.0 dataset is a collection of multi-modal task-oriented dialogues in which both the user and the assistant are situated in the same virtual environment.
DialCLIP is a parameter-efficient prompt-tuning method for multi-modal dialog retrieval. It introduces a multi-modal context prompt generator to learn context features which are...