Our paper on embodied captioning is accepted by ICCV 2023!

Our paper on text-to-video is accepted by ACM Multimedia 2023!

Our papers on object-goal navigation and sim-to-real transfer are accepted by IROS 2023!

One paper accepted by CVPR 2023.

The HM3DAutoVLN paper is accepted to ECCV 2022 and also achieves 2nd place in the REVERIE Challenge.

Our team wins the REVERIE Challenge at ICCV 2021 HIRV Workshop.

HAMT accepted by NeurIPS 2021.

Two papers accepted by ICCV 2021.

Two papers (1 oral and 1 poster) accepted by ACM Multimedia 2021.

Two papers (1 oral and 1 poster) accepted by CVPR 2021.

We are organizing the YouMakeup VQA Challenge at CVPR 2020 for fine-grained action understanding!

Two papers (1 oral and 1 poster) accepted by CVPR 2020.

Our team is the winner of the NIST TRECVID 2019 Video to Text Task.

Our team achieves second place in the ICCV 2019 VATEX Video Captioning Challenge.

We have three papers accepted by ACM Multimedia 2019!

Our paper "YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension" has been accepted by EMNLP 2019 and the dataset is released.
[PDF] [Data]


Our paper "From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots" has been accepted by IJCAI 2019.

Our paper "Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data" has been accepted by AAAI 2019.

Selected Publications

History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev.
ICCV, 2021. [Project] [PDF] [Code]

Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen, Dong Huang.
ICCV, 2021. [PDF] [Code]

Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid.
ICCV, 2021. [Project] [PDF] [Code]

Question-controlled Text-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin.
ACM Multimedia, 2021. [PDF]

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang.
ACM Multimedia, 2021 (Oral). [PDF]

Sketch, Ground, and Refine: Top-Down Dense Video Captioning
Chaorui Deng, Shizhe Chen, Da Chen, Yuan He, Qi Wu.
CVPR, 2021 (Oral). [PDF]

Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song, Shizhe Chen, Qin Jin.
CVPR, 2021. [PDF]

Say as You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu.
CVPR, 2020 (Oral). [PDF] [Code]

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu.
CVPR, 2020. [PDF] [Code]

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou.
ACM Multimedia, 2019 (Oral). [PDF] [Data]

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu.
IJCAI, 2019. [PDF]

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann.
AAAI, 2019. [PDF]

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin.
ACM Multimedia, 2019 (Oral). [PDF]

Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin.
ACM Multimedia, 2019. [PDF]

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin.
EMNLP, 2019. [PDF] [Data]

Generating Video Descriptions with Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander Hauptmann.
TMM, 2019. [PDF]

Class-aware Self-attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann.
ICMR, 2018 (Best Paper Runner-up). [PDF]

Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction
Shizhe Chen, Qin Jin.
ACM Multimedia, 2016. [PDF]