email: cszhe1 [at] ruc.edu.cn
email: shizhe.chen [at] inria.fr

My name is Shizhe Chen. I am a postdoctoral researcher at INRIA Paris, working in the WILLOW project-team with Ivan Laptev and Cordelia Schmid. My research interests are vision-and-language, video understanding, and multimodal deep learning. I am currently focusing on the vision-and-language navigation problem. My CV can be found here.

I received my Ph.D. and B.S. degrees from Renmin University of China in 2020 and 2015, respectively, advised by Qin Jin. I visited Carnegie Mellon University in 2018, advised by Alexander Hauptmann, and the University of Adelaide in 2019, advised by Qi Wu. I worked at MSRA with Jianlong Fu and Ruihua Song in 2019. I received the Baidu Scholarship in 2017 and the Beijing Outstanding Graduate Award in 2020.

[Google Scholar] [GitHub] [LinkedIn]


Our team won the REVERIE and SOON VLN Challenges at the ICCV 2021 HIRV Workshop.

HAMT accepted by NeurIPS 2021.

Two papers accepted by ICCV 2021.

Two papers (1 oral and 1 poster) accepted by ACM Multimedia 2021.

Two papers (1 oral and 1 poster) accepted by CVPR 2021.


We are organizing the YouMakeup VQA Challenge at CVPR 2020 for fine-grained action understanding!

Two papers (1 oral and 1 poster) accepted by CVPR 2020.

Selected Publications

History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev.
NeurIPS, 2021. [Project] [PDF] [Code]

Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen, Dong Huang.
ICCV, 2021. [PDF] [Code]

Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid.
ICCV, 2021. [Project] [PDF] [Code]

Question-controlled Text-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin.
ACM Multimedia, 2021. [PDF]

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang.
ACM Multimedia, 2021 (Oral). [PDF]

Sketch, Ground, and Refine: Top-Down Dense Video Captioning
Chaorui Deng, Shizhe Chen, Da Chen, Yuan He, Qi Wu.
CVPR, 2021 (Oral). [PDF]

Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song, Shizhe Chen, Qin Jin.
CVPR, 2021. [PDF]

Say as You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
CVPR, 2020 (Oral). [PDF] [Code]

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
CVPR, 2020. [PDF] [Code]

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou
ACM Multimedia, 2019 (Oral). [PDF] [Data]

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu
IJCAI, 2019. [PDF]

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann
AAAI, 2019. [PDF]

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin
ACM Multimedia, 2019 (Oral). [PDF]

Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin
ACM Multimedia, 2019. [PDF]

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin
EMNLP, 2019. [PDF] [Data]

Generating Video Descriptions with Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander Hauptmann
TMM, 2019. [PDF]

Class-aware Self-attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
ICMR, 2018 (Best Paper Runner-up). [PDF]

Awards & Honors


Conference Reviewer

Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
IEEE Transactions on Multimedia (TMM)
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)