email: shizhe.chen [at] inria.fr

My name is Shizhe Chen. I am a postdoctoral researcher at INRIA Paris, working in the WILLOW project-team with Ivan Laptev and Cordelia Schmid. My research interests are vision-and-language, embodied AI, and multimodal deep learning. My CV can be found here.

I received my Ph.D. and B.S. degrees from Renmin University of China in 2020 and 2015, respectively, advised by Qin Jin. I visited Carnegie Mellon University in 2018, advised by Alexander Hauptmann, and the University of Adelaide in 2019, advised by Qi Wu. I worked at MSRA with Jianlong Fu and Ruihua Song in 2019. I received the Baidu Scholarship in 2017 and the Beijing Outstanding Graduate Award in 2020.

[Google Scholar] [GitHub] [LinkedIn]


The HM3DAutoVLN paper has been accepted to ECCV 2022 and also achieved 2nd place in the REVERIE Challenge.

Our team won the REVERIE and SOON VLN Challenges at the ICCV 2021 HIRV Workshop.

HAMT accepted by NeurIPS 2021.

Two papers accepted by ICCV 2021.

Two papers (1 oral and 1 poster) accepted by ACM Multimedia 2021.

Two papers (1 oral and 1 poster) accepted by CVPR 2021.


We are organizing the YouMakeup VQA Challenge at CVPR 2020 for fine-grained action understanding!

Two papers (1 oral and 1 poster) accepted by CVPR 2020.

Selected Publications

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev.
ECCV, 2022. [Project] [PDF] [Code]

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev.
CVPR, 2022. [Project] [PDF] [Code]

History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev.
NeurIPS, 2021. [Project] [PDF] [Code]

Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen, Dong Huang.
ICCV, 2021. [PDF] [Code]

Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid.
ICCV, 2021. [Project] [PDF] [Code]

Question-controlled Text-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin.
ACM Multimedia, 2021. [PDF]

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang.
ACM Multimedia, 2021 (Oral). [PDF]

Sketch, Ground, and Refine: Top-Down Dense Video Captioning
Chaorui Deng, Shizhe Chen, Da Chen, Yuan He, Qi Wu.
CVPR, 2021 (Oral). [PDF]

Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song, Shizhe Chen, Qin Jin.
CVPR, 2021. [PDF]

Say as You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
CVPR, 2020 (Oral). [PDF] [Code]

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
CVPR, 2020. [PDF] [Code]

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou
ACM Multimedia, 2019 (Oral). [PDF] [Data]

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu
IJCAI, 2019. [PDF]

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann
AAAI, 2019. [PDF]

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin
ACM Multimedia, 2019 (Oral). [PDF]

Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin
ACM Multimedia, 2019. [PDF]

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin
EMNLP, 2019. [PDF] [Data]

Generating Video Descriptions with Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander Hauptmann
TMM, 2019. [PDF]

Class-aware Self-attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
ICMR, 2018 (Best Paper Runner-up). [PDF]

Awards & Honors


Area Chair

Conference Reviewer
EMNLP'22, ECCV'22, CVPR'22, AAAI'22, ACM MM'21, ICCV'21, CVPR'21, AAAI'21, EMNLP'21, ICME'20, CVPR'20, AAAI'20, ACL'20

Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
IEEE Transactions on Multimedia (TMM)
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)