About

email: cszhe1 [at] ruc.edu.cn
email: shizhe.chen [at] inria.fr

My name is Shizhe Chen. I am a postdoctoral researcher at INRIA Paris, working in the WILLOW project-team with Ivan Laptev and Cordelia Schmid. My research interests are vision-and-language, video understanding, and multimodal deep learning. I am currently focusing on the vision-and-language navigation problem. My CV can be found here.

I received my Ph.D. and B.S. degrees from Renmin University of China in 2020 and 2015, respectively, advised by Qin Jin. I visited Carnegie Mellon University in 2018, advised by Alexander Hauptmann, and the University of Adelaide in 2019, advised by Qi Wu. I worked at MSRA with Jianlong Fu and Ruihua Song in 2019. I received the Baidu Scholarship in 2017 and the Beijing Outstanding Graduate Award in 2020.

[Google Scholar] [GitHub] [LinkedIn]

News



10/2021
Our team wins the REVERIE and SOON VLN Challenges at ICCV 2021 HIRV Workshop.

09/2021
HAMT accepted by NeurIPS 2021.
[PDF]

08/2021
Two papers accepted by ICCV 2021.

07/2021
Two papers (1 oral and 1 poster) accepted by ACM Multimedia 2021.

06/2021
Two papers (1 oral and 1 poster) accepted by CVPR 2021.

04/2020
We are organizing the YouMakeup VQA Challenge at CVPR 2020 for fine-grained action understanding!

03/2020
Two papers (1 oral and 1 poster) accepted by CVPR 2020.

Selected Publications



History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev.
NeurIPS, 2021. [Project] [PDF] [Code]

Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen, Dong Huang.
ICCV, 2021. [PDF] [Code]

Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid.
ICCV, 2021. [Project] [PDF] [Code]

Question-controlled Text-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin.
ACM Multimedia, 2021. [PDF]

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang.
ACM Multimedia, 2021 (Oral). [PDF]

Sketch, Ground, and Refine: Top-Down Dense Video Captioning
Chaorui Deng, Shizhe Chen, Da Chen, Yuan He, Qi Wu.
CVPR, 2021 (Oral). [PDF]

Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song, Shizhe Chen, Qin Jin.
CVPR, 2021. [PDF]

Say as You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
CVPR, 2020 (Oral). [PDF] [Code]

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
CVPR, 2020. [PDF] [Code]

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou
ACM Multimedia, 2019 (Oral). [PDF] [Data]

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu
IJCAI, 2019. [PDF]

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann
AAAI, 2019. [PDF]

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin
ACM Multimedia, 2019 (Oral). [PDF]

Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin
ACM Multimedia, 2019. [PDF]

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin
EMNLP, 2019. [PDF] [Data]

Generating Video Descriptions with Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander Hauptmann
TMM, 2019. [PDF]

Class-aware Self-attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
ICMR, 2018 (Best Paper Runner-up). [PDF]

Awards & Honors



Services



Conference Reviewer
CVPR, ICCV, ACL, EMNLP, ACM MM, SIGIR, AAAI, IJCAI, ICME, ICMR

Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
IEEE Transactions on Multimedia (TMM)
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)