About

email: shizhe.chen [at] inria.fr

My name is Shizhe Chen. I am a researcher in the WILLOW project-team at INRIA Paris.
Previously, I was a postdoctoral researcher at WILLOW, working with Ivan Laptev and Cordelia Schmid. I received my PhD degree from Renmin University of China in 2020, advised by Qin Jin. During my doctoral studies, I visited Carnegie Mellon University, University of Adelaide and Microsoft Research Asia. Prior to my PhD, I earned my Bachelor's degree from Renmin University of China in 2015.
My research interests include vision and language and, more recently, embodied AI.
My CV can be found here.

I am looking for talented postdocs. If you are interested in working with me, please fill in this application form.

[Google Scholar] [GitHub] [LinkedIn]

News

03/2024
One paper on 3D pre-training for robotics is accepted by CVPR 2024.

08/2023
Our paper on embodied captioning is accepted by ICCV 2023!

06/2023
Our papers on object-goal navigation and sim-to-real transfer are accepted by IROS 2023!

03/2023
One paper is accepted by CVPR 2023.

08/2022
The HM3DAutoVLN paper is accepted by ECCV 2022 and also achieves 2nd place in the REVERIE Challenge.

10/2021
Our team wins the REVERIE Challenge at the ICCV 2021 HIRV Workshop.

Publications

See Google Scholar for the full publication list.

SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen, Ricardo Garcia, Ivan Laptev, Cordelia Schmid.
CVPR, 2024. [Project] [PDF]

PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
Shizhe Chen*, Ricardo Garcia*, Cordelia Schmid, Ivan Laptev.
CoRL, 2023. [Project] [PDF] [Code]

Object Goal Navigation with Recursive Implicit Maps
Shizhe Chen, Thomas Chabal, Ivan Laptev, Cordelia Schmid.
IROS, 2023. [Project] [PDF] [Code]

Robust Visual Sim-to-real Transfer for Robotic Manipulation
Ricardo Garcia, Robin Strudel, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid.
IROS, 2023. [Project] [PDF] [Code]

Explore and Tell: Embodied Visual Captioning in 3D Environments
Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin.
ICCV, 2023. [Project] [PDF] [Code]

TeViS: Translating Text Synopses to Video Storyboards
Xu Gu, Yuchong Sun, Feiyue Ni, Shizhe Chen, Xihua Wang, Ruihua Song, Boyuan Li, Xiang Cao.
ACM Multimedia, 2023. [Project] [PDF] [Code]

gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev.
CVPR, 2023. [Project] [PDF] [Code]

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev.
NeurIPS, 2022. [Project] [PDF] [Code]

Instruction-driven History-aware Policies for Robotic Manipulations
Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid.
CoRL, 2022 (Oral). [Project] [PDF] [Code]

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev.
ECCV, 2022. [Project] [PDF] [Code]

Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning
Sipeng Zheng, Shizhe Chen, Qin Jin.
ECCV, 2022. [PDF]

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev.
CVPR, 2022 (Oral). [Project] [PDF] [Code]

VRDFormer: End-to-End Video Visual Relation Detection with Transformers
Sipeng Zheng, Shizhe Chen, Qin Jin.
CVPR, 2022 (Oral). [PDF] [Code]

History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev.
NeurIPS, 2021. [Project] [PDF] [Code]

Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen, Dong Huang.
ICCV, 2021. [PDF] [Code]

Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid.
ICCV, 2021. [Project] [PDF] [Code]

Question-controlled Text-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin.
ACM Multimedia, 2021. [PDF] [Code]

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang.
ACM Multimedia, 2021 (Oral). [PDF] [Code]

Sketch, Ground, and Refine: Top-Down Dense Video Captioning
Chaorui Deng, Shizhe Chen, Da Chen, Yuan He, Qi Wu.
CVPR, 2021 (Oral). [PDF]

Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song, Shizhe Chen, Qin Jin.
CVPR, 2021. [PDF] [Code]

Say as You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu.
CVPR, 2020 (Oral). [PDF] [Code]

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu.
CVPR, 2020. [PDF] [Code]

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou.
ACM Multimedia, 2019 (Oral). [PDF] [Data]

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu.
IJCAI, 2019. [PDF]

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann.
AAAI, 2019. [PDF]

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin.
ACM Multimedia, 2019 (Oral). [PDF]

Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin.
ACM Multimedia, 2019. [PDF]

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin.
EMNLP, 2019. [PDF] [Data]

Generating Video Descriptions with Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander Hauptmann.
TMM, 2019. [PDF]

Class-aware Self-attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann.
ICMR, 2018 (Best Paper Runner-up). [PDF]

Awards & Honors

Services

Area Chair
ICLR 2024, CVPR 2024, NeurIPS 2023, ICCV 2023, CVPR 2023, ACM MM 2023, ACM MM 2022

Conference Reviewer
IROS 2023, EMNLP 2023, ACL 2023, EMNLP 2022, ECCV 2022, CVPR 2022, AAAI 2022, ACM MM 2021, ICCV 2021, CVPR 2021, AAAI 2021, EMNLP 2021, ICME 2020, CVPR 2020, AAAI 2020, ACL 2020

Journal Reviewer
Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
International Journal of Computer Vision (IJCV)
IEEE Transactions on Multimedia (TMM)
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
IEEE Transactions on Robotics (T-RO)
IEEE Robotics and Automation Letters (RA-L)