Archives



08/2023
Our paper on embodied captioning is accepted by ICCV 2023!

07/2023
Our paper on text-to-video is accepted by ACM Multimedia 2023!

06/2023
Our papers on object-goal navigation and sim-to-real transfer are accepted by IROS 2023!

03/2023
One paper accepted by CVPR 2023.

08/2022
The HM3DAutoVLN paper is accepted by ECCV 2022 and also places 2nd in the REVERIE Challenge.

10/2021
Our team wins the REVERIE Challenge at the ICCV 2021 HIRV Workshop.

09/2021
HAMT accepted by NeurIPS 2021.
[PDF]

08/2021
Two papers accepted by ICCV 2021.

07/2021
Two papers (1 oral and 1 poster) accepted by ACM Multimedia 2021.

06/2021
Two papers (1 oral and 1 poster) accepted by CVPR 2021.

04/2020
We are organizing YouMakeup VQA Challenge at CVPR 2020 for fine-grained action understanding!

03/2020
Two papers (1 oral and 1 poster) accepted by CVPR 2020.

11/2019
Our team is the winner of the NIST TRECVID 2019 Video to Text Task.

10/2019
Our team places second in the ICCV 2019 VATEX Video Captioning Challenge.
[PDF]

08/2019
We have three papers accepted by ACM Multimedia 2019!

08/2019
Our paper "YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension" has been accepted by EMNLP 2019 and the dataset is released.
[PDF] [Data]

05/2019
Our paper "From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots" has been accepted by IJCAI 2019.
[PDF]

01/2019
Our paper "Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data" has been accepted by AAAI 2019.
[PDF]

Selected Publications



History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev.
ICCV, 2021. [Project] [PDF] [Code]

Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen, Dong Huang.
ICCV, 2021. [PDF] [Code]

Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid.
ICCV, 2021. [Project] [PDF] [Code]

Question-controlled Text-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin.
ACM Multimedia, 2021. [PDF]

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang.
ACM Multimedia, 2021 (Oral). [PDF]

Sketch, Ground, and Refine: Top-Down Dense Video Captioning
Chaorui Deng, Shizhe Chen, Da Chen, Yuan He, Qi Wu.
CVPR, 2021 (Oral). [PDF]

Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song, Shizhe Chen, Qin Jin.
CVPR, 2021. [PDF]

Say as You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu.
CVPR, 2020 (Oral). [PDF] [Code]

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu.
CVPR, 2020. [PDF] [Code]

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou.
ACM Multimedia, 2019 (Oral). [PDF] [Data]

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu.
IJCAI, 2019. [PDF]

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann.
AAAI, 2019. [PDF]

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin.
ACM Multimedia, 2019 (Oral). [PDF]

Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin.
ACM Multimedia, 2019. [PDF]

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin.
EMNLP, 2019. [PDF] [Data]

Generating Video Descriptions with Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander Hauptmann.
TMM, 2019. [PDF]

Class-aware Self-attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann.
ICMR, 2018 (Best Paper Runner-up). [PDF]

Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction
Shizhe Chen, Qin Jin.
ACM Multimedia, 2016. [PDF]