Enze Xie (谢恩泽)

CV / GitHub / Google Scholar / Zhihu / Email

I have been a PhD student in the Department of Computer Science at The University of Hong Kong (HKU) since 2019, supervised by Prof. Ping Luo and co-supervised by Prof. Wenping Wang. I also work very closely with my friend Wenhai Wang and Prof. Chunhua Shen. I obtained my B.S. from Nanjing University of Aeronautics and Astronautics (2016) and my M.S. from Tongji University (2019). Since 2018, I have collaborated with researchers in industry, e.g. Face++ (Megvii), SenseTime, Facebook, Huawei, and NVIDIA.

My research interest is computer vision in 2D and 3D. I have worked on instance-level detection and self-/semi-/weakly-supervised learning. I developed a few well-known computer vision algorithms, including PolarMask, which was selected as one of the CVPR 2020 Top-10 Influential Papers. I co-developed OpenSelfSup (1k+ stars), a popular self-supervised learning framework.


(* indicates equal contribution)

Selected Papers

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo
NeurIPS 2021 [paper] [code] [Chinese explanation] [demo]
NVIDIA's first Vision Transformer work, which has been transferred to several product teams.

DetCo: Unsupervised Contrastive Learning for Object Detection

Enze Xie*, Jian Ding*, Wenhai Wang, Xiaohang Zhan, Hang Xu, Zhenguo Li, Ping Luo
ICCV 2021 [paper] [code]
We introduce a detection-friendly unsupervised pre-training solution using large-scale unlabeled data.

PVTv2: Improved Baselines with Pyramid Vision Transformer

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Tech report, arXiv [paper] [code]
A better version of PVT.

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
ICCV 2021 (Oral) [paper] [code]
The first work to extend the Vision Transformer to object detection and segmentation.

PolarMask: Single Shot Instance Segmentation with Polar Representation

Enze Xie*, Peize Sun*, Xiaoge Song*, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo
CVPR 2020 (Oral) [paper] [code] [Chinese explanation] [talk] [CVPR20 Top-10 Influential Papers]
We introduced a new Polar Representation to reformulate instance segmentation.

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo
TPAMI 2021 [paper] [code]
We extend PolarMask (CVPR'20) to several instance-level detection tasks.

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen
TPAMI 2021 [paper] [code]
We extend PSENet (CVPR'19) and PAN (ICCV'19) to a text spotting system.

Other Papers

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo
AAAI 2022 [paper] [code]

Watch Only Once: An End-to-End Video Action Detection Framework

Shoufa Chen, Peize Sun, Enze Xie, Chongjian Ge, Jiannan Wu, Lan Ma, Jiajun Shen, Ping Luo
ICCV 2021

What Makes for End-to-End Object Detection?

Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo
ICML 2021

Segmenting Transparent Objects in the Wild with Transformer

Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo
IJCAI 2021 [paper] [code & dataset]

Segmenting Transparent Objects in the Wild

Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo
ECCV 2020 [paper] [code & dataset]

Scene Text Image Super-Resolution in the Wild

Wenjia Wang*, Enze Xie*, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
ECCV 2020 [paper] [code & dataset]

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo
ECCV 2020 [paper]

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo
ECCV 2020 [paper] [Project Web]

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Wenhai Wang*, Enze Xie*, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
ICCV 2019 [paper] [code]

Shape Robust Text Detection with Progressive Scale Expansion Network

Wenhai Wang*, Enze Xie*, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
CVPR 2019 [paper] [code]

Scene Text Detection with Supervised Pyramid Context Network

Enze Xie*, Yuhang Zang*, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
AAAI 2019 [paper]

Technical Report

Panoptic SegFormer

Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Tong Lu, Ping Luo
Tech report, arXiv [paper]

CycleMLP: A MLP-like Architecture for Dense Prediction

Shoufa Chen, Enze Xie, Chongjian Ge, Ding Liang, Ping Luo
Tech report, arXiv [paper] [code]

Unsupervised Pretraining for Object Detection by Patch Reidentification

Jian Ding*, Enze Xie*, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia
Tech report, arXiv [paper] [code]

TransTrack: Multiple-Object Tracking with Transformer

Peize Sun, Yi Jiang, Rufeng Zhang, Enze Xie, Jinkun Cao, Xinting Hu, Tao Kong, Zehuan Yuan, Changhu Wang, Ping Luo
Tech report, arXiv [paper] [code]

OneNet: Towards End-to-End One-Stage Object Detection

Peize Sun, Yi Jiang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo
Tech report, arXiv [paper] [code]

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Weijia Wu*, Enze Xie*, Ruimao Zhang, Wenhai Wang, Guan Pang, Zhen Li, Hong Zhou, Ping Luo
Tech report, arXiv [paper] [code]

1st Place Solutions for OpenImage2019--Object Detection and Instance Segmentation

Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
Tech report, arXiv [paper]

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

Wenjia Wang*, Enze Xie*, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo
Tech report, arXiv [paper] [code]
An improved version has been accepted to ECCV 2020.


Rank 1 in National Artificial Intelligence Competition - Remote Sensing Segmentation (bonus 1,000,000 RMB)

Rank 1 in Google Open Images 2019 - Instance Segmentation

Rank 1 in ICDAR 2019 Arbitrary-Shaped Text Detection

Rank 2 in ICDAR 2019 Large-scale Street View Text Detection

Professional Activities


Senior Program Committee (SPC) member for IJCAI 2021

Invited Talks

Huawei Noah's Ark Lab - AI Theory Group: "Instance Level Detection and Beyond"

SenseTime: "Self-Supervised Learning for Classification and Beyond"

Microsoft Research Asia (MSRA) VCG: "Polar Representation in Instance Segmentation"

Hong Kong Computer Vision Workshop (HKCVW): "Real-Time Scene Text Detection"

Honours and Awards

NVIDIA Graduate Fellowship Finalist Award

NeurIPS 2021 Outstanding Reviewer Award (top 8% of reviewers)

Hong Kong and China Gas Company Limited Postgraduate Prize

Outstanding Master Thesis Award, Tongji University

Some of my Friends

Wenhai Wang (NJU), Wenjia Wang (SenseTime), Jingbo Wang (CUHK), Xiaohang Zhan (CUHK), Guan Pang (Facebook AI), Chunhua Shen (Uni Adelaide)