Enze Xie (谢恩泽)

CV / GitHub / Google Scholar / Zhihu / Email

I am currently a senior researcher at the AI Theory Lab of Huawei Noah's Ark Lab (Hong Kong). I obtained my Ph.D. degree from the Department of Computer Science, The University of Hong Kong, in 2022. My advisor is Prof. Ping Luo and my co-advisor is Prof. Wenping Wang. I also work very closely with my friend Wenhai Wang. I was fortunate to work with Prof. Chunhua Shen, Prof. Anima Anandkumar, and Prof. Sanja Fidler. During my PhD study, I collaborated with several researchers in industry, e.g. at Facebook (Meta) and NVIDIA.

My research interest lies in solving challenging problems in computer vision and machine learning. I have worked on instance-level detection and self-/semi-/weakly-supervised learning. I developed a few well-known computer vision algorithms, including PolarMask (Rank 10 in CVPR 2020 Top-10 Influential Papers), PVT (Rank 2 in ICCV 2021 Top-10 Influential Papers), and SegFormer (Rank 3 in NeurIPS 2021 Top-10 Influential Papers).
I co-developed OpenSelfSup (now mmselfsup), a popular self-supervised learning framework with 2k+ GitHub stars.

Recently, I have been working on the following research problems:

  • Network Architecture Design and Training Strategy
  • 3D Vision including Autonomous Driving
  • AI for Science

I am always looking for research interns with a strong CV/ML background; feel free to shoot me an email if interested.
We also have several openings for joint PhD programs with CUHK/HKUST and for research engineers.


  • [2022.08] Obtained my Ph.D. degree from HKU and joined Huawei Noah's Ark Lab.
  • [2019.10] Joined HKU MMLab as a PhD student to work on computer vision.


(* indicates equal contribution)

Selected Papers

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo
NeurIPS 2021 [paper] [code] [explanation in Chinese] [demo] [NeurIPS 2021 Top-10 Influential Papers]
NVIDIA's first Vision Transformer work, which has been transferred to several product teams.

DetCo: Unsupervised Contrastive Learning for Object Detection

Enze Xie*, Jian Ding*, Wenhai Wang, Xiaohang Zhan, Hang Xu, Zhenguo Li, Ping Luo
ICCV 2021 [paper] [code]
We introduce a detection-friendly unsupervised pre-training solution using large-scale unlabeled data.

PVTv2: Improved Baselines with Pyramid Vision Transformer

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Tech report, arXiv [paper] [code]
A better version of PVT.

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
ICCV 2021 (Oral) [paper] [code] [ICCV 2021 Top-10 Influential Papers]
The first work to extend Vision Transformer for object detection and segmentation.

PolarMask: Single Shot Instance Segmentation with Polar Representation

Enze Xie*, Peize Sun*, Xiaoge Song*, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo
CVPR 2020 (Oral) [paper] [code] [explanation in Chinese] [talk] [CVPR 2020 Top-10 Influential Papers]
We introduced a new Polar Representation to reformulate instance segmentation.

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo
TPAMI 2021 [paper] [code]
We extend PolarMask (CVPR'20) to several instance-level detection tasks.

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen
TPAMI 2021 [paper] [code]
We extend PSENet (CVPR'19) and PAN (ICCV'19) to a text spotting system.

Other Papers

Fast-BEV: Towards Real-time On-vehicle Bird’s-Eye View Perception

Bin Huang*, Yangguang Li*, Enze Xie*, Feng Liang*, Luya Wang, Mingzhu Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao
NeurIPS 2022 ML4AD workshop [paper] [code]

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai
ECCV 2022 [paper] [code]

Understanding The Robustness in Vision Transformers

Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez
ICML 2022 (Spotlight) [paper] [code]

Improving Monocular Visual Odometry Using Learned Depth

Libo Sun, Wei Yin, Enze Xie, Zhengrong Li, Changming Sun, Chunhua Shen
IEEE Transactions on Robotics 2022 [paper]

Deeply Unsupervised Patch Re-Identification for Pre-training Object Detectors

Jian Ding*, Enze Xie*, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia
TPAMI 2022 [paper] [code]

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Tong Lu, Ping Luo
CVPR 2022 [paper] [code]

CycleMLP: A MLP-like Architecture for Dense Prediction

Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, Ping Luo
ICLR 2022 (Oral) [paper] [code]

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo
AAAI 2022 [paper] [code]

Watch Only Once: An End-to-End Video Action Detection Framework

Shoufa Chen, Peize Sun, Enze Xie, Chongjian Ge, Jiannan Wu, Lan Ma, Jiajun Shen, Ping Luo
ICCV 2021 [paper] [code]

What Makes for End-to-End Object Detection?

Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo
ICML 2021 [paper] [code]

Segmenting Transparent Objects in the Wild with Transformer

Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo
IJCAI 2021 [paper] [code & dataset]

Segmenting Transparent Objects in the Wild

Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo
ECCV 2020 [paper] [code & dataset]

Scene Text Image Super-Resolution in the Wild

Wenjia Wang*, Enze Xie*, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
ECCV 2020 [paper] [code & dataset]

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo
ECCV 2020 [paper]

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo
ECCV 2020 [paper] [Project Web]

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Wenhai Wang*, Enze Xie*, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
ICCV 2019 [paper] [code]

Shape Robust Text Detection with Progressive Scale Expansion Network

Wenhai Wang*, Enze Xie*, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
CVPR 2019 [paper] [code]

Scene Text Detection with Supervised Pyramid Context Network

Enze Xie*, Yuhang Zang*, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
AAAI 2019 [paper]

Technical Report

M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Bird’s-Eye View Representation

Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez
Tech report, arXiv [paper] [project page]

TransTrack: Multiple-Object Tracking with Transformer

Peize Sun, Yi Jiang, Rufeng Zhang, Enze Xie, Jinkun Cao, Xinting Hu, Tao Kong, Zehuan Yuan, Changhu Wang, Ping Luo
Tech report, arXiv [paper] [code]

OneNet: Towards End-to-End One-Stage Object Detection

Peize Sun, Yi Jiang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo
Tech report, arXiv [paper] [code]

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Weijia Wu*, Enze Xie*, Ruimao Zhang, Wenhai Wang, Guan Pang, Zhen Li, Hong Zhou, Ping Luo
Tech report, arXiv [paper] [code]

1st Place Solutions for OpenImage2019--Object Detection and Instance Segmentation

Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
Tech report, arXiv [paper]

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

Wenjia Wang*, Enze Xie*, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo
Tech report, arXiv [paper] [code]
An improved version was accepted to ECCV 2020.


Rank 1 in National Artificial Intelligence Competition - Remote Sensing Segmentation (bonus 1,000,000 RMB)

Rank 1 in Google Open Images 2019 - Instance Segmentation

Rank 1 in ICDAR 2019 Arbitrary-Shaped Text Detection

Rank 2 in ICDAR 2019 Large-scale Street View Text Detection

Professional Activities

Organizer for T4V: Transformers for Vision Workshop at CVPR 2022

Journal Reviewer for TPAMI, IJCV, RA-L, T-MM

Conference Reviewer for ICML, NeurIPS, CVPR, ICCV, ECCV, IROS, IJCAI, AAAI, WACV, ACCV

SPC for IJCAI 2021 and 2022

Invited Talks

International Digital Economy Academy (IDEA): "Transformer for Detection & Segmentation in 2D and 3D Vision"

NVIDIA DriveAV & Huawei Noah's Ark Lab: "M^2BEV: Multi-Camera Multi-Task Learning for AV Perception"

Stanford MedAI : "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers"

Jiangmen (将门) : "Applications of Transformers in Detection and Segmentation"

Huawei Noah's Ark Lab - AI Theory Group : "Instance Level Detection and Beyond"

SenseTime : "Self-Supervised Learning for Classification and Beyond"

Microsoft Research Asia (MSRA) VCG : "Polar Representation in Instance Segmentation"

Hong Kong Computer Vision Workshop (HKCVW) : "Real-Time Scene Text Detection"

Honours and Awards

Most Popular Speakers in TechBeat 2022

NVIDIA Graduate Fellowship Finalist Award (the first candidate from a Chinese university in 21 years)

NeurIPS 2021 Outstanding Reviewer Award (top 8% of reviewers)

Hong Kong and China Gas Company Limited Postgraduate Prize

Outstanding Master Thesis Award, Tongji University

Some of my Friends

Wenhai Wang (NJU), Wenjia Wang (SenseTime), Jingbo Wang (CUHK), Xiaohang Zhan (CUHK), Guan Pang (Facebook AI),
Chunhua Shen (Uni Adelaide), Lanqing Hong (Huawei Noah), Yunhe Wang (Huawei Noah)