Enze Xie (谢恩泽)

CV / GitHub / Google Scholar / Email

I am a Senior Research Scientist at NVIDIA Research, in the Efficient AI team led by Prof. Song Han (MIT), working on efficient AIGC/LLM/VLM models. Previously, I was a senior researcher and research lead in Generative AI at the AI Theory Lab of Huawei Noah's Ark Lab (Hong Kong). I obtained my Ph.D. degree from the Department of Computer Science, The University of Hong Kong, in 2022. My advisor is Prof. Ping Luo and my co-advisor is Prof. Wenping Wang. I also work closely with my friend Wenhai Wang. I was fortunate to work with Prof. Chunhua Shen (UofAdelaide), Prof. Anima Anandkumar (Caltech), and Prof. Sanja Fidler (UofT). During my PhD study, I also collaborated with several researchers in industry, e.g., at Facebook and NVIDIA.
I also worked for a period at a small but specialized quantitative trading firm, developing Transformer-based futures forecasting models.

My research interests lie in solving challenging problems in computer vision and machine learning. I have worked on instance-level detection and self-/semi-/weakly-supervised learning. I developed several well-known computer vision algorithms, including:

  • PolarMask (Rank 10 in CVPR 2020 Top-10 Influential Papers).
  • PVT (Rank 2 in ICCV 2021 Top-10 Influential Papers).
  • SegFormer (Rank 3 in NeurIPS 2021 Top-10 Influential Papers).
  • BEVFormer (Rank 6 in ECCV 2022 Top-10 Influential Papers).

  • I co-developed OpenSelfSup (now mmselfsup), a popular self-supervised learning framework with 2k+ GitHub stars.


  • [2024.04] Back to NVIDIA Research! Joined the Efficient AI team.
  • [2024.01] 6 papers accepted to ICLR 2024 (1 Oral, 1 Spotlight) and 1 paper accepted to CVPR 2024.
  • [2023.11] 1 survey paper accepted to TPAMI.
  • [2023.09] 4 papers accepted to NeurIPS 2023.
  • [2023.07] 5 papers accepted to ICCV 2023 (2 Oral) and 1 paper accepted to TPAMI.
  • [2023.05] 1 paper accepted to ACL 2023.
  • [2022.08] Obtained my Ph.D. degree from HKU and joined Huawei Noah's Ark Lab.
  • [2019.10] Joined HKU MMLab as a PhD student working on computer vision.

  • Publications

    For the latest list, see Google Scholar
    (* indicates equal contribution)

    Selected Papers

    SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo
    NeurIPS 2021 [paper] [code] [中文解读] [demo] [NeurIPS 2021 Top-10 Influential Papers]
    NVIDIA's first Vision Transformer work; it has been transferred to several product teams.

    DetCo: Unsupervised Contrastive Learning for Object Detection

    Enze Xie*, Jian Ding*, Wenhai Wang, Xiaohang Zhan, Hang Xu, Zhenguo Li, Ping Luo
    ICCV 2021 [paper] [code]
    We introduce a detection-friendly unsupervised pre-training solution using large-scale unlabeled data.

    PVTv2: Improved Baselines with Pyramid Vision Transformer

    Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
    Tech report, arXiv [paper] [code] [ESI Highly Cited Paper (top 1%), ESI Hot Paper (top 0.1%)]
    An improved version of PVT.

    Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

    Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
    ICCV 2021 (Oral) [paper] [code] [ICCV 2021 Top-10 Influential Papers]
    The first work to extend the Vision Transformer to object detection and segmentation.

    PolarMask: Single Shot Instance Segmentation with Polar Representation

    Enze Xie*, Peize Sun*, Xiaoge Song*, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo
    CVPR 2020 (Oral) [paper] [code] [中文解读] [talk] [CVPR 2020 Top-10 Influential Papers]
    We introduced a new Polar Representation to reformulate instance segmentation.

    PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

    Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo
    TPAMI 2021 [paper] [code]
    We extend PolarMask (CVPR'20) to several instance-level detection tasks.

    PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

    Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen
    TPAMI 2021 [paper] [code]
    We extend PSENet (CVPR'19) and PAN (ICCV'19) to a text spotting system.

    Other Papers

    Fast-BEV: Towards Real-time On-vehicle Bird’s-Eye View Perception

    Bin Huang*, Yangguang Li*, Enze Xie*, Feng Liang*, Luya Wang, Mingzhu Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao
    NeurIPS 2022 ML4AD workshop [paper] [code]

    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai
    ECCV 2022 [paper] [code]

    Understanding The Robustness in Vision Transformers

    Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez
    ICML 2022 (Spotlight) [paper] [code]

    Improving Monocular Visual Odometry Using Learned Depth

    Libo Sun, Wei Yin, Enze Xie, Zhengrong Li, Changming Sun, Chunhua Shen
    IEEE Transactions on Robotics 2022 [paper]

    Deeply Unsupervised Patch Re-Identification for Pre-training Object Detectors

    Jian Ding*, Enze Xie*, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia
    TPAMI 2022 [paper] [code]

    Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

    Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Tong Lu, Ping Luo
    CVPR 2022 [paper] [code]

    CycleMLP: A MLP-like Architecture for Dense Prediction

    Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, Ping Luo
    ICLR 2022 (Oral) [paper] [code]

    Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

    Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo
    AAAI 2022 [paper] [code]

    Watch Only Once: An End-to-End Video Action Detection Framework

    Shoufa Chen, Peize Sun, Enze Xie, Chongjian Ge, Jiannan Wu, Lan Ma, Jiajun Shen, Ping Luo
    ICCV 2021 [paper] [code]

    What Makes for End-to-End Object Detection?

    Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo
    ICML 2021 [paper] [code]

    Segmenting Transparent Objects in the Wild with Transformer

    Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo
    IJCAI 2021 [paper] [code & dataset]

    Segmenting Transparent Objects in the Wild

    Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo
    ECCV 2020 [paper] [code & dataset]

    Scene Text Image Super-Resolution in the Wild

    Wenjia Wang*, Enze Xie*, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
    ECCV 2020 [paper] [code & dataset]

    Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

    Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo
    ECCV 2020 [paper]

    AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

    Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo
    ECCV 2020 [paper] [Project Web]

    Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

    Wenhai Wang*, Enze Xie*, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
    ICCV 2019 [paper] [code]

    Shape Robust Text Detection with Progressive Scale Expansion Network

    Wenhai Wang*, Enze Xie*, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
    CVPR 2019 [paper] [code]

    Scene Text Detection with Supervised Pyramid Context Network

    Enze Xie*, Yuhang Zang*, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
    AAAI 2019 [paper]

    Technical Report

    M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Bird’s-Eye View Representation

    Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez
    Tech report, arXiv [paper] [project page]

    TransTrack: Multiple-Object Tracking with Transformer

    Peize Sun, Yi Jiang, Rufeng Zhang, Enze Xie, Jinkun Cao, Xinting Hu, Tao Kong, Zehuan Yuan, Changhu Wang, Ping Luo
    Tech report, arXiv [paper] [code]

    OneNet: Towards End-to-End One-Stage Object Detection

    Peize Sun, Yi Jiang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo
    Tech report, arXiv [paper] [code]

    SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

    Weijia Wu*, Enze Xie*, Ruimao Zhang, Wenhai Wang, Guan Pang, Zhen Li, Hong Zhou, Ping Luo
    Tech report, arXiv [paper] [code]

    1st Place Solutions for OpenImage2019--Object Detection and Instance Segmentation

    Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
    Tech report, arXiv [paper]

    TextSR: Content-Aware Text Super-Resolution Guided by Recognition

    Wenjia Wang*, Enze Xie*, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo
    Tech report, arXiv [paper] [code]
    An improved version was accepted to ECCV 2020.

    Academic Service

    Organizer for T4V: Transformers for Vision Workshop at CVPR 2022

    Journal Reviewer for TPAMI, IJCV, RA-L, T-MM

    Conference Reviewer for ICML, NeurIPS, CVPR, ICCV, ECCV, IROS, IJCAI, AAAI, WACV, ACCV

    Senior Program Committee (SPC) member for IJCAI 2021 and 2022

    Honours and Awards

    WAIC Rising Star Award (15 awardees globally each year)

    Top 2% Scientists Worldwide 2023 by Stanford University

    WAIC Youth Outstanding Paper (10 papers selected)

    Most Popular Speaker, TechBeat 2022

    NVIDIA Graduate Fellowship Finalist Award (the first candidate from a Chinese university in 21 years)

    NeurIPS 2021 Outstanding Reviewer Award (top 8% of reviewers)

    Hong Kong and China Gas Company Limited Postgraduate Prize

    Rank 1 in National Artificial Intelligence Competition - Remote Sensing Segmentation (prize: 1,000,000 RMB)

    Rank 1 in Google Open Images 2019 - Instance Segmentation

    Rank 1 in ICDAR 2019 Arbitrary-Shaped Text Detection

    Rank 2 in ICDAR 2019 Large-scale Street View Text Detection

    Outstanding Master Thesis Award, Tongji University