Enze Xie

About

I am a Senior Research Scientist at NVIDIA Research (Boston), working with Prof. Song Han. I am also a visiting researcher at MIT HAN Lab. Previously, I was a Principal Researcher and Research Lead (Generative AI) at Huawei Noah's Ark Lab (Hong Kong). I obtained my Ph.D. degree from the Department of Computer Science, The University of Hong Kong, in 2022, advised by Prof. Ping Luo and co-advised by Prof. Wenping Wang. I also work very closely with my friend Wenhai Wang. I was fortunate to work with Prof. Chunhua Shen (University of Adelaide), Prof. Anima Anandkumar (Caltech), and Prof. Sanja Fidler (University of Toronto). During my Ph.D. study, I collaborated with several researchers in industry, e.g., at Facebook and NVIDIA.

My research interests lie in solving challenging problems in generative AI, computer vision, and deep learning. I have developed several well-known computer vision algorithms, including:

PolarMask (Rank 10 in CVPR 2020 Top-10 Influential Papers).
PVT (Rank 2 in ICCV 2021 Top-10 Influential Papers).
SegFormer (Rank 3 in NeurIPS 2021 Top-10 Influential Papers).
BEVFormer (Rank 6 in ECCV 2022 Top-10 Influential Papers).

News

[2025.06] 1 paper accepted to ICML 2025, and 3 papers accepted to ICCV 2025.
[2025.02] The co-authored book 《计算机视觉十讲》 (Ten Lectures on Computer Vision) has been published!
[2025.01] 5 papers accepted to ICLR 2025 (1 Oral, 1 Spotlight).
[2024.12] Serving as Area Chair for ICCV 2025 and Guest Editor for Pattern Recognition.
[2024.08] 1 paper accepted to Nature Communications Engineering. (The first time I submitted a paper to Nature, and it got accepted!)
[2024.07] 3 papers accepted to ECCV, 1 to RA-L, and 1 to TMLR.
[2024.04] Back to NVIDIA Research! Joined the Efficient AI team.
[2024.01] 6 papers accepted to ICLR 2024 (1 Oral, 1 Spotlight) and 1 paper accepted to CVPR 2024.
[2023.11] 1 survey paper accepted to TPAMI.
[2023.09] 4 papers accepted to NeurIPS 2023.
[2023.07] 5 papers accepted to ICCV 2023 (2 Oral) and 1 paper accepted to TPAMI.
[2023.05] 1 paper accepted to ACL 2023.
[2022.08] Obtained my Ph.D. degree from HKU and joined Huawei Noah's Ark Lab.
[2019.10] Joined HKU MMLab as a Ph.D. student to work on computer vision.

Industry Projects

My work on SANA contributes to NVIDIA's flagship generative AI projects, including DLSS and Cosmos. In addition, my work on SegFormer has significantly improved the robustness of real-world autonomous driving.

NVIDIA DLSS: Supreme Speed, Superior Visuals, Powered by AI

NVIDIA Cosmos: A World Foundation Model Platform for Physical AI

SegFormer: Robust Perception with Vision Transformer

Selected Publications [Full List]

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, Enze Xie

ICCV 2025

Project arXiv Code

SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer

Enze Xie*, Junsong Chen*, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Yujun Lin, Zhekai Zhang, Muyang Li, Junyu Chen, Han Cai, Bingchen Liu, Daquan Zhou, Song Han

ICML 2025

Project arXiv Code

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Enze Xie*, Junsong Chen*, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han

ICLR 2025 (Oral)

Project arXiv Code Demo

PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen*, Chongjian Ge*, Enze Xie*†, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

ECCV 2024

Project arXiv Code

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Junsong Chen*, Jincheng Yu*, Chongjian Ge*, Lewei Yao*, Enze Xie†, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li

ICLR 2024 (Spotlight)

Project arXiv Code

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li

ICCV 2023 (Oral)

arXiv Code

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

ECCV 2022 (ECCV 2022 Top-10 Influential Papers)

arXiv Code

Understanding The Robustness in Vision Transformers

Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

ICML 2022 (Spotlight)

arXiv Code NVIDIA Demo

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

NeurIPS 2021 (NeurIPS 2021 Top-10 Influential Papers)

arXiv Code NVIDIA Demo

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

ICCV 2021 (Oral) (ICCV 2021 Top-10 Influential Papers)

arXiv Code

PolarMask: Single Shot Instance Segmentation with Polar Representation

Enze Xie*, Peize Sun*, Xiaoge Song*, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

CVPR 2020 (Oral) (CVPR 2020 Top-10 Influential Papers)

arXiv Code

Academic Service

Area Chair for ICCV 2025

Guest Editor for Pattern Recognition

Organizer for T4V: Transformers for Vision Workshop at CVPR 2022

Journal Reviewer for TPAMI, IJCV, RA-L, T-MM

Conference Reviewer for ICML, NeurIPS, CVPR, ICCV, ECCV, IROS, IJCAI, AAAI, WACV, ACCV

Senior Program Committee (SPC) member for IJCAI 2021 and 2022

Honours and Awards

KAUST AI Rising Star Award	2025
WAIC Rising Star Award (15 awardees globally each year)	2024
Top 2% Scientists Worldwide 2023 by Stanford University	2023
WAIC Youth Outstanding Paper (10 papers selected)	2023
Most Popular Speakers in TechBeat 2022	2022
NVIDIA Graduate Fellowship Finalist Award (the first candidate from a Chinese university in 21 years)	2022
NeurIPS 2021 Outstanding Reviewer Award (top 8% of reviewers)	2021
Hong Kong and China Gas Company Limited Postgraduate Prize	2021
Rank 1 in National Artificial Intelligence Competition - Remote Sensing Segmentation (bonus 1,000,000 RMB)	2020
Rank 1 in Google Open Images 2019 - Instance Segmentation	2019
Rank 1 in ICDAR 2019 Arbitrary-Shaped Text Detection	2019
Rank 2 in ICDAR 2019 Large-scale Street View Text Detection	2019
Outstanding Master Thesis Award, Tongji University	2019

Experience and Collaborations