About Me

I am a Young Leading Scientist at the Shanghai AI Lab and an Adjunct Doctoral Supervisor at the School of AI, SJTU. Recognized as a National-level Young Talent, I hold a Ph.D. from Tsinghua University and was a visiting researcher at Stanford University and Imperial College London.

My research in data-centric AI and high-performance computing has driven significant technological innovation and industry impact. I have authored over 150 papers in top-tier venues and garnered over 9,000 citations on Google Scholar, and my open-source projects have attracted over 50,000 stars on GitHub. My accolades include the Gordon Bell Prize, an ACL Best Theme Paper Award, and the WAIC Yunfan Award. I am the creator of MinerU, the world’s leading open-source data engine for large models. This work, in conjunction with OpenDataLab, has significantly influenced the AI and open-source landscape. Additionally, I oversee a dedicated data team that curates high-quality datasets for leading models such as InternLM and InternVL.

We are hiring! I am actively seeking talented Ph.D. students, postdoctoral fellows, interns, and full-time researchers. If you are passionate about building the future of AI, I welcome you to contact me via email.

🔥 Recent News

2025.12: 🎉 I was selected for the Shanghai Science and Technology Youth 35 Leading Program (35 scientists under the age of 35 are selected) [News/Report]
2025.09: 🎉 MinerU 2.5 is released! A 1.2B-parameter document parsing vision-language model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. [Tech Report] [Model] [GitHub]
2025.09: 🎉 [1][2][3][4][5][6] papers are accepted by NeurIPS 2025.
2025.07: 🎉 I received the ACL Best Theme Paper Award [1].
2025.07: 🎉 I won the World Artificial Intelligence Conference Yunfan Award (one of 11 global recipients under the age of 35, 2025)
2025.05: 🎉 [1][2][3][4][5] papers are accepted by ICCV 2025.
2025.05: 🎉 [1][2][3][4][5][6][7][8][9][10][11] papers are accepted by ACL 2025.
2025.02: 🎉 [1][2][3][4][5] papers are accepted by CVPR 2025.
2025.01: 🎉 [1] paper is accepted by NAACL 2025.
2025.01: 🎉 [1][2][3][4][5][6][7] papers are accepted by ICLR 2025.

💻 Open-source Projects

MinerU, the world’s leading open-source data parsing engine for LLM/RAG/Agent applications.
InternLM, a series of leading LLMs developed by Shanghai AI Laboratory.
OpenDataLab, an open platform that facilitates the development of AGI by sharing datasets and open-source tools. It hosts over 7,700 datasets and provides 50+ million data retrieval services to over 200,000 developers.

📝 Selected Publications

I have authored over 150 papers in top-tier venues and garnered over 9,000 citations on Google Scholar. Selected publications follow. († Corresponding Authors)

ACL 2025 Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models, Xinlin Zhuang, Jiahui Peng, Ren Ma, Yinfan Wang, Tianyi Bai, Xingjian Wei, Jiantao Qiu, Chi Zhang, Ying Qian, Conghui He† (ACL Best Theme Paper 🎉)
ICLR 2025 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text, Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, …, Conghui He†, Jifeng Dai†
ECCV 2024 MMBench: Is Your Multi-modal Model an All-around Player?, Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin (1,000+ citations 🎉)
ECCV 2024 ShareGPT4V: Improving Large Multi-Modal Models with Better Captions, Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
SC 2017 18.9-Pflops Nonlinear Earthquake Simulation on Sunway TaihuLight: Enabling Depiction of 18-Hz and 8-Meter Scenarios, Haohuan Fu†, Conghui He†, Bingwei Chen, Zekun Yin, Zhenguo Zhang, Wenqiang Zhang, Tingjian Zhang, Wei Xue†, Weiguo Liu, Wanwang Yin, Guangwen Yang, Xiaofei Chen (Gordon Bell Prize 🎉)

🎖 Selected Honors

2025, ACL Best Theme Paper (3/8000)
2025, World Artificial Intelligence Conference Yunfan Award (one of 11 global recipients under the age of 35)
2023, SenseTime Award (SenseTime’s highest award, 1 of 100 teams)
2019, Tencent Technology Breakthrough Award - Gold Prize (highest technical award, 1 of 50 teams)
2018, Outstanding Graduate PhD Student Award
2017, ACM Gordon Bell Prize (the highest award in the field of HPC applications)
2017, National PhD Scholarship (1%)
2013, Global Champion of the IEEE-IBM Smarter Planet Challenge (Team Leader, 1/54)

Conghui He (何聪辉)
