About Me
I am a Young Leading Scientist at the Shanghai AI Lab and an Adjunct Doctoral Supervisor at the School of AI, SJTU. Recognized as a National-level Young Talent, I hold a Ph.D. from Tsinghua University and was a visiting researcher at Stanford University and Imperial College London.
My research in data-centric AI and high-performance computing has driven significant technological innovation and industry impact. I have authored over 150 papers in top-tier venues and garnered over 9,000 citations on Google Scholar, and my open-source projects have attracted over 50,000 stars on GitHub. My accolades include the Gordon Bell Prize, an ACL Best Theme Paper Award, and the WAIC Yunfan Award. I am the creator of MinerU, the world’s leading open-source data engine for large models. This work, in conjunction with OpenDataLab, has significantly influenced the AI and open-source landscape. Additionally, I oversee a dedicated data team that curates high-quality datasets for leading models such as InternLM and InternVL.
We are hiring! I am actively seeking talented Ph.D. students, postdoctoral fellows, interns, and full-time researchers. If you are passionate about building the future of AI, I welcome you to contact me via email.
🔥 Recent News
- 2025.07: 🎉 I received the ACL Best Theme Paper Award [1].
- 2025.07: 🎉 I won the World Artificial Intelligence Conference Yunfan Award (one of 11 global recipients under the age of 35).
- 2025.05: 🎉 [1][2][3][4][5] papers were accepted by ICCV 2025.
- 2025.05: 🎉 [1][2][3][4][5][6][7][8][9][10][11] papers were accepted by ACL 2025.
- 2025.02: 🎉 [1][2][3][4][5] papers were accepted by CVPR 2025.
- 2025.01: 🎉 [1] paper was accepted by NAACL 2025.
- 2025.01: 🎉 [1][2][3][4][5][6][7] papers were accepted by ICLR 2025.
💻 Open-source Projects
- MinerU, the world’s leading open-source data parsing engine for LLM/RAG/Agent.
- InternLM, a series of leading LLM models developed by Shanghai AI Laboratory.
- OpenDataLab, an open platform that facilitates the development of AGI by sharing datasets and open-sourced tools. It hosts over 7,700 datasets and provides 50+ million data retrieval services to over 200,000 developers.
📝 Selected Publications
I have authored over 150 papers in top-tier venues and garnered over 9,000 citations on Google Scholar. The following are selected publications († corresponding authors).
ACL 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models, Xinlin Zhuang, Jiahui Peng, Ren Ma, Yinfan Wang, Tianyi Bai, Xingjian Wei, Jiantao Qiu, Chi Zhang, Ying Qian, Conghui He† (ACL Best Theme Paper 🎉)
ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text, Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, …, Conghui He†, Jifeng Dai†
ECCV 2024
MMBench: Is Your Multi-modal Model an All-around Player?, Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin (1,000+ citations 🎉)
ECCV 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions, Lin Chen, Jisong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
SC 2017
18.9-Pflops Nonlinear Earthquake Simulation on Sunway TaihuLight: Enabling Depiction of 18-Hz and 8-Meter Scenarios, Haohuan Fu†, Conghui He†, Bingwei Chen, Zekun Yin, Zhenguo Zhang, Wenqiang Zhang, Tingjian Zhang, Wei Xue†, Weiguo Liu, Wanwang Yin, Guangwen Yang, Xiaofei Chen (Gordon Bell Prize 🎉)
🎖 Selected Honors
- 2025, ACL Best Theme Paper (3/8000)
- 2025, World Artificial Intelligence Conference Yunfan Award (one of 11 global recipients under the age of 35)
- 2023, SenseTime Award (SenseTime’s highest award, 1 of 100 teams)
- 2019, Tencent Technology Breakthrough Award - Gold Prize (highest technical award, 1 of 50 teams)
- 2018, Outstanding Graduate PhD Student Award
- 2017, ACM Gordon Bell Prize (the highest award in the field of HPC applications)
- 2017, National PhD Scholarship (1%)
- 2013, Global Champion of the IEEE-IBM Smarter Planet Challenge (Team Leader, 1/54)