Conghui He


Senior Research Director, SenseTime

Research Scientist & PI, Shanghai AI Laboratory

I am currently a Research Director at SenseTime Inc., as well as a Research Scientist and PI at the Shanghai AI Laboratory. Prior to this, I worked at WeChat as a Senior Researcher, where I initiated and developed the high-performance graph computing framework, Plato. Before joining WeChat, I earned my PhD degree (2013-2018) from the Department of Computer Science at Tsinghua University under the supervision of Prof. Haohuan Fu, and my Bachelor’s degree (2009-2013) from the Department of Software Engineering at Sun Yat-Sen University.

My research interests include High Performance Computing, Computer Vision, and Large Language Models. In 2017, I was honored with the Gordon Bell Prize, which is the highest distinction in the high-performance computing application domain. Currently, I lead the OpenDataLab team, which aims to build an influential open dataset platform that facilitates the development, analysis, and research of Artificial General Intelligence (AGI). Additionally, I oversee a data team that collects and curates massive datasets for large language models.

At SenseTime and the Shanghai AI Laboratory, we are actively hiring PhDs, postdocs, interns, and full-time researchers. If you’re interested in joining our team, please feel free to reach out to me via email.

You can check out my CV here.


May 27, 2024 2 papers are accepted by ACL 2024.
May 02, 2024 1 paper is accepted by ICML 2024.
Mar 19, 2024 We release WanJuan-CC, a safe and high-quality webtext dataset.
Feb 27, 2024 3 papers are accepted by CVPR 2024.
Sep 09, 2023 We release InternLM2. See arXiv for details.
Aug 21, 2023 We release WanJuan 1.0, a large-scale multi-modal dataset for pretraining.
Jun 03, 2023 VIGC is accepted by AAAI 2024.
Jun 03, 2023 We release InternLM. You can find the technical report here.
Mar 21, 2022 We launch OpenDataLab, an open data platform that empowers AGI.

selected publications

  1. InternLM2 Technical Report
    Zheng Cai, Maosong Cao, Haojiong Chen, and 8 more authors
    arXiv preprint arXiv:2403.17297, 2024
  2. AAAI
    VIGC: Visual instruction generation and correction
    Bin Wang, Fan Wu, Xiao Han, and 8 more authors
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
  3. ACL
    Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
    Jiaxing Sun, Weiquan Huang, Jiang Wu, and 5 more authors
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024
  4. WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset
    Jiantao Qiu, Haijun Lv, Zhenjiang Jin, and 8 more authors
    arXiv preprint arXiv:2402.19282, 2024
  5. WanJuan: A comprehensive multimodal dataset for advancing English and Chinese large models
    Conghui He, Zhenjiang Jin, Chao Xu, and 6 more authors
    arXiv preprint arXiv:2308.10755, 2023
  6. MMBench: Is your multi-modal model an all-around player?
    Yuan Liu, Haodong Duan, Yuanhan Zhang, and 8 more authors
    arXiv preprint arXiv:2307.06281, 2023
  7. LLaMA-Adapter V2: Parameter-efficient visual instruction model
    Peng Gao, Jiaming Han, Renrui Zhang, and 8 more authors
    arXiv preprint arXiv:2304.15010, 2023
  8. CVPR
    OmniCity: Omnipotent city understanding with multi-level and multi-view images
    Weijia Li, Yawen Lai, Linning Xu, and 5 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  9. SC17
    9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: enabling depiction of 18-Hz and 8-meter scenarios
    Haohuan Fu, Conghui He, Bingwei Chen, and 8 more authors
    In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017