Introduction
He is currently a second-year postgraduate student at the University of Chinese Academy of Sciences (UCAS), Institute of Computing Technology (ICT), advised by Yinhe Han and Haobo Xu. He is also a full-time research intern in the System Research Group of Microsoft Research Asia (MSRA), advised by Dr. Jilong Xue and Dr. Lingxiao Ma.
His current research focuses on:
- Sparse Tensor Computation
- On-Device LLM Inference
- Open-Source Accelerator System Design
- CPU Fallback based on Tengine [slides] [recording]
He also enjoys writing technical posts and contributes to various open-source communities, including Microsoft NNFusion, Apache TVM, and Tengine.
Education
University of Chinese Academy of Sciences
Institute of Computing Technology
Master's in Computer Science (Aug. 2021 - Present)
Nanjing Tech University
Bachelor's in Electronic Engineering (Aug. 2017 - Jun. 2021)
Overall GPA: 3.95/4.00
Ranking: 1/59
Awards & Honors
- 2018 Chinese National Scholarship (Top 0.3%)
- 2021 Excellent New Student Award of Chinese Academy of Sciences
- Njtech Person of the Year 2020
- First Prize, 2019 National Undergraduate Electronics Design Contest (NUEDC) (Top 0.5%)
- First Prize, 2018 Provincial Electronic Design Competition
- Third Prize, Integrated Circuit Innovation and Entrepreneurship Competition (FPGA hardware accelerator for digit recognition)
- Third Prize, National FPGA Competition (FPGA-based implementation of the FOSDA algorithm)
Experience
Netease Intelligent Hardware R&D Department
Beijing, China
NPU Development Intern (Sep. 2021 - Oct. 2021)
Publications
Proceedings of the Nineteenth European Conference on Computer Systems, EuroSys, 2025
18th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2024
ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, 2024
Symposium on Principles and Practice of Parallel Programming, PPoPP, 2024. Best Paper Award!
17th USENIX Symposium on Operating Systems Design and Implementation (Poster), OSDI, 2023
60th Design Automation Conference, DAC, 2023
Design, Automation and Test in Europe Conference, DATE, 2024
Proceedings of Machine Learning and Systems, MLSys, 2023
Projects
Microsoft BitBLAS, 2024 Lead!
BitBLAS is a library that supports mixed-precision BLAS operations on GPUs, for example, the $W_{wdtype}A_{adtype}$ mixed-precision matrix multiplication where $C_{cdtype}[M, N] = A_{adtype}[M, K] \times W_{wdtype}[N, K]$. BitBLAS aims to support efficient mixed-precision DNN model deployment, especially the $W_{wdtype}A_{adtype}$ quantization in large language models (LLMs).
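The $W_{wdtype}A_{adtype}$ product described above leaves the dequantization step implicit. As a minimal sketch of the semantics only (not BitBLAS's API or its GPU kernels), the pure-Python reference below assumes unsigned 4-bit weights packed two per byte, a zero point, and one float scale per row of $W$; the names `pack_uint4`, `unpack_uint4`, and `mixed_precision_gemm` are hypothetical. Because $W$ is laid out as $[N, K]$, each output element $C[m, n]$ is a dot product of row $m$ of $A$ with row $n$ of $W$.

```python
from typing import List, Sequence


def pack_uint4(values: Sequence[int]) -> bytes:
    """Pack integers in [0, 15] two per byte, low nibble first (assumed layout)."""
    vals = list(values)
    if len(vals) % 2:
        vals.append(0)  # pad odd-length rows with a zero nibble
    return bytes(vals[i] | (vals[i + 1] << 4) for i in range(0, len(vals), 2))


def unpack_uint4(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_uint4; returns the first `count` 4-bit values."""
    out: List[int] = []
    for byte in packed:
        out.extend((byte & 0xF, byte >> 4))
    return out[:count]


def mixed_precision_gemm(A, W_packed, scales, zero_point, K):
    """Reference C[M, N] = A[M, K] x W[N, K]^T, with W dequantized as
    (q - zero_point) * scale using one scale per row of W (an assumption)."""
    # Dequantize each packed weight row; row n produces output column n.
    W = [[(q - zero_point) * s for q in unpack_uint4(row, K)]
         for row, s in zip(W_packed, scales)]
    # Each output element is the dot product of a row of A with a row of W.
    return [[sum(a * w for a, w in zip(a_row, w_row)) for w_row in W]
            for a_row in A]
```

A production mixed-precision kernel would typically fuse dequantization into the matrix multiplication instead of materializing $W$ in full precision; the sketch materializes it only for readability.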
FPGA Accelerator for Digit Recognition, 2020
Built when convolutional neural networks were beginning to gain prominence, this project used an FPGA to accelerate digit recognition, making the recognition process faster and more efficient.
[Watch the Video]
FPGA Accelerator for Beam Forming, 2020
Aimed at locating sound sources, this FPGA accelerator uses a tetragonal microphone array to enhance sound coming from specific points. The project is named FOSDA.
[Watch the Video]
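The FOSDA algorithm itself is not described here. To illustrate how a microphone array can enhance sound from one point, the sketch below uses textbook delay-and-sum beamforming, a swapped-in standard technique rather than the project's actual method: each channel is delayed so that sound from the target point lines up across microphones, and the aligned channels are averaged. The speed of sound (343 m/s), the near-field point-source model, integer-sample delays, and the function names `steering_delays` and `delay_and_sum` are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, source, fs):
    """Integer sample delays that align every microphone with the one farthest
    from `source` (point-source model, positions in metres, fs in Hz)."""
    dists = [math.dist(m, source) for m in mic_positions]
    far = max(dists)
    # A mic closer to the source hears the sound earlier, so it is delayed more.
    return [round((far - d) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(signals, delays):
    """Shift each channel later by its delay (zero-filling the start) and average."""
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for i in range(d, n):
            out[i] += sig[i - d]
    return [v / len(signals) for v in out]
```

Sound from the steered point adds coherently across channels, while sound from other positions arrives misaligned and is partly averaged away.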
Full Stack FPGA Implementation of NVDLA, 2021
This project is a full-stack FPGA implementation of NVDLA, NVIDIA's open-source Deep Learning Accelerator. To make the accelerator more practical, he designed a new compiler and runtime framework that lets a network switch between hardware acceleration and CPU fallback, so that layers the accelerator cannot execute still run on the CPU while the rest run on the hardware.
[Read the Post: DLA Deploy] [Read the Post: Compiler Design] [View GitHub]
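The CPU-fallback idea in the NVDLA project can be sketched as a graph-partitioning pass followed by a segment-by-segment runtime loop. This is a minimal illustration under stated assumptions, not the project's compiler: the supported-operator set `DLA_SUPPORTED`, the `Segment` type, and the `executors` callbacks are hypothetical, and the network is simplified to a linear chain of operators in topological order (graphs with branches would need a DAG-aware pass).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical operator set the accelerator can execute; the real set depends
# on the NVDLA hardware configuration and the compiler.
DLA_SUPPORTED = {"conv2d", "relu", "max_pool2d", "batch_norm", "add"}


@dataclass
class Segment:
    device: str                                  # "dla" (accelerator) or "cpu" (fallback)
    ops: List[str] = field(default_factory=list)


def partition(ops: List[str]) -> List[Segment]:
    """Split a topologically ordered op list into maximal same-device runs."""
    segments: List[Segment] = []
    for op in ops:
        device = "dla" if op in DLA_SUPPORTED else "cpu"
        if segments and segments[-1].device == device:
            segments[-1].ops.append(op)              # extend the current run
        else:
            segments.append(Segment(device, [op]))   # start a new run
    return segments


def run(segments: List[Segment], x, executors: Dict[str, Callable]):
    """Execute segments in order, feeding each segment's output to the next."""
    for seg in segments:
        x = executors[seg.device](seg.ops, x)
    return x
```

Grouping consecutive operators into maximal runs keeps the number of CPU/accelerator hand-offs low, since each switch costs a data transfer and a synchronization.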
Open-Source Contributions [GitHub]
Familiar with Microsoft NNFusion, Apache TVM, Tengine, etc.
[Invited] Talks
[Slides] [Record] [Tutorials]
[Slides] [Record]
[Slides]