I’m currently a third-year Ph.D. student in the School of Integrated Circuits, Southeast University, advised by Prof. Ming Ling (EECS-Ling Lab).
I received my B.S. and M.S. degrees from Central China Normal University (CCNU) and Nanjing University of Posts and Telecommunications (NJUPT), respectively.
My research interests include domain-specific accelerators, efficient machine learning, and parallel computing.
🔥 News
- 2025.11: 🎉🎉 Equicore: Accelerating Clebsch-Gordan Tensor Product of Equivariant Neural Networks on FPGA has been accepted at DATE’26!
- 2025.09: 🎉🎉 Diff-DiT has been selected as a Best Paper Award candidate at ICCAD’25!
- 2025.07: 🎉🎉 Diff-DiT: Temporal Differential Accelerator for Low-bit Diffusion Transformers on FPGA has been accepted at The 2025 International Conference on Computer-Aided Design (ICCAD’25).
- 2025.05: 🎉🎉 Diff-Acc: An Efficient FPGA Accelerator for Unconditional Diffusion Models has been accepted by ACM Transactions on Embedded Computing Systems.
- 2025.05: 🎉🎉 Two papers have been accepted at GLVLSI’25.
📝 Publications

Diff-DiT: Temporal Differential Accelerator for Low-bit Diffusion Transformers on FPGA
Shidi Tang, Pengwei Zheng, Ruiqi Chen, Yuxuan Lv, Bruno da Silva, Ming Ling
The 2025 International Conference on Computer-Aided Design (ICCAD’25).
- This work presents Diff-DiT, an efficient FPGA accelerator for DiT models based on temporal differential computing. First, we propose an approximated differential attention to address the difficulty of applying differential computing to the attention layer. Second, we propose a cross-cast data access pattern that achieves the highest computational intensity for matrix multiplications. Third, we optimize the dataflow by exploring parallelism (with our HCS method) and pipelining. Compared to the NVIDIA V100 GPU, Diff-DiT achieves a 1.39× speedup and a 5.60× improvement in energy efficiency.

Ming Ling, Shidi Tang, Ruiqi Chen, Xin Li, Yanxiang Zhu
Frontiers of Information Technology & Electronic Engineering.
- This work presents Vina-FPGA2, an FPGA-based molecular docking acceleration tool. Building on our previous work, Vina-FPGA2 implements an inter-module pipelined design that further accelerates the Vina computation. Additionally, we developed a reinforcement-learning-based rapid solver that lets users quickly obtain deployment parameters tailored to their target FPGA. Vina-FPGA2-Enhanced achieves an average 12.6× performance improvement over the CPU and 3.3× over Vina-FPGA. Compared to Vina-GPU, Vina-FPGA2 achieves 7.2× higher energy efficiency.

Diff-Acc: An Efficient FPGA Accelerator for Unconditional Diffusion Models
Shidi Tang, Ruiqi Chen, Rui Liu, Yuxuan Lv, Pengwei Zheng, He Li, Ming Ling
ACM Transactions on Embedded Computing Systems (Early Access).
- This work presents Diff-Acc, an efficient FPGA accelerator for unconditional UNet-based diffusion models, featuring a novel step-wise quantization method and group-wise parallelism. Implemented on the Zynq UltraScale+ XCZU9EG FPGA, Diff-Acc achieves up to 12.5× higher energy efficiency than both server-based (Tesla V100 and Intel Xeon) and edge-based (Raspberry Pi 4 and Jetson Nano) platforms. Against the edge-based platforms in particular, Diff-Acc achieves up to 10.26× and 1.97× speedups over the CPU and GPU, respectively.

EEVS: Redeploying Discarded Smartphones for Economic and Ecological Drug Molecules Virtual Screening
Ming Ling, Chuanzhao Zhang, Shidi Tang, Ruiqi Chen, Yanxiang Zhu
IEEE Transactions on Sustainable Computing.
- This work presents EEVS, which redeploys discarded smartphones for economical and ecological virtual screening of drug molecules.

Shidi Tang, Ji Ding, Xiangyu Zhu, Zheng Wang, Haitao Zhao, Jiansheng Wu
IEEE/ACM Transactions on Computational Biology and Bioinformatics.
- This work presents Vina-GPU 2.1, which improves the docking speed and precision of AutoDock Vina and its derivatives by integrating novel algorithms, leading to better docking and virtual screening outcomes.

Shidi Tang, Xingxing Zhou, Ming Ling.
PACRIM’2024.
- This work proposes the first SystemC model of equivariant neural networks, specifically targeting DiffDock, with the aim of accelerating it on customized hardware such as FPGAs.

Predicting equilibrium distributions for molecular systems with deep learning
Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu, Tie-Yan Liu.
Nature Machine Intelligence (2024): 1-10.
- This work proposes Distributional Graphormer (DiG) to predict the equilibrium distributions of molecular systems.

Ming Ling, Zhihao Feng, Ruiqi Chen, Yi Shao, Shidi Tang, Yanxiang Zhu
IEEE Transactions on Biomedical Circuits and Systems (TBCAS) (2024).
- This work presents Vina-FPGA-cluster, a multi-FPGA-based molecular docking tool enabling high-accuracy and multi-level parallel Vina acceleration.

Vina-GPU 2.0: further accelerating AutoDock Vina and its derivatives with graphics processing units
Ji Ding, Shidi Tang, Zheming Mei, Lingyue Wang, Qinqin Huang, Haifeng Hu, Ming Ling, Jiansheng Wu
Journal of Chemical Information and Modeling (JCIM) 63 (7), 1982-1998
- This work presents Vina-GPU 2.0, which further accelerates AutoDock Vina and its derivatives with graphics processing units.

Xingxing Zhou, Ming Ling, Qingde Lin, Shidi Tang, Jiansheng Wu, Haifeng Hu
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2023
- This work presents a probabilistic model of the effectiveness of multiple-initial-state parallel simulated annealing (MISPSA), which is adopted in Vina-GPU.

Accelerating AutoDock Vina with GPUs
Shidi Tang, Ruiqi Chen, Mengru Lin, Qingde Lin, Yanxiang Zhu, Ji Ding, Haifeng Hu, Ming Ling, Jiansheng Wu
Molecules, 2022, 27(9): 3041
- This work presents a parallel algorithm called Vina-GPU and its OpenCL implementation on GPUs.

A fast approximate check polytope projection algorithm for ADMM decoding of LDPC codes
Qiaoqiao Xia, Yan Lin, Shidi Tang, Qinglin Zhang
IEEE Communications Letters, 2019, 23(9): 1520-1523.
- This work presents a fast projection algorithm for ADMM decoding of LDPC codes.
💼 Service
- Reviewer of IEEE Transactions on Sustainable Computing
- Reviewer of The Journal of Supercomputing
🎖 Honors and Awards
- 2025 Best Paper Award Candidate, International Conference on Computer-Aided Design (ICCAD)
- 2025 Gold Award, The 2nd Global Digital Intelligence Education Innovation Competition
- 2025 Zhangjiang High-Tech Scholarship
- 2023 Stars of Tomorrow Internship Program Award, Microsoft Research Asia (MSRA)
📖 Education
- B.S. in Communication Engineering, 2016-2019
  Central China Normal University (CCNU), Wuhan, China
- M.S. in Biomedical Engineering, 2020-2023
  Nanjing University of Posts and Telecommunications (NJUPT), Nanjing, China
- Ph.D. student, 2023-present
  Southeast University (SEU), Nanjing, China
💻 Internships
- Research Intern, 2022-2023
  Microsoft Research Asia (MSRA), AI for Science group, Beijing, China