Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization
Zhong, Shuzhang, Li, Meng, Liang, Yun, Wang, Runsheng, Huang, Ru
Published in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) (28.10.2023)
Conference Proceeding
PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization
Xu, Tianshi, Zhong, Shuzhang, Zeng, Wenxuan, Wang, Runsheng, Li, Meng
Published in arXiv.org (12.10.2024)
Paper
Journal Article
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
Zhong, Shuzhang, Liang, Ling, Wang, Yuan, Wang, Runsheng, Huang, Ru, Li, Meng
Published in arXiv.org (19.08.2024)
Paper
Journal Article
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Zhong, Shuzhang, Yang, Zebin, Li, Meng, Gong, Ruihao, Wang, Runsheng, Huang, Ru
Date of publication: 20.02.2024
Journal Article
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Zhong, Shuzhang, Yang, Zebin, Li, Meng, Gong, Ruihao, Wang, Runsheng, Huang, Ru
Published in arXiv.org (21.02.2024)
Paper