A Scalable, Non-Parametric Method for Detecting Performance Anomaly in Large Scale Computing
Yu, Li, Lan, Zhiling
Published in IEEE transactions on parallel and distributed systems (01.07.2016)
Journal Article
Adaptive Fault Management of Parallel Applications for High-Performance Computing
Zhiling Lan, Yawei Li
Published in IEEE transactions on computers (01.12.2008)
Journal Article
Toward Automated Anomaly Identification in Large-Scale Systems
Lan, Zhiling, Zheng, Ziming, Li, Yawei
Published in IEEE transactions on parallel and distributed systems (01.02.2010)
Journal Article
Trade-Off Between Prediction Accuracy and Underestimation Rate in Job Runtime Estimates
Yuping Fan, Rich, Paul, Allcock, William E., Papka, Michael E., Zhiling Lan
Published in 2017 IEEE International Conference on Cluster Computing (CLUSTER) (01.09.2017)
Conference Proceeding
Exploring void search for fault detection on extreme scale systems
Berrocal, Eduardo, Li Yu, Wallace, Sean, Papka, Michael E., Zhiling Lan
Published in 2014 IEEE International Conference on Cluster Computing (CLUSTER) (01.09.2014)
Conference Proceeding
Job scheduling with adjusted runtime estimates on production supercomputers
Wei Tang, Desai, Narayan, Buettner, Daniel, Zhiling Lan
Published in Journal of parallel and distributed computing (01.07.2013)
Conference Proceeding
Journal Article
Improving Batch Scheduling on Blue Gene/Q by Relaxing Network Allocation Constraints
Zhou Zhou, Xu Yang, Zhiling Lan, Rich, Paul, Wei Tang, Morozov, Vitali, Desai, Narayan
Published in IEEE transactions on parallel and distributed systems (01.11.2016)
Journal Article
I/O-Aware Batch Scheduling for Petascale Computing Systems
Zhou Zhou, Xu Yang, Dongfang Zhao, Rich, Paul, Wei Tang, Jia Wang, Zhiling Lan
Published in 2015 IEEE International Conference on Cluster Computing (01.09.2015)
Conference Proceeding
Preliminary Interference Study About Job Placement and Routing Algorithms in the Fat-Tree Topology for HPC Applications
Peixin Qiao, Xin Wang, Xu Yang, Yuping Fan, Zhiling Lan
Published in 2017 IEEE International Conference on Cluster Computing (CLUSTER) (01.09.2017)
Conference Proceeding
A study of dynamic meta-learning for failure prediction in large-scale systems
Lan, Zhiling, Gu, Jiexing, Zheng, Ziming, Thakur, Rajeev, Coghlan, Susan
Published in Journal of parallel and distributed computing (01.06.2010)
Journal Article
Utility-Based Scheduling for Bulk Data Transfers between Distributed Computing Facilities
Xin Wang, Wei Tang, Kettimuthu, Rajkumar, Zhiling Lan
Published in 2015 44th International Conference on Parallel Processing Workshops (01.09.2015)
Conference Proceeding
Fault-aware, utility-based job scheduling on Blue Gene/P systems
Wei Tang, Zhiling Lan, Desai, N., Buettner, D.
Published in 2009 IEEE International Conference on Cluster Computing and Workshops (01.08.2009)
Conference Proceeding
Study of Intra- and Interjob Interference on Torus Networks
Xu Yang, Jenkins, John, Mubarak, Misbah, Xin Wang, Ross, Robert B., Zhiling Lan
Published in 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) (01.12.2016)
Conference Proceeding
A Preliminary Study of Intra-Application Interference on Dragonfly Network
Xin Wang, Xu Yang, Mubarak, Misbah, Ross, Robert B., Zhiling Lan
Published in 2017 IEEE International Conference on Cluster Computing (CLUSTER) (01.09.2017)
Conference Proceeding
System log pre-processing to improve failure prediction
Ziming Zheng, Zhiling Lan, Park, B.H., Geist, A.
Published in 2009 IEEE/IFIP International Conference on Dependable Systems & Networks (01.06.2009)
Conference Proceeding
Balancing job performance with system performance via locality-aware scheduling on torus-connected systems
Xu Yang, Zhou Zhou, Wei Tang, Xingwu Zheng, Jia Wang, Zhiling Lan
Published in 2014 IEEE International Conference on Cluster Computing (CLUSTER) (01.09.2014)
Conference Proceeding