Resiliency of automotive object detection networks on GPU architectures
Lotfi, Atieh, Hukerikar, Saurabh, Balasubramanian, Keshav, Racunas, Paul, Saxena, Nirmal, Bramley, Richard, Huang, Yanxiang
Published in 2019 IEEE International Test Conference (ITC) (01.11.2019)
Conference Proceeding
Runtime Fault Diagnostics for GPU Tensor Cores
Hukerikar, Saurabh, Saxena, Nirmal
Published in 2022 IEEE International Test Conference (ITC) (01.09.2022)
Conference Proceeding
Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale
Engelmann, Christian, Hukerikar, Saurabh
Published in Supercomputing frontiers and innovations (01.09.2017)
Journal Article
PLEXUS: A Pattern-Oriented Runtime System Architecture for Resilient Extreme-Scale High-Performance Computing Systems
Hukerikar, Saurabh, Engelmann, Christian
Published in 2020 IEEE 25th Pacific Rim International Symposium on Dependable Computing (PRDC) (01.12.2020)
Conference Proceeding
Characterizing and Mitigating Soft Errors in GPU DRAM
Sullivan, Michael B., Saxena, Nirmal R., O'Connor, Mike, Lee, Donghyuk, Racunas, Paul, Hukerikar, Saurabh, Tsai, Timothy, Hari, Siva Kumar Sastry, Keckler, Stephen W.
Published in IEEE Micro (01.07.2022)
Journal Article
Language Support for Reliable Memory Regions
Hukerikar, Saurabh, Engelmann, Christian
Published in Languages and Compilers for Parallel Computing (2017)
Book Chapter
Optimizing Large-Scale Fault Injection Experiments through Martingale Hypothesis: A Systematic Approach for Reliability Assessment of Safety-Critical Systems
Hukerikar, Saurabh, Lotfi, Atieh, Huang, Yanxiang, Campbell, Jason, Saxena, Nirmal
Published in 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S) (24.06.2024)
Conference Proceeding
Shrink or Substitute: Handling Process Failures in HPC Systems Using In-Situ Recovery
Ashraf, Rizwan A., Hukerikar, Saurabh, Engelmann, Christian
Published in 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) (01.03.2018)
Conference Proceeding
Havens: Explicit reliable memory regions for HPC applications
Hukerikar, Saurabh, Engelmann, Christian
Published in 2016 IEEE High Performance Extreme Computing Conference (HPEC) (01.09.2016)
Conference Proceeding