
Understanding time-varying vulnerability across GPU Program Lifetime

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Time-varying behaviors of GPU program vulnerability could be exploited to reduce the overhead of fault-tolerant designs. However, the inherent parallelism and the performance overhead of massive fault injection (FI) have hindered such FI-based assessments. NVBitFI, a GPU FI tool featuring high performance and good compatibility, makes FI-based evaluation of time-varying vulnerability feasible within a reasonable time. We extended NVBitFI to control FI tests along the temporal dimension. We present a scalable workflow that characterizes the time-varying vulnerability of GPU programs at two granularities, and propose a convenient approach to profiling vulnerability against actual GPU time. Results obtained from 60K fault injections demonstrate the feasibility of the proposed methodologies. A case study exploring the improved instruction-level grouping is also presented. More than 340K faults were injected into the vectorAdd kernel to show that it is possible to generalize the time-varying behavior of smaller inputs to realistic workloads with large inputs.
Original language: English
Title of host publication: 35th IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2022
Volume: 2022-
DOIs
State: Published - 2022
