Efficient data race detection for distributed memory parallel programs


Bibliographic Details
Published in: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1-12
Main Authors: Park, Chang-Seo; Sen, Koushik; Hargrove, Paul; Iancu, Costin
Format: Conference Proceeding
Language: English
Published: New York, NY, USA: ACM, 12.11.2011; IEEE
Series: ACM Conferences
Summary: In this paper we present a precise data race detection technique for distributed memory parallel programs. Our technique, which we call Active Testing, builds on our previous work on race detection for shared memory Java and C programs, and it handles programs written using shared memory approaches as well as bulk communication. Active testing works in two phases. In the first phase, it performs an imprecise dynamic analysis of an execution of the program and finds potential data races that could happen if the program were executed with a different thread schedule. In the second phase, active testing re-executes the program while actively controlling the thread schedule so that the data races reported in the first phase can be confirmed. A key highlight of our technique is that it scales to distributed programs with bulk communication and single- and split-phase barriers. Another key feature is that it is precise: a data race confirmed by active testing is an actual data race present in the program. However, being a testing approach, our technique can miss actual data races. We implement the framework for the UPC programming language and demonstrate scalability up to a thousand cores for programs with both fine-grained and bulk (MPI-style) communication. The tool confirms previously known bugs and uncovers several unknown ones. Our extensions capture constructs proposed in several modern programming languages for High Performance Computing, most notably non-blocking barriers and collectives.
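The two-phase idea in the summary can be illustrated with a toy model. The sketch below is not the authors' UPC implementation; it is a minimal illustration under stated assumptions: Python generators stand in for threads, each `yield` announces the next memory access, and every name (`record_trace`, `find_potential_races`, `confirm_race`, `BARRIER`, the statement labels) is hypothetical. Phase 1 here is made imprecise by ignoring barriers altogether, which is only one possible source of imprecision. Phase 2 re-runs the toy program, pauses any thread that reaches a candidate statement, and reports a race only if both statements are poised on the same address at once.

```python
import random
from itertools import combinations

BARRIER = ("barrier", None, False)  # hypothetical marker for a barrier

def make_threads():
    """Toy program: each thread yields (statement, address, is_write)."""
    def t0():
        yield ("s1", "x", True)    # write x
        yield ("s3", "y", True)    # write y before the barrier
        yield BARRIER
    def t1():
        yield ("s2", "x", False)   # read x, unordered with s1
        yield BARRIER
        yield ("s4", "y", False)   # read y after the barrier
    return {0: t0(), 1: t1()}

def record_trace(make_threads):
    """Log every access of one run (assumption: barriers are not logged)."""
    trace = []
    for tid, gen in make_threads().items():
        for stmt, addr, is_write in gen:
            if stmt != "barrier":
                trace.append((tid, stmt, addr, is_write))
    return trace

def find_potential_races(trace):
    """Phase 1 (imprecise): conflicting accesses from different threads."""
    candidates = set()
    for (t1, s1, a1, w1), (t2, s2, a2, w2) in combinations(trace, 2):
        if t1 != t2 and a1 == a2 and (w1 or w2):
            candidates.add(tuple(sorted((s1, s2))))
    return candidates

def confirm_race(make_threads, pair, seed=0):
    """Phase 2 (precise): steer the schedule toward the candidate pair."""
    rng = random.Random(seed)
    gens = make_threads()
    pending = {}  # thread id -> access the thread is about to perform

    def step(tid):
        nxt = next(gens[tid], None)
        if nxt is None:
            pending.pop(tid, None)  # thread finished
        else:
            pending[tid] = nxt

    for tid in list(gens):
        step(tid)
    while pending:
        # Race confirmed if two threads are poised at both statements.
        for (_, a), (_, b) in combinations(pending.items(), 2):
            if (tuple(sorted((a[0], b[0]))) == pair
                    and a[1] == b[1] and (a[2] or b[2])):
                return True
        runnable = [t for t, acc in pending.items() if acc is not BARRIER]
        if not runnable:
            for t in list(pending):  # all waiting threads pass the barrier
                step(t)
            continue
        # Keep threads at a candidate statement paused while others can run.
        free = [t for t in runnable if pending[t][0] not in pair]
        step(rng.choice(free or runnable))
    return False

candidates = find_potential_races(record_trace(make_threads))
confirmed = {p for p in candidates if confirm_race(make_threads, p)}
print("potential:", sorted(candidates))
print("confirmed:", sorted(confirmed))
```

In this toy run, phase 1 flags both the `x` pair and the `y` pair, while phase 2 confirms only the `x` pair because the barrier always orders the two accesses to `y`. That mirrors the summary's claim that confirmed races are real, while a testing approach can still miss races it never manages to schedule.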
ISBN: 145030771X, 9781450307710
ISSN: 2167-4329, 2167-4337
DOI: 10.1145/2063384.2063452