Improving Numerical Reproducibility of Scientific Software in Parallel Systems

Bibliographic Details
Published in 2020 IEEE International Conference on Electro Information Technology (EIT), pp. 066-074
Main Authors Jalal Apostal, Sara Faraji; Apostal, David; Marsh, Ronald
Format Conference Proceeding
Language English
Published IEEE 01.07.2020

Abstract Recently, numerical reproducibility has received increased emphasis from the scientific community. Software results that are not reproducible make it difficult to examine the science the software supports. A common source of numerical reproducibility errors in computational science occurs during floating-point arithmetic. Finite precisions and limited storage for floating-point numbers require computers to truncate and round results of some math operations. As a consequence, an approximate value is stored instead of the exact result. One programming idiom that is not always reproducible is the global sum reduction of a distributed array. Changing the number of compute units changes the order array elements are added together which, in turn, changes the truncation and rounding. This may change the result of individual add operations and the resulting global sum. Therefore, floating-point addition is not always associative. This research has improved the numerical reproducibility of scientific applications on parallel systems. Automating the improvement of reproducibility in scientific software is the innovative contribution of this research. Two reproducible global sum reduction functions have been implemented and packaged in a software library. The automated improving of reproducibility has been done by developing a source code scanner to recognize certain MPI-based global sum reductions that may have reproducibility errors. The scanner replaces those reductions with calls to the library function containing reproducible codes. Reproducibility and performance testing have demonstrated the effectiveness of the system. This will extend the usefulness of legacy software and can lead to faster rates of discovery, and more efficient application of scientists' time.
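
The order dependence described in the abstract can be illustrated with a minimal sketch. This is not the authors' library or scanner: the helper names `naive_sum` and `simulated_global_sum` are hypothetical, the "compute units" are simulated by slicing a list rather than by MPI ranks, and Python's `math.fsum` is used only as a stand-in for an order-independent sum, assuming IEEE 754 double precision.

```python
import math


def naive_sum(values):
    """Add values left to right with ordinary floating-point addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def simulated_global_sum(values, num_units):
    """Simulate a global sum reduction over num_units compute units.

    Each (hypothetical) unit sums its own contiguous slice, then the
    partial sums are added together in unit order.
    """
    n = len(values)
    bounds = [n * r // num_units for r in range(num_units + 1)]
    partials = [naive_sum(values[bounds[r]:bounds[r + 1]])
                for r in range(num_units)]
    return naive_sum(partials)


# Grouping changes the rounding of intermediate results.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)

# The exact sum of this array is 2.0, but the small terms are lost
# differently depending on how the array is split across units.
data = [1e16, 1.0, -1e16, 1.0]
one_unit = simulated_global_sum(data, 1)
two_units = simulated_global_sum(data, 2)

# math.fsum tracks exact partial sums, so element order does not matter.
exact_forward = math.fsum(data)
exact_reversed = math.fsum(reversed(data))

print(left_grouped == right_grouped)
print(one_unit, two_units)
print(exact_forward, exact_reversed)
```

In this sketch the one-unit and two-unit results differ from each other and from the exact sum, while `math.fsum` returns the same value for either element order. A distributed reduction cannot simply call `math.fsum` on per-unit partial sums and stay exact, so a reproducible MPI reduction needs a different mechanism than the one shown here.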
Author Affiliations
Jalal Apostal, Sara Faraji (University of North Dakota, Department of Computer Science)
Apostal, David (University of North Dakota, Department of Computer Science)
Marsh, Ronald (University of North Dakota, Department of Computer Science)
DOI 10.1109/EIT48999.2020.9208338
EISBN 1728153174
9781728153179
EISSN 2154-0373
OpenAccessLink https://commons.und.edu/cgi/viewcontent.cgi?article=4097&context=theses
PageCount 9
Subject Terms Codes; Parallel programming; Reproducibility of results; Runtime; Software; Software algorithms; Software libraries
URI https://ieeexplore.ieee.org/document/9208338