Landing Stencil Code on Godson-T

Bibliographic Details
Published in Journal of Computer Science and Technology, Vol. 25, No. 4, pp. 886-894
Authors Cui Huimin (崔慧敏), Wang Lei (王蕾), Fan Dongrui (范东睿), Feng Xiaobing (冯晓兵)
Format Journal Article
Language English
Published Boston and Beijing: Springer US, 01.07.2010
Springer Nature B.V.
ISSN 1000-9000 (print); 1860-4749 (electronic)
DOI 10.1007/s11390-010-9373-6

Abstract The advent of multi-core/many-core chip technology offers both an extraordinary opportunity and a profound challenge. In particular, computer architects and system software designers have a unique opportunity to introduce new architecture features together with adequate compiler technology; in combination, the two may have a profound impact. This paper presents a case study, using the 1-D Jacobi computation, of compiler-amenable performance optimization techniques on the many-core architecture Godson-T. Godson-T has several unique features; those chosen for this study are: 1) chip-level globally addressable memory, in particular the scratchpad memories (SPM) local to the processing cores; and 2) fine-grain memory-based synchronization (e.g., full-empty bits). Leveraging state-of-the-art performance optimization methods for 1-D stencil parallelization (e.g., time tiling and its variants), we developed and implemented a number of many-core-based optimizations for Godson-T. Our experimental study shows good performance in both execution-time speedup and scalability, validates the value of the globally accessible SPM and of the fine-grain synchronization mechanism (full-empty bits) on Godson-T, and provides some useful guidelines for future compiler technology for many-core chip architectures.
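The abstract names the 1-D Jacobi computation and time tiling without showing either. The following is a minimal sequential sketch in C, not the authors' code: it assumes a 3-point averaging stencil, fixed boundary cells, and two ping-pong buffers, and the function names `jacobi_naive`, `jacobi_time_skewed`, and the `width` (tile width) parameter are hypothetical. It contrasts a naive sweep-per-time-step loop with a time-skewed tiling of the same computation, in which each tile runs all time steps before the next tile starts.

```c
#include <assert.h>

/* Hypothetical 3-point 1-D Jacobi update (assumed, not from the paper):
 * dst[i] = (src[i-1] + src[i] + src[i+1]) / 3 for interior points 1..n-2.
 * Cells 0 and n-1 are never written, so both buffers must start with the
 * same boundary values. */
static double jacobi_point(const double *src, int i) {
    return (src[i - 1] + src[i] + src[i + 1]) / 3.0;
}

/* Naive version: one full sweep per time step, ping-ponging between a and b.
 * After `steps` sweeps the newest values are in a if steps is even,
 * otherwise in b. */
void jacobi_naive(double *a, double *b, int n, int steps) {
    for (int t = 0; t < steps; t++) {
        const double *src = (t % 2 == 0) ? a : b;
        double *dst = (t % 2 == 0) ? b : a;
        for (int i = 1; i < n - 1; i++)
            dst[i] = jacobi_point(src, i);
    }
}

/* Time-skewed tiling: step t at point i gets the skewed index j = i + t.
 * - Flow dependences: step t at i reads values written by step t-1 at
 *   i-1, i, i+1, whose skewed indices are <= i + t.
 * - Anti-dependences (two buffers): the old value in dst[i] was last read
 *   by step t-1 at i-1, i, i+1, again at skewed indices <= i + t.
 * Equal skewed indices always come from an earlier step, so running tiles
 * of `width` consecutive j values left to right, each tile looping over all
 * time steps in order, computes exactly what jacobi_naive computes. */
void jacobi_time_skewed(double *a, double *b, int n, int steps, int width) {
    assert(width > 0);
    int jmax = (n - 2) + (steps - 1); /* largest skewed index */
    for (int j0 = 1; j0 <= jmax; j0 += width) {
        int j1 = j0 + width - 1;
        for (int t = 0; t < steps; t++) {
            const double *src = (t % 2 == 0) ? a : b;
            double *dst = (t % 2 == 0) ? b : a;
            int lo = j0 - t, hi = j1 - t; /* un-skew back to i */
            if (lo < 1) lo = 1;
            if (hi > n - 2) hi = n - 2;
            for (int i = lo; i <= hi; i++) /* empty if lo > hi */
                dst[i] = jacobi_point(src, i);
        }
    }
}
```

Run on identical copies of the same input with the same `n` and `steps`, both functions should leave identical buffers, because every point is computed from the same operands in the same order. The tiled loop only reorders work so that a tile's points (plus a small halo) are reused across several time steps while they could still sit in fast local storage, which is the kind of locality an SPM is meant to exploit. This sketch is sequential and does not model Godson-T's SPM, multiple cores, or full-empty-bit synchronization.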
AuthorAffiliation Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100039, China
Copyright Springer 2010
Discipline Computer Science; Architecture
IsPeerReviewed true
IsScholarly true
Keywords many-core; SPM; fine-grain synchronization; compiler; stencil; Jacobi
Notes 11-2296/TP
TP332
TG76
PublicationTitleAbbrev J. Comput. Sci. Technol.
References Frigo M, Strumpen V. The memory behavior of cache oblivious stencil computations. Journal of Supercomputing, 2006, 29(2): 93-112.
McCalpin J, Wonnacott D. Time skewing: A value-based approach to optimizing for memory locality. Technical Report DCS-TR-379, DCS, Rutgers University, 1999.
Venetis I E, Gao G R. Mapping the LU decomposition on a many core architecture: Challenges and solutions. In Proc. ACM International Conference on Computing Frontiers (CF2009), Ischia, Italy, May 18-20, 2009, pp.71-80.
Dally W J. Computer architecture in the many-core era. In Keynote at the 24th Int. Conf. Comput. Design, San Jose, CA, USA, Oct. 1, 2006.
Kamil S, Datta K, Williams S, Oliker L, Shalf J, Yelick K. Implicit and explicit optimizations for stencil computations. In Proc. MSPC2006, San Jose, USA, Oct. 22, 2006, pp.51-60.
Huang H, Yuan N et al. Architecture supported synchronization-based cache coherence protocol for many-core processors. In the 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMPMSI) of ISCA’08, Beijing, China, June 22, 2008.
Alverson R, Callahan D, Cummings D, Koblenz B, Porterfield A, Smith B. The Tera computer system. SIGARCH Comput. Archit. News, 1990, 18(3b): 1-6. DOI: 10.1145/255129.255132.
Datta K, Murphy M, Volkov V, Williams S, Carter J, Oliker L, Patterson D, Shalf J, Yelick K. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proc. SC2008, Austin, USA, Nov. 15-21, 2008, Article No. 1.
Baskaran M, Bondhugula U, Krishnamoorthy S, Ramanujam J, Rountev A, Sadayappan P. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008), Salt Lake City, USA, Feb. 20-23, 2008, pp.1-10.
Kranz D, Lim B H, Agarwal A. Low-cost support for fine-grain synchronization in multiprocessors. Technical Report MIT/LCS/TM-470, Massachusetts Institute of Technology, Cambridge, 1992.
Alverson R, Callahan D. The Tera computer system. SIGARCH Comput. Archit. News, 1990, 18(3b): 1-6. DOI: 10.1145/255129.255132.
Wonnacott D. Using time skewing to eliminate idle time due to memory bandwidth and network limitations. In Proc. International Conference on Parallel and Distributed Computing Systems, Cancun, Mexico, May 1-5, 2000, p.171.
Keckler S W, Dally W J, Maskit D, Carter N P, Chang A, Lee W S. Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor. In Proc. the 25th Int. Symp. Computer Architecture, Barcelona, Spain, Jun. 27-Jul. 2, 1998, pp.302-317.
Zhu W, Sreedhar V C, Hu Z, Gao G R. Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures. In Proc. ISCA 2007, San Diego, USA, June 9-13, 2007, pp.35-45.
Tseng C W. Compiler optimizations for eliminating barrier synchronization. In Proc. PPOPP 1995, Santa Barbara, California, USA, July 19-21, 1995, pp.144-155.
Smith B. The Architecture of HEP. Parallel MIMD Computation: HEP Supercomputer and Its Applications. Kowalik J S (ed.), Scientific Computation Series, Cambridge: MIT Press, MA, 1985, pp.41-55.
Dally W J. The message-driven processor. IEEE Micro, 1992, 12(2): 23-39. DOI: 10.1109/40.127581.
Montrym J, Moreton H. The GeForce 6800. IEEE Micro, 2005, 25(2): 41-51. DOI: 10.1109/MM.2005.37.
Krishnamoorthy S, Baskaran M, Bondhugula U, Ramanujam J, Rountev A, Sadayappan P. Effective automatic parallelization of stencil computations. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, USA, June 10-13, 2007, pp.235-244.
Cray MTA-2 System, http://www.cray.com/About/History.aspx.
Dally W J, Labonte F, Das A, Hanrahan P, Ahn J H, Gummaraju J, Erez M, Jayasena N, Buck I, Knight T J, Kapasi U J. Merrimac: Supercomputing with Streams. In Proc. the Supercomputer Conference, Phoenix, USA, Nov. 15-21, 2003.
Tan G, Fan D, Zhang J, Russo A, Gao G R. Experience on optimizing irregular computation for memory hierarchy in manycore architecture. In Proc. PPoPP 2008, Salt Lake City, USA, Feb. 14-18, pp.279-280.
Hu Z, del Cuvillo J, Zhu W, Gao G R. Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences. In Proc. Euro-Par 2006, Dresden, Germany, Aug. 29-Sept. 1, 2006, pp.134-144.
Seiler L, Carmean D, Sprangle E, Forsyth T, Abrash M, Dubey P, Junkins S, Lake A, Sugerman J, Cavin R, Espasa R, Grochowski E, Juan T, Hanrahan P. Larrabee: A many-core x86 architecture for visual computing. ACM Transactions on Graphics, 27(3): Article No. 18.
Song Y, Li Z. New tiling techniques to improve cache temporal locality. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, USA, May 1-4, 1999, pp.215-228.
Datta K, Kamil S, Williams S, Oliker L, Shalf J, Yelick K. Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Review, 2008, 51(1): 129-159. DOI: 10.1137/070693199.
Borkar S Y, Mulder H, Dubey P, Pawlowski S S, Kahn K C, Rattner J R, Kuck D J. Platform 2015: Intel processor and platform evolution for the next decade. Technical Report, Intel White Paper, Mar. 2005.
Renganarayanan L, Harthikote-Matha M, Dewri R, Rajopadhye S V. Towards optimal multi-level tiling for stencil computations. In Proc. IPDPS, Long Beach, USA, Mar. 26-30, 2007, p.101.
Hofstee P. Power efficient architecture and the cell processor. In HPCA-11, Invited Paper and Keynote Speech, San Francisco, USA, Feb. 12-16, 2005.
Haataja J, Savolainen V. Cray T3E User’s Guide. Center for Scientific Computing, Finland, 1997.
Asanovic K, Bodik R, Catanzaro B C, Gebis J J, Husbands P, Keutzer K, Patterson D A, Plishker W L, Shalf J, Williams S W, Yelick K A. The landscape of parallel computing research: A view from Berkeley. UCB/EECS-2006-183, University of California, Berkeley, 2006.
Ye X, Nguyen V H, Lavenier D, Fan D. Efficient parallelization of a protein sequence comparison algorithm on manycore architecture. In Proc. the Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, Dunedin, New Zealand, Dec. 1-4, 2008, pp.167-170.
Long G, Fan D et al. A performance model of dense matrix operations on many-core architectures. In Proc. Euro-Par 2008, Las Palmas de Gran Canaria, Spain, Aug. 26-29, 2008, pp.120-129.
Xue L, Chen L, Hu Z, Gao G R. Performance Tuning of the Fast Fourier Transform on a Multicore Architecture. CAPSL Technical Memo 81, Feb. 8, 2008.
Vangal S, Howard J, Ruhl G, Dighe S, Wilson H, Tschanz J, Finan D, Iyer P, Singh A, Jacob T, Jain S, Venkataraman S, Hoskote Y, Borkar N. An 80-tile 1.28TFLOPS network-on-chip in 65 nm CMOS. In Proc. IEEE International Solid-State Circuits Conference, San Francisco, USA, Feb. 11-15, 2007.
Wolf M E, Lam M S. A data locality optimizing algorithm. In Proc. ACM SIGPLAN Conf. Progr. Lang. Design and Implementation, Toronto, Canada, Jun. 24-28, 1991, pp.30-44.
9373_CR31
9373_CR10
9373_CR32
9373_CR6
9373_CR11
9373_CR33
9373_CR12
9373_CR34
9373_CR4
9373_CR13
9373_CR35
9373_CR5
9373_CR14
9373_CR36
9373_CR2
9373_CR3
9373_CR16
9373_CR17
9373_CR1
9373_CR18
K Datta (9373_CR15) 2008; 51
M Frigo (9373_CR7) 2006; 29
R Alverson (9373_CR25) 1990; 18
9373_CR21
9373_CR22
9373_CR23
WJ Dally (9373_CR26) 1992; 12
9373_CR24
9373_CR27
9373_CR28
9373_CR29
9373_CR19
J Montrym (9373_CR30) 2005; 25
9373_CR8
9373_CR9
R Alverson (9373_CR20) 1990; 18
References_xml – reference: Dally W J. Computer architecture in the many-core era. In Keynote at the 24th Int. Conf. Comput. Design, San Jose, CA, USA, Oct. 1, 2006.
– reference: Wonnacott D. Using time skewing to eliminate idle time due to memory bandwidth and network limitations. In Proc. International Conference on Parallel and Distributed Computing Systems, Cancun, Mexico, May 1-5, 2000, p.171.
– reference: DallyWJThe message-driven processorIEEE Micro.1992122233910.1109/40.127581
– reference: Xue L, Chen L, Hu Z, Gao G R. Performance Tuning of the Fast Fourier Transform on a Multicore Architecture. CAPSL Technical Memo 81, Feb. 8, 2008.
– reference: Hofstee P. Power efficient architecture and the cell processor. In HPCA-11,Invited Paper and Keynote Speech, San Francisco, USA, Feb. 12-16, 2005.
– reference: Zhu W, Sreedhar V C, Hu Z, Gao G R. Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures. In Proc. ISCA 2007, San Diego, USA, June 9-13, 2007, pp.35-45.
– reference: Krishnamoorthy S, Baskaran M, Bondhugula U, Ramanujam J, Rountev A, Sadayappan P. Effective automatic parallelization of stencil computations. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, USA, June 10-13, 2007, pp.235-244.
– reference: Baskaran M, Bondhugula U, Krishnamoorthy S, Ramanujam J, Rountev A, Sadayappan P. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008), Salt Lake City, USA, Feb. 20-23, 2008, pp.1-10.
– reference: DattaKKamilSWilliamsSOlikerLShalfJYelickKOptimization and performance modeling of stencil computations on modern microprocessorsSIAM Review200851112915910.1137/070693199
– reference: FrigoMStrumpenVThe memory behavior of cache oblivious stencil computationsJournal of Supercomputing200629293112
– reference: Asanovic K, Bodik R, Catanzaro B C, Gebis J J, Husbands P, Keutzer K, Patterson D A, Plishker W L, Shalf J, Williams S W, Yelick K A. The landscape of parallel computing research: A view from Berkeley. UCB/EECS-2006-183, University of California, Berkeley, 2006.
– reference: Dally W J, Labonte F, Das A, Hanrahan P, Ahn J H, Gummaraju J, Erez M, Jayasena N, Buck I, Knight T J, Kapasi U J. Merrimac: Supercomputing with Streams. In Proc. the Supercomputer Conference, Phoenix, USA, Nov. 15-21, 2003.
– reference: Haataja J, Savolainen V. Cray T3E User’s Guide. Center for Scientific Computing, Finland, 1997.
– reference: Ye X, Nguyen V H, Lavenier D, Fan D. Efficient parallelization of a protein sequence comparison algorithm on manycore architecture. In Proc. the Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, Dunedin, New Zealand, Dec. 1-4, 2008, pp.167-170.
– reference: Michael E Wolf, Monica S Lam. A data locality optimizing algorithm. In Proc. ACM SIGPLAN Conf. Progr. Lang. Design and Implementation, Toronto, Canada, Jun. 24-28, 1991, pp.30-44.
– reference: Vangal S, Howard J, Ruhl G, Dighe S, Wilson H, Tschanz J, Finan D, Iyer P, Singh A, Jacob T, Jain S, Venkataraman S, Hoskote Y, Borkar N. An 80-tile 1.28TFLOPS network-onchip in 65 nm CMOS. In Proc. IEEE International Solid-State Circuits Conference, San Francisco, USA, Feb. 11-15, 2007.
– reference: Seiler L, Carmean D, Sprangle E, Forsyth T, AbrashM, Dubey P, Junkins S, Lake A, Sugerman J, Cavin R, Espasa R, Grochowski E, Juan T, Hanrahan P. Larrabee: A many-core x86 architecture for visual computing. ACM Transactions on Graphics, 27(3): Article No. 18.
– reference: Datta K, Murphy M, Volkov V, Williams S, Carter J, Oliker L, Patterson D, Shalf J, Yelick K. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proc. SC2008, Austin, USA, Nov. 15-21, 2008, Article No. 1.
– reference: Cray MTA-2 System, http://www.cray.com/About/History.aspx.
– reference: Tseng C W. Compiler optimizations for eliminating barrier synchronization. In Proc. PPOPP 1995, Santa Barbara, California, USA, July 19-21, 1995, pp.144-155.
– reference: MontrymJMoretonHThe GeForce 6800IEEE Micro2005252415110.1109/MM.2005.37
– reference: Long G, Fan D et al. A performance model of dense matrix operations on many-core architectures. In Proc. Euro-Par 2008, Las Palmas de Gran Canaria, Spain, Aug. 26-29, 2008, pp.120-129.
– reference: Smith B. The Architecture of HEP. Parallel MIMD Computation: HEP Supercomputer and Its Applications. Kowalik J S (ed.), Scientific Computation Series, Cambridge: MIT Press, MA, 1985, p.41-55.
– reference: Borkar S Y, Mulder H, Dubey P, Pawlowski S S, Kahn K C, Rattner J R, Kuck D J. Platform 2015: Intel processor and platform evolution for the next decade. Technical Report, Intel White Paper, Mar. 2005.
– reference: McCalpin J, Wonnacott D. Time skewing: A value-based approach to optimizing for memory locality. Technical Report DCS-TR-379, DCS, Rugers University, 1999.
– reference: Hu Z, del Cuvillo J, Zhu W, Gao G R. Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences. In Proc. Euro-Par 2006, Dresden, Germany, Aug. 29-Sept. 1, 2006, pp.134-144.
– reference: AlversonRCallahanDCummingsDKoblenzBPorterfieldASmithBThe Tera computer systemSIGARCH Comput. Archit. News1990183b1610.1145/255129.255132
– reference: Keckler S W, Dally W J, Maskit D, Carter N P, Chang A, Lee W S. Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor. In Proc. the 25th Int. Symp. Computer Architecture, Barcelona, Spain, Jun. 27-Jul. 2, 1998, pp.302-317.
– reference: Tan G, Fan D, Zhang J, Russo A, Gao G R. Experience on optimizing irregular computation for memory hierarchy in manycore architecture. In Proc. PPoPP 2008, Salt Lake City, USA, Feb. 14-18, pp.279-280.
– reference: Kranz D, Lim B H, Agarwal A. Low-cost support for finegrain synchronization in multiprocessors. Technical Report MIT/LCS/TM-470, Massachusetts Institute of Technology, Cambridge, 1992.
– reference: Venetis I E, Gao G R. Mapping the LU decomposition on a many core architecture: Challenges and solutions. In Proc. ACM International Conference on Computing Frontiers (CF2009), Ischia, Italy, May 18-20, 2009, pp.71-80.
– reference: Huang H, Yuan N et al. Architecture supported synchronization-based cache coherence protocol for many-core processors. In the 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMPMSI) of ISCA’08, Beijing, China, June 22, 2008.
– reference: AlversonRCallahanDThe Tera compute systemSIGARCH Comput. Archit. News1990183b1610.1145/255129.255132
StartPage 886
SubjectTerms Architects
Architecture
Artificial Intelligence
Chips
Compilers
Computation
Computer architecture
Computer programs
Computer Science
Computers
Data Structures and Information Theory
Design
Designers
Information Systems Applications (incl.Internet)
Optimization
Optimization techniques
R&D
Research & development
Short Paper
Software
Software Engineering
Synchronism
Synchronization
Theory of Computation
Tiling
Optimization technology
Synchronization mechanism
Chip technology
Computer systems
Software architecture
Title Landing Stencil Code on Godson-T
URI https://link.springer.com/article/10.1007/s11390-010-9373-6
https://www.proquest.com/docview/872094831
https://www.proquest.com/docview/907973778
Volume 25
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Landing+Stencil+Code+on+Godson-T&rft.jtitle=Journal+of+computer+science+and+technology&rft.au=Cui%2C+Hui-Min&rft.au=Wang%2C+Lei&rft.au=Fan%2C+Dong-Rui&rft.au=Feng%2C+Xiao-Bing&rft.date=2010-07-01&rft.issn=1000-9000&rft.eissn=1860-4749&rft.volume=25&rft.issue=4&rft.spage=886&rft.epage=894&rft_id=info:doi/10.1007%2Fs11390-010-9373-6&rft.externalDBID=n%2Fa&rft.externalDocID=10_1007_s11390_010_9373_6