Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures

Bibliographic Details
Published in	International Journal of Parallel Programming, Vol. 46, no. 6, pp. 1304-1328
Main Authors	Papagiannopoulou, Dimitra; Marongiu, Andrea; Moreshet, Tali; Benini, Luca; Herlihy, Maurice; Bahar, R. Iris
Format	Journal Article
Language	English
Published	New York: Springer US, 01.12.2018; Springer Nature B.V.
Subjects	Clusters; Coherence; Computer memory; Computer programming; Computer Science; Embedded systems; Hardware; Parallel processing; Processor Architectures; Servers; Software Engineering/Programming and Operating Systems; Synchronism; Theory of Computation; Coherence-free memory architectures; Transactional memory
Summary:	High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories, rather than hardware-managed caches that require some form of cache coherence management. These “coherence-free” systems still require some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches may be employed to accomplish the synchronization, but may lead to both usability and performance issues. Instead, speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores. The lack of a cache-coherence protocol adds new challenges in the design of hardware speculative support. In this article, we present a new scheme for hardware transactional memory (HTM) support within a cluster-based, many-core embedded system that lacks an underlying cache-coherence protocol. We propose two alternative data versioning implementations for the HTM support, Full-Mirroring and Distributed Logging, and we conduct a performance comparison between them. To the best of our knowledge, these are the first designs for speculative synchronization for this type of architecture. Through a set of benchmark experiments using our simulation platform, we show that our designs can achieve significant performance improvements over traditional lock-based schemes.
ISSN:	0885-7458, 1573-7640
DOI:	10.1007/s10766-018-0569-7