Scalable embedded computing through reconfigurable hardware: Comparing DF-Threads, Cilk, OpenMPI and Jump

Bibliographic Details
Published in	Microprocessors and Microsystems, Vol. 63, pp. 66-74
Main Author	Giorgi, Roberto
Format	Journal Article
Language	English
Published	Kidlington: Elsevier B.V., 01.11.2018
Subjects	Computer peripherals; Computer programming; Computer simulation; Configurations; Cyber-physical systems; Distributed memory; Distributed shared memory; Embedded systems; FPGA Programming; Memory model; Message passing; Nodes; Performance evaluation; Programming model; Reconfigurable hardware; Reconfigurable systems; System on chip; Transceivers; Workload
Summary:	Data-Flow Threads (DF-Threads) is a new execution model that allows the workload to be distributed seamlessly across several cores (in a multi-core) and several nodes (in a multi-node/multi-board configuration). This paper reports the progress made in deploying this execution model, which is being developed with a combination of a simulator model (i.e., the COTSon framework) and a reconfigurable hardware platform (i.e., the AXIOM-board). The AXIOM platform consists of a custom board based on the Xilinx Zynq UltraScale+ ZU9EG, which incorporates the largest FPGA available on that System-on-Chip at the moment, four 64-bit ARM cores and two 32-bit ARM cores, up to 32 GiB of main memory and several 16 Gbit/s transceivers. Although a complete DF-Threads system is still under development, it is already capable of running a full Linux OS and simple applications, so some initial results are presented here. In particular, well-known programming models used to exploit Thread-Level Parallelism, such as Cilk, OpenMPI and Jump, are compared with DF-Threads execution. Cilk is good for multi-cores, but it is not suitable for multi-node systems. In the latter case, the distribution of the workload could be managed partly by the programmer when using programming models such as message passing (OpenMPI has been chosen for reference) or distributed shared memory (Jump in our case). The obtained results show that a DF-Threads execution on a cluster of eight 4-core boards can provide a speed-up of more than 14x compared to the same configuration when using OpenMPI, and of more than 80x compared to an OpenMPI single-core, single-node execution.
ISSN:	0141-9331 1872-9436
DOI:	10.1016/j.micpro.2018.08.005