Optimization of a novel programmable data-flow crypto processor using NSGA-II algorithm

Bibliographic Details
Published in	Journal of advanced research Vol. 12; pp. 67 - 78
Main Authors	El-Hadidi, Mahmoud T., Elsayed, Hany M., Osama, Karim, Bakr, Mohamed, Aslan, Heba K.
Format	Journal Article
Language	English
Published	Egypt: Elsevier B.V., 01.07.2018
Subjects	Data-flow crypto processor; Design space exploration; FPGA implementation; Multi-objective optimization; NSGA-II Genetic Algorithm; Programmable crypto processor

More Information
Summary:	This paper considers the optimization of a novel programmable data-flow crypto processor dedicated to security applications. A previous work proposed an architecture that assigns basic functional units to four synchronous regions. Here, the problem of selecting the number of synchronous regions and the distribution of functional units among them is formulated as a combinatorial multi-objective optimization problem. The objective functions are the implementation area, the execution delay, and the energy consumed when running the well-known AES algorithm. To solve this problem, a modified version of the Genetic Algorithm, known as NSGA-II, is linked to a component database and a processor emulator. The performance improvement gained by operating the processor regions at different clocks is found to be offset by the delay introduced by the wrappers needed to communicate between the asynchronous regions. With a delay of two clock periods, the minimum processor delay in the asynchronous case is 311% of the delay obtained in the synchronous case, and the minimum consumed energy is 308% more in the asynchronous design than in its synchronous counterpart. The research also identifies the Instruction Region as the main design bottleneck. For the synchronous case, the Pareto front contains solutions with 4 regions that minimize delay and solutions with 7 regions that minimize area or energy. A minimum-delay design is selected for hardware implementation, and the FPGA version of the optimized processor is tested and verified to operate correctly for the AES and RC6 encryption/decryption algorithms.
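
The Pareto front mentioned in the summary can be illustrated with a minimal sketch of the dominance test that underlies NSGA-II's non-dominated sorting. This is not the authors' implementation: the function names and the candidate values below are hypothetical, and each design is assumed to be scored as a tuple (area, delay, energy) with all three objectives minimized.

```python
from typing import List, Tuple

# Assumed encoding: (area, delay, energy), every objective to be minimized.
Design = Tuple[float, float, float]

def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs that no other design dominates (the first non-dominated front)."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]

# Hypothetical objective values for illustration only; they are not taken from the paper.
candidates = [
    (10.0, 5.0, 8.0),
    (12.0, 4.0, 9.0),
    (11.0, 6.0, 9.0),  # worse than the first candidate in every objective
    (9.0, 7.0, 7.0),
]
front = pareto_front(candidates)
```

In this toy set, the third candidate is dropped because the first beats it on area, delay, and energy; the remaining three each trade one objective against another, which is the kind of trade-off the summary reports between minimum-delay and minimum-area or minimum-energy designs.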
ISSN:	2090-1232 2090-1224
DOI:	10.1016/j.jare.2017.11.002